LinuxQuestions.org


DIMonS 09-28-2007 04:37 AM

Remove New Line or Carriage return from Text File
 
I have been running a very successful script that extracts URLs (with grep and cut) from the emails for my eBay favorites. In the last week or so the emails I receive have become narrower, to the extent that each URL is now split across 2 lines:

Item title:=09Dragon 1/6 Scale MODERN British SA80 Rifle MDRW0020
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D130156919=
207&ssPageName=3DADME:B:SS:UK:1

Buy It Now price: =A36.99
Postage: +=A31.69
End time: 04-Oct-07 21:58 BST

The current script just pulls off the 1st line and the subsequent wget errors out.

I have tried awk, sed and tr in an attempt to remove the = sign and then the NL or CR. The = sign is easy but I am not having any luck joining the two lines up. I'm guessing that it is just an NL or CR. In vi or gedit it is not a problem to remove the NL or CR.

Any pointers please.

DIMonS

ghostdog74 09-28-2007 05:08 AM

only tested on your sample
Code:

awk '/Item URL:/ { sub(/Item URL:/,"");line=$0;next }
    !/Buy It Now price:/{ line = line$0 }
    /Buy It Now price:/{exit}
END{print line}' "file"


angrybanana 09-28-2007 05:09 AM

I know you said you tried awk, sed and tr.. but you didn't mention what you tried..

See if any of this works:

sed 'N;s/\n//'
tr '\n' ' '
awk '{printf "%s", $0}'
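
The one-liners above join lines unconditionally. If the trailing = is a quoted-printable soft line break (which the =3D and =A3 sequences in the sample suggest), it may be safer to join only the lines that end in =. A minimal sketch; the function name join_soft_breaks and the example.invalid URL are made up for illustration:

```shell
# Glue a line ending in "=" to the line that follows it, dropping the "=".
# Loops back to label "a" so a URL split over more than 2 lines is handled.
join_soft_breaks() {
    sed -e ':a' -e '/=$/{' -e 'N' -e 's/=\n//' -e 'ba' -e '}'
}

# Demo on a made-up two-line URL followed by an ordinary line.
printf '%s\n' \
    'Item URL: http://example.invalid/view?item=3D130156919=' \
    '207&ssPageName=3DADME' \
    'Postage: 1.69' |
    join_soft_breaks
```

Lines that do not end in = pass through untouched, so the rest of the mail keeps its layout.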

DIMonS 09-28-2007 07:53 AM

Looping a script?
 
Thanks guys for the help.

I've tweaked a little but can't seem to get it to check a larger document with multiple URLs. Am I on the right track with while ; do ?

cat mail.txt | awk '{sub(/[= \t]+$/, "");print}' > tempmail.txt

while true ; do

awk '/Item URL:/ { sub(/Item URL:/,"");line=$0;next }
!/Buy It Now price:/{ line = line$0 }
/Buy It Now price:/{exit}
END{print line}' "tempmail.txt" > tempmail.txt

fi

done

cut -c 3- tempmail2.txt > newmail.txt

wget --restrict-file-names=windows -nd -E -H -k -p --random-wait --tries=2 -i tempmail.txt

theNbomr 09-28-2007 10:59 AM

I'll kick in my perl contribution:
Code:

#! /bin/perl -w
#
#  LQDIMonS.pl
#
#  usage:
#  LQDIMonS.pl email.txt
#
use strict;

    my @email = <>;
    my $i = 0;
    while( $i <  @email-1 ){
   
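        if( $email[$i] =~ m/=$/ ){
            chomp $email[$i];
            $email[$i] =~ s/=$//;
            $email[$i] .= $email[$i+1];
            splice @email, $i+1, 1;
        }
        else {
            # advance only when nothing was joined, so a joined line that
            # still ends in '=' gets its next continuation as well
            $i++;
        }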
    }
   
    foreach my $record ( @email ){
        # print $record;
        if( $record =~ m/Item URL:\s+(.+)$/ ){
            print "$1\n";
        }
    }

Not too sure what treatment you are trying to give to the ending '='. Is it supposed to be removed, and then the following line concatenated?

--- rod.

ghostdog74 09-28-2007 11:50 AM

Quote:

Originally Posted by DIMonS (Post 2906410)
Thanks guys for the help.

I've tweaked a little but can't seem to get it to check a larger document with multiple URLs. Am I on the right track with while ; do ?

cat mail.txt | awk '{sub(/[= \t]+$/, "");print}' > tempmail.txt

while true ; do

awk '/Item URL:/ { sub(/Item URL:/,"");line=$0;next }
!/Buy It Now price:/{ line = line$0 }
/Buy It Now price:/{exit}
END{print line}' "tempmail.txt" > tempmail.txt

fi

done

cut -c 3- tempmail2.txt > newmail.txt

wget --restrict-file-names=windows -nd -E -H -k -p --random-wait --tries=2 -i tempmail.txt

show the sample of the larger document. do note that the awk snippet was created based on your earlier sample and nothing else.

angrybanana 09-28-2007 08:36 PM

Until you give us an example of what you're doing exactly, here's a modification of ghostdog74's code. I *think* this will do the trick for you, I'm not too good with AWK though, so YMMV.
Code:

awk '/Item URL:/ { sub(/Item URL: /,"");line=$0;next }
    !/Buy It Now price:/{ line = line$0 }
    /Buy It Now price:/{print line}' "file"
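
On the looping question: awk already runs its rules against every input line, so no shell while loop is needed. Also, redirecting output onto the file awk is reading empties that file before awk starts, so the result should go to a different file. A rough sketch that does not depend on which field follows the URL; the name extract_urls and the example.invalid URLs are made up, and it assumes every continued line ends in =:

```shell
# Print each "Item URL:" value on one line, joining continuation lines.
extract_urls() {
    awk '
        /^Item URL:/ {                   # a new URL starts here
            sub(/^Item URL:[ \t]*/, "")  # drop the label
            url = ""
            collecting = 1
        }
        collecting {
            if (sub(/=$/, "")) {         # trailing "=": more to come
                url = url $0
            } else {                     # last piece: print it, stop collecting
                print url $0
                collecting = 0
            }
        }
    '
}

# Demo on two made-up items, one with a bid line and one with Buy It Now.
printf '%s\n' \
    'Item title:=09A' \
    'Item URL: http://example.invalid/view?item=3D28=' \
    '851&ss=3DX' \
    'Current bid: 1' \
    'Item URL: http://example.invalid/view?item=3D11=' \
    '555&ss=3DY' \
    'Buy It Now price: 2' |
    extract_urls
```

Something like extract_urls < mail.txt > urls.txt would then feed wget -i urls.txt, with the input and output kept in separate files.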


DIMonS 10-01-2007 05:29 AM

OK. Here is a snippet of an email:

Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280156759=
851&ssPageName=3DADME:B:SS:UK:1
Current bid: =A31.49(0 Bids)
Postage: +=A31.40
End time: 05-Oct-07 15:28 BST

Item title:=09TO INHERIT THE SKIES From Spitfire to Tornado (RAF)
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174171=
555&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A37.50
Postage: +=A31.92
End time: 05-Oct-07 19:29 BST

Item title:=09ROYAL AIR FORCE GERMANY-RAF BR=DCGGEN TORNADO GR1-N=B0IX SQ
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D180163747=
538&ssPageName=3DADME:B:SS:UK:1
Current bid: =A34.99(0 Bids)
Postage: +=A31.00
End time: 03-Oct-07 21:10 BST

Item title:=09GENUINE RAF PANAVIA TORNADO STICKERS x 9 VERY RARE!
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157148=
966&ssPageName=3DADME:B:SS:UK:1
Current bid: =A39.99(0 Bids)
Postage: +=A31.00
End time: 06-Oct-07 17:28 BST

Item title:=09FILM SHOT FROM RAF TORNADO F3 OVER FALKLAND ISLANDS
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157204=
247&ssPageName=3DADME:B:SS:UK:1
Current bid: =A33.99(0 Bids)
Postage: +=A31.45
End time: 06-Oct-07 20:17 BST

Item title:=09RAF GSM W/C Medal Air Operations Iraq
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D140162264=
295&ssPageName=3DADME:B:SS:UK:1
Current bid: =A320.00(1 Bid)
Postage: +=A33.00
End time: 29-Sep-07 19:45 BST

Item title:=09BRITISH ARMY,SAS,RAF,RN,RM, IRAQ MEDAL WITH CLASP
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166070=
318&ssPageName=3DADME:B:SS:UK:1
Current bid: =A325.00(0 Bids)
Postage: +=A32.50
End time: 01-Oct-07 20:19 BST

Item title:=09BRITISH ARMY,SAS,RAF,RN,RM, IRAQ MEDAL
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166071=
357&ssPageName=3DADME:B:SS:UK:1
Current bid: =A325.00(0 Bids)
Postage: +=A32.50
End time: 01-Oct-07 20:22 BST

Item title:=09RAF 1419 Flt,patch, Basra, Iraq,
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157088=
253&ssPageName=3DADME:B:SS:UK:1
Current bid: US $8.99(1 Bid)
Postage: +US $1.40
End time: 03-Oct-07 13:22 BST

Item title:=09British SA80 Bayonet with Infantry Scabbard
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D170153405=
862&ssPageName=3DADME:B:SS:UK:1
Current bid: =A359.99(0 Bids)
Postage: +=A33.50
End time: 29-Sep-07 20:46 BST

Item title:=09SA80 Bayonet & Scabbard
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174159=
405&ssPageName=3DADME:B:SS:UK:1
Current bid: =A337.99(0 Bids)
Buy It Now price: =A337.99
Postage: +=A35.00
End time: 02-Oct-07 18:59 BST

Item title:=09SA80 & LSW Skill at Arms on CD Royal Marines Paras TA
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164667=
262&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A33.99
Postage: +=A31.01
End time: 02-Oct-07 23:23 BST

Item title:=09HOPPES BORESNAKE PULL THROUGH SA80 5.56mm .22" BNIB
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164878=
678&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A314.99
Postage: +=A32.00
End time: 03-Oct-07 16:34 BST

Item title:=09Molle SA80 Magazine Pouch DPM, SAS, SBS, SFSG, PARA, RM
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169612=
935&ssPageName=3DADME:B:SS:UK:1
Current bid: =A37.50(0 Bids)
Buy It Now price: =A315.99
Postage: +=A33.99
End time: 05-Oct-07 17:22 BST

Item title:=09Molle SA80 Magazine Pouch DPM, SAS, SBS, SFSG, PARA, RM
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169613=
191&ssPageName=3DADME:B:SS:UK:1
Current bid: =A37.50(0 Bids)
Buy It Now price: =A315.99
Postage: +=A33.99
End time: 05-Oct-07 17:23 BST

Item title:=09NEW SAS SMOCK + FREE SA80 SLING / A5 NYREX / MODEL KIT
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D130157163=
136&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A365.00
Postage: +=A30.00
End time: 05-Oct-07 18:31 BST

The rest of the email is full of info that I do not require. I copied /var/spool/mail/emailaddress to a file, emailfile.

I want to cut out all "Item URLs" from emailfile (example above), which currently run over 2 lines, and then wget the eBay pages using the URLs collected. My old script got all URLs from the file in a single swoop, but the emails now arrive in a different width.

ghostdog74 10-01-2007 05:50 AM

sometimes you have buy it now, sometimes you don't
Code:

awk '/Item URL:/ { sub(/Item URL: /,"");line=$0;next }
    !/Buy It Now price:/{ line = line$0 }
    /Buy It Now price:/ || /Current bid/{ sub(/Current bid:.*/,"",line);print line}' "file"

output:
Code:

./testnew.sh
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280156759=851&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174171=555&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D180163747=538&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157148=966&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157204=247&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D140162264=295&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166070=318&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166071=357&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157088=253&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D170153405=862&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174159=405&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174159=405&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164667=262&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164878=678&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169612=935&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169612=935&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169613=191&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169613=191&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D130157163=136&ssPageName=3DADME:B:SS:UK:1
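
Items that have both a Current bid and a Buy It Now line show up twice in the listing above, so wget would fetch those pages twice. One possible way to drop the repeats while keeping the order, as a sketch (unique_lines and the example.invalid URLs are made up):

```shell
# Keep only the first occurrence of each line, preserving input order.
unique_lines() {
    awk '!seen[$0]++'
}

# Demo: the repeated first URL is printed once.
printf '%s\n' \
    'http://example.invalid/a' \
    'http://example.invalid/a' \
    'http://example.invalid/b' |
    unique_lines
```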


radoulov 10-01-2007 07:24 AM

With GNU awk:

Code:

awk 'NR>1&&$0=RS$1$2' RS="http" filename

DIMonS 10-01-2007 08:06 AM

Thank you very much everyone for all your help. Still plowing through the man pages trying to find out what it all means.

Just off to do some awk stuff to remove the 3D and the = sign in the middle of the id string .. bad URL otherwise.
....ViewItem&item=3D140163577=658&ssPageName=3DADME:B:SS:UK:1

DIMonS
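
For the leftover = and the 3D: in quoted-printable, =3D is the encoded form of a literal =, and the lone = in the middle of the id is what remains of the soft line break. A hedged sketch that assumes =3D is the only escape in these URLs and that any other = is such a remnant (fix_url and the example.invalid host are made up). A general decoder run on the already-joined line could misread something like =65 as an escape, which is why the soft break is better removed while joining:

```shell
# Turn "=3D" back into "=" and drop any other "=" left from a soft break.
fix_url() {
    awk '{
        gsub(/=3D/, "\001")   # protect the encoded "=" signs with a placeholder
        gsub(/=/, "")         # remove the leftover soft-break "="
        gsub("\001", "=")     # put the real "=" signs back
        print
    }'
}

# Demo on the tail of the URL quoted above, with a made-up host in front.
printf '%s\n' \
    'http://example.invalid/ViewItem&item=3D140163577=658&ssPageName=3DADME:B:SS:UK:1' |
    fix_url
```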

angrybanana 10-01-2007 10:17 AM

Quote:

Originally Posted by radoulov (Post 2909204)
With GNU awk:

Code:

awk 'NR>1&&$0=RS$1$2' RS="http" filename

Just curious, which operator doesn't work with non-GNU awk?
And yeah, that code works perfectly on my Linux box, but glitches on FreeBSD.

radoulov 10-01-2007 10:22 AM

Quote:

Originally Posted by angrybanana (Post 2909340)
Just curious, which operator doesn't work with non-GNU awk?
And yeah, that code works perfectly on my Linux box, but glitches on FreeBSD.

With other awks (not sure about tawk, cannot check it right now) the RS can be only one character.
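
For an awk without multi-character RS, the same result can probably be had with the default record separator. A sketch that assumes every URL spans exactly two lines, as in the posted sample (urls_two_lines and the example.invalid URL are made up); like the one-liner, it leaves the = at the join in place:

```shell
# Same idea as the RS="http" one-liner, using the default record separator.
urls_two_lines() {
    awk '/^Item URL:/ {
        first = $3                  # fields are "Item", "URL:", then the URL start
        if ((getline rest) > 0)     # read the continuation line
            print first rest
    }'
}

# Demo on one made-up item.
printf '%s\n' \
    'Item title:=09X' \
    'Item URL: http://example.invalid/a=' \
    'b&c' \
    'Postage: 1' |
    urls_two_lines
```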

angrybanana 10-01-2007 11:31 AM

Quote:

Originally Posted by radoulov (Post 2909342)
With other awks (not sure about tawk, cannot check it right now) the RS can be only one character.

thank you.


All times are GMT -5.