Remove New Line or Carriage Return from Text File
I have been running a very successful script that extracts URLs (using grep and cut) from the emails I get from my eBay favorites. In the last week or so the emails I receive have become narrower, to the extent that the URL is now split over two lines:

Item title:=09Dragon 1/6 Scale MODERN British SA80 Rifle MDRW0020
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D130156919=
207&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A36.99
Postage: +=A31.69
End time: 04-Oct-07 21:58 BST

The current script just pulls off the first line, and the subsequent wget errors out. I have tried awk, sed and tr in an attempt to remove the = sign and then the NL or CR. The = sign is easy, but I am not having any luck joining the two lines up. I'm guessing that it is just an NL or CR. In vi or gedit it is not a problem to remove the NL or CR.

Any pointers please.

DIMonS
only tested on your sample
Code:
awk '/Item URL:/ { sub(/Item URL:/,"");line=$0;next }
!/Buy It Now price:/{ line = line$0 }
/Buy It Now price:/{exit}
END{print line}' "file"
I know you said you tried awk, sed and tr, but you didn't mention what you tried.
See if any of these work:
sed 'N;s/\n//'
tr '\n' ' '
awk '{printf "%s", $0}'
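If the = at the end of the first URL line is a quoted-printable soft line break (an assumption, based on the =3D and =09 codes in the sample), one option is to join only the lines that end in = instead of joining every line. A minimal sketch; the function name join_soft_breaks is made up for illustration:

```shell
#!/bin/sh
# Join lines that end in "=" with the line that follows them.
# Assumption: a trailing "=" is a quoted-printable soft line break.
join_soft_breaks() {
    # :a          label to loop back to
    # /=$/N       if the line ends in "=", append the next line
    # s/=\n//     drop the "=" together with the newline that followed it
    # ta          if something was replaced, check the joined line again
    sed -e :a -e '/=$/N; s/=\n//; ta'
}

# Sample lines taken from the first post.
printf '%s\n' \
    'Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D130156919=' \
    '207&ssPageName=3DADME:B:SS:UK:1' \
    'Buy It Now price: =A36.99' | join_soft_breaks
```

Lines that do not end in = pass through untouched, so the rest of the mail keeps its layout.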
Looping a script?
Thanks guys for the help.
I've tweaked it a little but can't seem to get it to check a larger document with multiple URLs. Am I on the right track with while ; do ?

cat mail.txt | awk '{sub(/[= \t]+$/, "");print}' > tempmail.txt
while true ; do
awk '/Item URL:/ { sub(/Item URL:/,"");line=$0;next }
!/Buy It Now price:/{ line = line$0 }
/Buy It Now price:/{exit}
END{print line}' "tempmail.txt" > tempmail.txt
fi
done
cut -c 3- tempmail2.txt > newmail.txt
wget --restrict-file-names=windows -nd -E -H -k -p --random-wait --tries=2 -i tempmail.txt
I'll kick in my perl contribution:
Code:
#! /bin/perl -w

--- rod.
Until you give us an example of exactly what you're doing, here's a modification of ghostdog74's code. I *think* this will do the trick for you, but I'm not too good with AWK, so YMMV.
Code:
awk '/Item URL:/ { sub(/Item URL: /,"");line=$0;next }
OK. Here is a snippet of an email:
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280156759=
851&ssPageName=3DADME:B:SS:UK:1
Current bid: =A31.49(0 Bids)
Postage: +=A31.40
End time: 05-Oct-07 15:28 BST

Item title:=09TO INHERIT THE SKIES From Spitfire to Tornado (RAF)
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174171=
555&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A37.50
Postage: +=A31.92
End time: 05-Oct-07 19:29 BST

Item title:=09ROYAL AIR FORCE GERMANY-RAF BR=DCGGEN TORNADO GR1-N=B0IX SQ
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D180163747=
538&ssPageName=3DADME:B:SS:UK:1
Current bid: =A34.99(0 Bids)
Postage: +=A31.00
End time: 03-Oct-07 21:10 BST

Item title:=09GENUINE RAF PANAVIA TORNADO STICKERS x 9 VERY RARE!
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157148=
966&ssPageName=3DADME:B:SS:UK:1
Current bid: =A39.99(0 Bids)
Postage: +=A31.00
End time: 06-Oct-07 17:28 BST

Item title:=09FILM SHOT FROM RAF TORNADO F3 OVER FALKLAND ISLANDS
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157204=
247&ssPageName=3DADME:B:SS:UK:1
Current bid: =A33.99(0 Bids)
Postage: +=A31.45
End time: 06-Oct-07 20:17 BST

Item title:=09RAF GSM W/C Medal Air Operations Iraq
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D140162264=
295&ssPageName=3DADME:B:SS:UK:1
Current bid: =A320.00(1 Bid)
Postage: +=A33.00
End time: 29-Sep-07 19:45 BST

Item title:=09BRITISH ARMY,SAS,RAF,RN,RM, IRAQ MEDAL WITH CLASP
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166070=
318&ssPageName=3DADME:B:SS:UK:1
Current bid: =A325.00(0 Bids)
Postage: +=A32.50
End time: 01-Oct-07 20:19 BST

Item title:=09BRITISH ARMY,SAS,RAF,RN,RM, IRAQ MEDAL
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166071=
357&ssPageName=3DADME:B:SS:UK:1
Current bid: =A325.00(0 Bids)
Postage: +=A32.50
End time: 01-Oct-07 20:22 BST

Item title:=09RAF 1419 Flt,patch, Basra, Iraq,
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157088=
253&ssPageName=3DADME:B:SS:UK:1
Current bid: US $8.99(1 Bid)
Postage: +US $1.40
End time: 03-Oct-07 13:22 BST

Item title:=09British SA80 Bayonet with Infantry Scabbard
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D170153405=
862&ssPageName=3DADME:B:SS:UK:1
Current bid: =A359.99(0 Bids)
Postage: +=A33.50
End time: 29-Sep-07 20:46 BST

Item title:=09SA80 Bayonet & Scabbard
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174159=
405&ssPageName=3DADME:B:SS:UK:1
Current bid: =A337.99(0 Bids)
Buy It Now price: =A337.99
Postage: +=A35.00
End time: 02-Oct-07 18:59 BST

Item title:=09SA80 & LSW Skill at Arms on CD Royal Marines Paras TA
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164667=
262&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A33.99
Postage: +=A31.01
End time: 02-Oct-07 23:23 BST

Item title:=09HOPPES BORESNAKE PULL THROUGH SA80 5.56mm .22" BNIB
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164878=
678&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A314.99
Postage: +=A32.00
End time: 03-Oct-07 16:34 BST

Item title:=09Molle SA80 Magazine Pouch DPM, SAS, SBS, SFSG, PARA, RM
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169612=
935&ssPageName=3DADME:B:SS:UK:1
Current bid: =A37.50(0 Bids)
Buy It Now price: =A315.99
Postage: +=A33.99
End time: 05-Oct-07 17:22 BST

Item title:=09Molle SA80 Magazine Pouch DPM, SAS, SBS, SFSG, PARA, RM
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169613=
191&ssPageName=3DADME:B:SS:UK:1
Current bid: =A37.50(0 Bids)
Buy It Now price: =A315.99
Postage: +=A33.99
End time: 05-Oct-07 17:23 BST

Item title:=09NEW SAS SMOCK + FREE SA80 SLING / A5 NYREX / MODEL KIT
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D130157163=
136&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A365.00
Postage: +=A30.00
End time: 05-Oct-07 18:31 BST

The rest of the email is full of info that I do not require. I copied /var/spool/mail/emailaddress to emailfile.
I want to cut out all the "Item URL" entries from emailfile (example above), each currently split over two lines, and then wget the eBay pages using the URLs collected. My old script got all the URLs from the file in a single swoop, but the emails now arrive in a different width.
sometimes you have buy it now, sometimes you don't
Code:
awk '/Item URL:/ { sub(/Item URL: /,"");line=$0;next }
Code:
./testnew.sh
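A sketch of one way to pull every URL out in a single awk pass, whether or not a "Buy It Now price:" line follows. It assumes each URL starts on a line beginning with "Item URL:" and that a trailing = means the URL continues on the next line. The name extract_urls is hypothetical, not something from the posts above:

```shell
#!/bin/sh
# Print one joined URL per "Item URL:" entry.
# Assumption: a URL line ending in "=" continues on the following line.
extract_urls() {
    awk '
    /^Item URL:/ {
        sub(/^Item URL:[ \t]*/, "")        # drop the label and the space after it
        url = $0
        # Keep pulling in continuation lines while the URL ends in "=".
        while (url ~ /=$/ && (getline nextline) > 0) {
            sub(/=$/, "", url)             # remove the soft line break marker
            url = url nextline
        }
        print url
    }'
}

# Two entries from the sample mail: one with a bid, one with Buy It Now.
printf '%s\n' \
    'Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280156759=' \
    '851&ssPageName=3DADME:B:SS:UK:1' \
    'Current bid: =A31.49(0 Bids)' \
    'Item title:=09TO INHERIT THE SKIES From Spitfire to Tornado (RAF)' \
    'Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174171=' \
    '555&ssPageName=3DADME:B:SS:UK:1' \
    'Buy It Now price: =A37.50' | extract_urls
```

awk already walks every line of the file, so no shell loop is needed around it. The output can be redirected to a file and handed to wget with -i, as in the earlier script; note that the URLs still contain =3D at this point.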
With GNU awk:
Code:
awk 'NR>1&&$0=RS$1$2' RS="http" filename
Thank you very much everyone for all your help. Still plowing through the man pages trying to find out what it all means.
Just off to do some awk stuff to remove the 3D and the = sign in the middle of the ID string; the URL is bad otherwise:
....ViewItem&item=3D140163577=658&ssPageName=3DADME:B:SS:UK:1
DIMonS
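For the cleanup step, here is a rough sketch of one way to do it in awk. It assumes =3D is the quoted-printable code for a literal = and that any other = left in the URL is a leftover line-break marker, which holds for the sample URLs but not for quoted-printable text in general. The name clean_url is invented for the example:

```shell
#!/bin/sh
# Turn "=3D" into "=" and drop any other "=" left in the line.
# Assumption: the only quoted-printable code in a URL line is "=3D".
clean_url() {
    awk '{
        n = split($0, parts, /=3D/)        # pieces between the "=3D" codes
        out = ""
        for (i = 1; i <= n; i++) {
            gsub(/=/, "", parts[i])        # stray "=" from the joined lines
            out = out (i > 1 ? "=" : "") parts[i]
        }
        print out
    }'
}

# The URL fragment from the post above.
printf '%s\n' '....ViewItem&item=3D140163577=658&ssPageName=3DADME:B:SS:UK:1' | clean_url
```

Splitting on =3D first means the real = signs are never confused with the stray one, so the order of the two replacements does not matter.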
And yeah, that code works perfectly on my Linux box, but glitches on FreeBSD.