LinuxQuestions.org
Old 09-28-2007, 03:37 AM   #1
DIMonS
LQ Newbie
 
Registered: Aug 2007
Posts: 13

Rep: Reputation: 0
Remove New Line or Carriage return from Text File


I have been running a very successful script that pulls the URLs (using grep and cut) out of the emails I get from my eBay favorites. In the last week or so the emails I receive have become narrower, to the extent that each URL now wraps over 2 lines:

Item title:=09Dragon 1/6 Scale MODERN British SA80 Rifle MDRW0020
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D130156919=
207&ssPageName=3DADME:B:SS:UK:1

Buy It Now price: =A36.99
Postage: +=A31.69
End time: 04-Oct-07 21:58 BST

The current script just pulls off the 1st line and the subsequent wget errors out.

I have tried awk, sed and tr in an attempt to remove the = sign and then the NL or CR. The = sign is easy, but I am not having any luck joining the two lines up. I'm guessing that it is just an NL or CR. In vi or gedit it is not a problem to remove the NL or CR.

Any pointers please.

DIMonS
 
Old 09-28-2007, 04:08 AM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
only tested on your sample
Code:
awk '/Item URL:/ { sub(/Item URL:/,"");line=$0;next }
     !/Buy It Now price:/{ line = line$0 }
    /Buy It Now price:/{exit}
END{print line}' "file"
 
Old 09-28-2007, 04:09 AM   #3
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
I know you said you tried awk, sed and tr.. but you didn't mention what you tried..

See if any of this works:

sed 'N;s/\n//'
tr '\n' ' '
awk '{printf "%s", $0}'
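
The one-liners above join lines unconditionally (sed glues each pair of lines, tr and awk flatten the whole file). A minimal sketch of a more selective join, assuming the wrapped lines are the ones that end in "=" as in the sample from post #1. The function name join_soft_breaks is made up for illustration, and the optional carriage-return strip is only a guess at what the mail file might contain:

```shell
# join_soft_breaks: hypothetical helper, not from the thread.
# Joins a line to the next one only when it ends in "=".
join_soft_breaks() {
    awk '
        { sub(/\r$/, "") }                              # tolerate CRLF line endings, if any
        /=$/ { sub(/=$/, ""); printf "%s", $0; next }   # drop the "=" and hold back the newline
        { print }                                       # every other line passes through unchanged
    ' "$@"
}

# usage: join_soft_breaks file
```

Lines that do not end in "=" are untouched, so the rest of the mail keeps its line structure and grep and cut can still be run on the result.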
 
Old 09-28-2007, 06:53 AM   #4
DIMonS
LQ Newbie
 
Registered: Aug 2007
Posts: 13

Original Poster
Rep: Reputation: 0
Looping a script?

Thanks guys for the help.

I've tweaked it a little but can't seem to get it to check a larger document with multiple URLs. Am I on the right track with while ; do ?

cat mail.txt | awk '{sub(/[= \t]+$/, "");print}' > tempmail.txt

while true ; do

awk '/Item URL:/ { sub(/Item URL:/,"");line=$0;next }
!/Buy It Now price:/{ line = line$0 }
/Buy It Now price:/{exit}
END{print line}' "tempmail.txt" > tempmail.txt

fi

done

cut -c 3- tempmail2.txt > newmail.txt

wget --restrict-file-names=windows -nd -E -H -k -p --random-wait --tries=2 -i tempmail.txt
 
Old 09-28-2007, 09:59 AM   #5
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
I'll kick in my perl contribution:
Code:
#! /bin/perl -w
#
#   LQDIMonS.pl
#
#   usage:
#   LQDIMonS.pl email.txt
#
use strict;

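    my @email = <>;
    my $i = 0;
    while( $i <  @email-1 ){

        # A trailing '=' marks a wrapped line: drop the '=' and
        # append the following line to this one.
        if( $email[$i] =~ m/=$/ ){
            chomp $email[$i];
            $email[$i] =~ s/=$//;
            $email[$i] .= $email[$i+1];
            splice @email, $i+1, 1;
            next;   # re-check the joined line, it may end in '=' again
        }
        $i++;
    }

    # Print only the URL part of each (now single-line) Item URL record.
    foreach my $record ( @email ){
        # print $record;
        if( $record =~ m/Item URL:\s+(.+)$/ ){
            print "$1\n";
        }
    }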
Not too sure what treatment you are trying to give to the ending '='. Is it supposed to be removed, and then the following line concatenated?

--- rod.
 
Old 09-28-2007, 10:50 AM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
Quote:
Originally Posted by DIMonS
Thanks guys for the help.

I've tweaked it a little but can't seem to get it to check a larger document with multiple URLs. Am I on the right track with while ; do ?

cat mail.txt | awk '{sub(/[= \t]+$/, "");print}' > tempmail.txt

while true ; do

awk '/Item URL:/ { sub(/Item URL:/,"");line=$0;next }
!/Buy It Now price:/{ line = line$0 }
/Buy It Now price:/{exit}
END{print line}' "tempmail.txt" > tempmail.txt

fi

done

cut -c 3- tempmail2.txt > newmail.txt

wget --restrict-file-names=windows -nd -E -H -k -p --random-wait --tries=2 -i tempmail.txt
Show a sample of the larger document. Do note that the awk snippet was written against your earlier sample and nothing else.
 
Old 09-28-2007, 07:36 PM   #7
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
Until you give us an example of what exactly you're doing, here's a modification of ghostdog74's code. I *think* this will do the trick for you, but I'm not too good with AWK, so YMMV.
Code:
awk '/Item URL:/ { sub(/Item URL: /,"");line=$0;next }
     !/Buy It Now price:/{ line = line$0 }
    /Buy It Now price:/{print line}' "file"

Last edited by angrybanana; 09-28-2007 at 07:38 PM.
 
Old 10-01-2007, 04:29 AM   #8
DIMonS
LQ Newbie
 
Registered: Aug 2007
Posts: 13

Original Poster
Rep: Reputation: 0
OK. Here is a snippet of an email:

Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280156759=
851&ssPageName=3DADME:B:SS:UK:1
Current bid: =A31.49(0 Bids)
Postage: +=A31.40
End time: 05-Oct-07 15:28 BST

Item title:=09TO INHERIT THE SKIES From Spitfire to Tornado (RAF)
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174171=
555&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A37.50
Postage: +=A31.92
End time: 05-Oct-07 19:29 BST

Item title:=09ROYAL AIR FORCE GERMANY-RAF BR=DCGGEN TORNADO GR1-N=B0IX SQ
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D180163747=
538&ssPageName=3DADME:B:SS:UK:1
Current bid: =A34.99(0 Bids)
Postage: +=A31.00
End time: 03-Oct-07 21:10 BST

Item title:=09GENUINE RAF PANAVIA TORNADO STICKERS x 9 VERY RARE!
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157148=
966&ssPageName=3DADME:B:SS:UK:1
Current bid: =A39.99(0 Bids)
Postage: +=A31.00
End time: 06-Oct-07 17:28 BST

Item title:=09FILM SHOT FROM RAF TORNADO F3 OVER FALKLAND ISLANDS
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157204=
247&ssPageName=3DADME:B:SS:UK:1
Current bid: =A33.99(0 Bids)
Postage: +=A31.45
End time: 06-Oct-07 20:17 BST

Item title:=09RAF GSM W/C Medal Air Operations Iraq
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D140162264=
295&ssPageName=3DADME:B:SS:UK:1
Current bid: =A320.00(1 Bid)
Postage: +=A33.00
End time: 29-Sep-07 19:45 BST

Item title:=09BRITISH ARMY,SAS,RAF,RN,RM, IRAQ MEDAL WITH CLASP
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166070=
318&ssPageName=3DADME:B:SS:UK:1
Current bid: =A325.00(0 Bids)
Postage: +=A32.50
End time: 01-Oct-07 20:19 BST

Item title:=09BRITISH ARMY,SAS,RAF,RN,RM, IRAQ MEDAL
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166071=
357&ssPageName=3DADME:B:SS:UK:1
Current bid: =A325.00(0 Bids)
Postage: +=A32.50
End time: 01-Oct-07 20:22 BST

Item title:=09RAF 1419 Flt,patch, Basra, Iraq,
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157088=
253&ssPageName=3DADME:B:SS:UK:1
Current bid: US $8.99(1 Bid)
Postage: +US $1.40
End time: 03-Oct-07 13:22 BST

Item title:=09British SA80 Bayonet with Infantry Scabbard
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D170153405=
862&ssPageName=3DADME:B:SS:UK:1
Current bid: =A359.99(0 Bids)
Postage: +=A33.50
End time: 29-Sep-07 20:46 BST

Item title:=09SA80 Bayonet & Scabbard
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174159=
405&ssPageName=3DADME:B:SS:UK:1
Current bid: =A337.99(0 Bids)
Buy It Now price: =A337.99
Postage: +=A35.00
End time: 02-Oct-07 18:59 BST

Item title:=09SA80 & LSW Skill at Arms on CD Royal Marines Paras TA
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164667=
262&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A33.99
Postage: +=A31.01
End time: 02-Oct-07 23:23 BST

Item title:=09HOPPES BORESNAKE PULL THROUGH SA80 5.56mm .22" BNIB
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164878=
678&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A314.99
Postage: +=A32.00
End time: 03-Oct-07 16:34 BST

Item title:=09Molle SA80 Magazine Pouch DPM, SAS, SBS, SFSG, PARA, RM
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169612=
935&ssPageName=3DADME:B:SS:UK:1
Current bid: =A37.50(0 Bids)
Buy It Now price: =A315.99
Postage: +=A33.99
End time: 05-Oct-07 17:22 BST

Item title:=09Molle SA80 Magazine Pouch DPM, SAS, SBS, SFSG, PARA, RM
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169613=
191&ssPageName=3DADME:B:SS:UK:1
Current bid: =A37.50(0 Bids)
Buy It Now price: =A315.99
Postage: +=A33.99
End time: 05-Oct-07 17:23 BST

Item title:=09NEW SAS SMOCK + FREE SA80 SLING / A5 NYREX / MODEL KIT
Item URL: http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D130157163=
136&ssPageName=3DADME:B:SS:UK:1
Buy It Now price: =A365.00
Postage: +=A30.00
End time: 05-Oct-07 18:31 BST

The rest of the email is full of info that I do not require. I copied /var/spool/mail/emailaddress to emailfile.

I want to cut out all the "Item URL" entries from emailfile (example above), which currently span 2 lines, and then wget the eBay pages using the URLs collected. My old script got all the URLs from the file in a single swoop, but the emails now arrive in a different width.
 
Old 10-01-2007, 04:50 AM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
sometimes you have buy it now, sometimes you don't
Code:
awk '/Item URL:/ { sub(/Item URL: /,"");line=$0;next }
     !/Buy It Now price:/{ line = line$0 }
    /Buy It Now price:/ || /Current bid/{ sub(/Current bid:.*/,"",line);print line}' "file"
output:
Code:
./testnew.sh
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280156759=851&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174171=555&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D180163747=538&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157148=966&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157204=247&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D140162264=295&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166070=318&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D120166071=357&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D280157088=253&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D170153405=862&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174159=405&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D110174159=405&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164667=262&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D290164878=678&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169612=935&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169612=935&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169613=191&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D330169613=191&ssPageName=3DADME:B:SS:UK:1
http://cgi.ebay.co.uk/ws/eBayISAPI.d...m=3D130157163=136&ssPageName=3DADME:B:SS:UK:1
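
The listing above repeats a URL whenever an item has both a "Current bid" and a "Buy It Now price" line, because both lines trigger the print. One possible way to feed wget each URL only once is to filter the output afterwards. This is only a sketch, and unique_urls is a made-up name:

```shell
# unique_urls: hypothetical helper, not from the thread.
# Prints each distinct line once, keeping the order of first appearance.
unique_urls() {
    awk '!seen[$0]++' "$@"
}

# usage: ./testnew.sh | unique_urls > urls.txt
```

Unlike sort -u, this keeps the URLs in the order they appear in the mail.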
 
Old 10-01-2007, 06:24 AM   #10
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
With GNU awk:

Code:
awk 'NR>1&&$0=RS$1$2' RS="http" filename
 
Old 10-01-2007, 07:06 AM   #11
DIMonS
LQ Newbie
 
Registered: Aug 2007
Posts: 13

Original Poster
Rep: Reputation: 0
Thank you very much, everyone, for all your help. I'm still plowing through the man pages trying to find out what it all means.

Just off to do some awk stuff to remove the 3D and the = sign in the middle of the id string; the URL is bad otherwise.
....ViewItem&item=3D140163577=658&ssPageName=3DADME:B:SS:UK:1

DIMonS
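
A rough sketch of that clean-up step, under two assumptions that are not confirmed anywhere in the thread: that "=3D" is the only encoded sequence in these URLs (it stands for a literal "="), and that any other "=" is a leftover from the line wrap. The order matters, because once "=3D" has been turned back into "=" the stray one can no longer be told apart. The name clean_joined_url is made up, and the tab is used as a placeholder only because a tab should not occur inside a URL:

```shell
# clean_joined_url: hypothetical helper, not from the thread.
# ASSUMPTION: "=3D" is the only encoded sequence in the URL and every
# other "=" is a leftover wrap marker.
clean_joined_url() {
    awk '{
        gsub(/=3D/, "\t")   # park every encoded "=" on a tab
        gsub(/=/, "")       # whatever "=" remains is a wrap leftover
        gsub(/\t/, "=")     # put the real "=" signs back
        print
    }' "$@"
}

# usage: ./testnew.sh | clean_joined_url > urls.txt
```

On the example line above this gives ....ViewItem&item=140163577658&ssPageName=ADME:B:SS:UK:1. If other encoded characters ever show up in the URLs, this shortcut would mangle them and a proper decoder would be the safer choice.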
 
Old 10-01-2007, 09:17 AM   #12
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
Quote:
Originally Posted by radoulov
With GNU awk:

Code:
awk 'NR>1&&$0=RS$1$2' RS="http" filename
Just curious, which operator doesn't work with non-GNU awk?
And yes, that code works perfectly on my Linux box, but glitches on FreeBSD.
 
Old 10-01-2007, 09:22 AM   #13
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
Quote:
Originally Posted by angrybanana
Just curious, which operator doesn't work with non-GNU awk?
And yes, that code works perfectly on my Linux box, but glitches on FreeBSD.
With other awks (not sure about tawk, I cannot check it right now) the RS can be only one character.

Last edited by radoulov; 10-01-2007 at 09:24 AM.
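
For awks where RS is limited to one character, a possible alternative is to leave RS alone and pull in the wrapped line with getline. This is only a sketch against the sample from post #8; item_urls is a made-up name, and it assumes plain LF line endings and that a wrapped URL line ends in "=":

```shell
# item_urls: hypothetical helper, not from the thread.
# Prints one joined URL per "Item URL:" entry without touching RS.
item_urls() {
    awk '
        /Item URL:/ {
            url = $0
            sub(/^.*Item URL:[ \t]*/, "", url)      # keep only what follows the label
            # Keep pulling in the next line while the URL still ends in "=".
            while (url ~ /=$/ && (getline cont) > 0) {
                sub(/=$/, "", url)
                url = url cont
            }
            print url
        }
    ' "$@"
}

# usage: item_urls emailfile > urls.txt
```

Unlike the RS="http" one-liner, this drops the "=" at the wrap point; the "=3D" sequences are left as they are.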
 
Old 10-01-2007, 10:31 AM   #14
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
Quote:
Originally Posted by radoulov
With other awks (not sure about tawk, I cannot check it right now) the RS can be only one character.
Thank you.
 
  

