Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I think what the OP is saying is that if there is a 'newline' followed immediately by a pipe symbol '|', then remove just the newline...
Right, which is a pain to do with sed (for me, at least). The awk script I'd written earlier tried to do that: it checks for a '|' at the start of a line and, if found, appends that line to the previous one.
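For reference, the "append a line that starts with a pipe to the previous line" idea can be sketched roughly as below. This is only a minimal sketch, not the script mentioned above: the function name join_pipe_lines is made up for illustration, and it assumes the continuation lines begin with the pipe itself, with nothing in front of it.

```shell
# join_pipe_lines: hypothetical helper name; reads stdin, writes stdout.
# A line that starts with '|' is glued onto the line held in buf;
# any other line flushes buf and starts a new record.
# Only one record is held in memory at a time.
join_pipe_lines() {
  awk '
    /^\|/  { buf = buf $0; next }      # continuation: append to held record
    NR > 1 { print buf }               # new record: flush the held one
           { buf = $0 }                # and start holding this line
    END    { if (NR > 0) print buf }   # flush the last record
  '
}
```

Usage would be something like `join_pipe_lines < uglydb > cleaned`. If the continuation lines turn out to have a space in front of the pipe, the pattern could be changed to `/^ *\|/`; the space would then stay in the joined record.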
PS: now that's odd, there were no blank lines in my input file and 1 extra at the start of my output, but when I copy/pasted it, it's different ... hmmmmmmmmmmmm
Sorry, was away last night and didn't see the new posts. I'll try the perl script in a little while once I get my head screwed back on straight this morning.
BTW, the way I have been editing the file (by hand) is to visually identify a line that starts with a pipe, put my cursor to the left of the pipe and hit the backspace key, which makes it part of the previous line (record).
That information just in case there was clarification needed.
HTH, and thanks for the help.
Stevemcb
So, that means the line starts with a space followed by a pipe. Which is probably why my earlier awk script failed, as it was looking for lines starting with pipes.
Try this mod:
$ awk -F'\n?\\|' '$1=$1' OFS='|' RS= uglydb
2003123|A15690195|3|N|1994-03-15 00:00:00|OPS$LSANDERS|SOUTHERN|COMPANY|64A PERIMETER CENTER EAST||ATLANTA|GA|US|30346|||REPLACEMENT IS DOA|REPLACE DOA REPLACEMENT|1993-12-16 00:00:00|1993-12-21 00:00:00|1993-12-21 00:00:00|1994-01-11 00:00:00|1994-03-15 00:00:00|N||THIS DOES NOT APPEAR TO BE A DUPLICATE OF CLAIM A14659942. PRIOR CLAIM WAS SERVICED BY NW COMPUTER SUPPORT IN WA STATE. SERIAL #'S MUST HAVE BEEN TYPED IN INCORRECTLY.|A|||CDR-74|||||0||||||||||||00000194.0013.0005
2001235|A15078491|3|N|1994-06-28 00:00:00|OPS$LSANDERS|NPPD||PO BOX 499||COLUMBUS|NE|US|68601|||DOA|REPALCED MONITOR|||||1994-03-15 00:00:00|N|Pending more than 60 days with no resolution; claim rejected|YELLOW STICKY ATTACHED TO CLAIM INDICATED THAT MONITOR WAS RETURNED ON MRA NUMBER #41801 ON 02/22/94.....SOP MRA POINTS TO CLAIM NUMBER #A15078478|R|||JC-1532VMA-2|||||||||||||||||00000194.0014.0005
Edit: Perl way of doing the same thing. Both of these load the whole file into RAM, so if the db is HUGE it might not be a good idea.
Code:
perl -0pe 's#\n\|#\|#g' uglydb
Another Edit: Just realized I forgot to add a 'g' at the end of the regex. It worked with the example because it was only 2 records.
Last edited by angrybanana; 01-17-2008 at 07:34 PM.
angrybanana: That's why I designed mine not to load the whole file into memory.
h/w: as per my prev post, I think it's actually newline then pipe.
He's saying how he manually fixed it by backspacing to delete the newline.
I took a fresh copy of the file and ran this on it:
Code:
awk 'BEGIN{nxt="";}{curr=$0;getline nxt;if(index(nxt, " |")== 1){print curr nxt;}else{print $0;}}' < inputfile > outputfile
It went from 308,000 records down to 190,000 records, and a quick visual scan of the results makes me think that it did, indeed, process all the records that started with a pipe. Now I need to go back to the guy who created the file to begin with and verify how many records were in the database to make sure I didn't lose any.
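One thing that might be worth checking before going back to the source: in the one-liner quoted above, the else branch prints only the current line, so a line read by getline that does not start with " |" never gets printed. A rough count check along these lines could show whether anything went missing. It is only a sketch: check_counts is a made-up helper name, and it assumes continuation lines start with an optional space followed by a pipe and that the first line of the file is not itself a continuation.

```shell
# check_counts: hypothetical helper; $1 = original file, $2 = joined file.
# Every continuation line should have been folded into the line before it,
# so: lines(original) - continuation lines(original) == lines(joined).
check_counts() {
  total=$(awk 'END { print NR }' "$1")                      # lines in original
  cont=$(awk '/^ ?\|/ { n++ } END { print n + 0 }' "$1")    # continuation lines
  joined=$(awk 'END { print NR }' "$2")                     # lines in result
  echo "original=$total continuation=$cont expected=$((total - cont)) actual=$joined"
  [ "$((total - cont))" -eq "$joined" ]                     # exit status 0 if they match
}
```

Running `check_counts inputfile outputfile` prints the three counts and returns non-zero when the joined file has a different number of lines than expected, which would be a hint that records were dropped or duplicated rather than just merged.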