LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
01-16-2008, 01:43 PM #16
h/w
Senior Member
 
Registered: Mar 2003
Location: New York, NY
Distribution: Debian Testing
Posts: 1,286

Rep: Reputation: 46

Quote:
Originally Posted by stevemcb
I didn't get a chance to try it last night, but when I ran it this morning, there are still issues (lines that start with a pipe).
Strange; when I tried it, it seemed to work, at least on the sample you'd given above.

Let's just wait for other members to chime in now. This isn't a biggie really, but with my rusty skills it's taking me more time than it should.
 
01-16-2008, 05:09 PM #17
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,356

Rep: Reputation: 2751
I think what the OP is saying is that if there is a 'newline' followed immediately by a pipe symbol '|', then remove just the newline...
 
01-16-2008, 05:45 PM #18
h/w
Senior Member
 
Registered: Mar 2003
Location: New York, NY
Distribution: Debian Testing
Posts: 1,286

Rep: Reputation: 46
Quote:
Originally Posted by chrism01
I think what the OP is saying is that if there is a 'newline' followed immediately by a pipe symbol '|', then remove just the newline...
Right, which is a pain to do with sed (for me, at least). The awk script I'd written earlier tried to do that: it checks for a '|' at the start of a line and appends that line to the previous one.
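A minimal Python sketch of that append-to-the-previous-line idea, buffering just one record at a time. It assumes continuation lines begin with a bare '|' and that the pipe itself should be kept (only the newline goes, as chrism01 describes); `join_continuations` is a hypothetical name, not something from the thread.

```python
from typing import Iterable, Iterator


def join_continuations(lines: Iterable[str], marker: str = "|") -> Iterator[str]:
    """Yield records, gluing any line that starts with `marker` onto the
    record before it (i.e. deleting the newline in front of the pipe)."""
    pending = None  # record being assembled; None until the first line arrives
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(marker) and pending is not None:
            pending += line      # keep the '|': it is a field separator
        else:
            # An ordinary line (or a leading '|' line with nothing before it)
            # starts a new record, so the previous one is complete.
            if pending is not None:
                yield pending
            pending = line
    if pending is not None:
        yield pending            # flush the last record at end of input
```

Since only the record being assembled is held in memory, a file object can be passed straight in, e.g. `for rec in join_continuations(open("t.dat")): print(rec)`.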
 
01-17-2008, 12:03 AM #19
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,356

Rep: Reputation: 2751
Ok, not pretty, but seems to work:

Code:
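#!/usr/bin/perl
use strict;             # Enforce declarations
use warnings;

my $out_rec = "";       # text from continuation ('|') lines not yet printed

open(my $data, '<', 't.dat') or die "Can't open t.dat: $!\n";
while (my $in_rec = <$data>)
{
    chomp($in_rec);

    if (substr($in_rec, 0, 1) eq '|')
    {
        # Continuation line: strip the leading '|' and hold the rest until
        # the next ordinary line arrives (several may pile up in a row).
        # NB: this drops the pipe too; use "$out_rec .= $in_rec;" to keep it.
        $out_rec .= substr($in_rec, 1);
    }
    else
    {
        # Ordinary line: flush the held continuation text onto the end of
        # the previous record, then start this record on a new line.
        # Because "\n" is printed *before* each record, the output begins
        # with a blank line.
        print $out_rec, "\n", $in_rec;
        $out_rec = "";
    }
}
print $out_rec, "\n";   # flush continuation lines at end of file, end last record
close($data) or die "Can't close t.dat: $!\n";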
Data file:

Code:
aaasasdasdasdddsd
dddddddddddddddd
|ffffffffffffffffff
gggggggggggggg
hhhhhhhhhhhhhhhh
|jjjjjjjjjjjjjjjj
|kkkkkkkkkkkkkkkk
llllllllllllllll
Output:

Code:
aaasasdasdasdddsd
ddddddddddddddddffffffffffffffffff
gggggggggggggg
hhhhhhhhhhhhhhhhjjjjjjjjjjjjjjjjkkkkkkkkkkkkkkkk
llllllllllllllll
Note the extra blank line at the start... grrrr

PS: Now that's odd: there were no blank lines in my input file and one extra at the start of my output, but when I copy/pasted it here, it looks different... hmmmmmm

Last edited by chrism01; 01-17-2008 at 12:07 AM.
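The blank line at the start comes from writing "\n" *before* every record, including the first one. A small Python illustration of the two output styles (both function names are hypothetical):

```python
def prefix_style(records):
    # Like the Perl loop: "\n" goes *before* every record, including
    # the first, so the output starts with an empty line.
    return "".join("\n" + rec for rec in records) + "\n"


def terminator_style(records):
    # "\n" goes *after* every record instead, so nothing precedes the
    # first record and no blank line appears.
    return "".join(rec + "\n" for rec in records)
```

In the Perl script, the equivalent change would be to skip the "\n" before the first record (for example with a `$first` flag) while keeping the final `print`.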
 
01-17-2008, 06:01 AM #20
stevemcb
LQ Newbie
 
Registered: Apr 2006
Distribution: Suse 10.0
Posts: 14

Original Poster
Rep: Reputation: 0
Need new script for text file

Sorry, was away last night and didn't see the new posts. I'll try the perl script in a little while once I get my head screwed back on straight this morning.
 
01-17-2008, 06:12 AM #21
stevemcb
LQ Newbie
 
Registered: Apr 2006
Distribution: Suse 10.0
Posts: 14

Original Poster
Rep: Reputation: 0
Need script for text file

BTW, the way I have been editing the file (by hand) is to visually identify a line that starts with a pipe, put my cursor to the left of the pipe and hit the backspace key, which makes it part of the previous line (record).

That information just in case there was clarification needed.

HTH, and thanks for the help.
Stevemcb
 
01-17-2008, 08:39 AM #22
h/w
Senior Member
 
Registered: Mar 2003
Location: New York, NY
Distribution: Debian Testing
Posts: 1,286

Rep: Reputation: 46
Quote:
Originally Posted by stevemcb
BTW, the way I have been editing the file (by hand) is to visually identify a line that starts with a pipe, put my cursor to the left of the pipe and hit the backspace key, which makes it part of the previous line (record).

That information just in case there was clarification needed.

HTH, and thanks for the help.
Stevemcb
So that means the line starts with a space followed by a pipe, which is probably why my earlier awk script failed: it was looking for lines starting with a pipe.
Try this mod:
Code:
awk 'BEGIN{nxt="";}{curr=$0;getline nxt;if(index(nxt, " |")== 1){print curr nxt;}else{print $0;}}' < inputfile > outputfile
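Before trusting the new line count, one property of this script may be worth checking: every pass calls `getline nxt`, so it consumes two lines, and when `nxt` is not a continuation the `else` branch prints only `$0`, so `nxt` never reaches the output. Below is a hypothetical Python simulation of that awk logic (`awk_pairwise`) next to a one-record-lookahead version (`lookahead_join`); both names are made up for illustration, and the `" |"` marker is taken from the awk script.

```python
def awk_pairwise(lines, marker=" |"):
    # Simulates: {curr=$0; getline nxt; if (index(nxt, " |") == 1)
    #             {print curr nxt} else {print $0}}
    out = []
    nxt = ""
    it = iter(lines)
    for curr in it:
        nxt = next(it, nxt)        # getline; at EOF, nxt keeps its old value
        if nxt.startswith(marker):
            out.append(curr + nxt)
        else:
            out.append(curr)       # nxt was read but is never printed
    return out


def lookahead_join(lines, marker=" |"):
    # Append a continuation line to the previous output record; any other
    # line starts a new record, so every input line reaches the output.
    out = []
    for line in lines:
        if out and line.startswith(marker):
            out[-1] += line
        else:
            out.append(line)
    return out
```

With the input `["a", "b", " |c", "d"]`, the pairwise version returns `["a", " |c"]` (lines "b" and "d" are gone), while the lookahead version returns `["a", "b |c", "d"]`.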
 
01-17-2008, 02:35 PM #23
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
This works with your sample data.
Code:
$ awk -F'\n?\\|' '$1=$1' OFS='|' RS= uglydb
2003123|A15690195|3|N|1994-03-15 00:00:00|OPS$LSANDERS|SOUTHERN|COMPANY|64A PERIMETER CENTER EAST||ATLANTA|GA|US|30346|||REPLACEMENT IS DOA|REPLACE DOA REPLACEMENT|1993-12-16 00:00:00|1993-12-21 00:00:00|1993-12-21 00:00:00|1994-01-11 00:00:00|1994-03-15 00:00:00|N||THIS DOES NOT APPEAR TO BE A DUPLICATE OF CLAIM A14659942. PRIOR CLAIM WAS SERVICED BY NW COMPUTER SUPPORT IN WA STATE. SERIAL #'S MUST HAVE BEEN TYPED IN INCORRECTLY.|A|||CDR-74|||||0||||||||||||00000194.0013.0005
2001235|A15078491|3|N|1994-06-28 00:00:00|OPS$LSANDERS|NPPD||PO BOX 499||COLUMBUS|NE|US|68601|||DOA|REPALCED MONITOR|||||1994-03-15 00:00:00|N|Pending more than 60 days with no resolution; claim rejected|YELLOW STICKY ATTACHED TO CLAIM INDICATED THAT MONITOR WAS RETURNED ON MRA NUMBER #41801 ON 02/22/94.....SOP MRA POINTS TO CLAIM NUMBER #A15078478|R|||JC-1532VMA-2|||||||||||||||||00000194.0014.0005
Edit: Perl way of doing the same thing. Both of these load the whole file into RAM, so if the db is HUGE it might not be a good idea.
Code:
perl -0pe 's#\n\|#\|#g' uglydb
Another edit: Just realized I forgot to add a 'g' at the end of the regex; it worked with the example because there were only 2 records.

Last edited by angrybanana; 01-17-2008 at 07:34 PM.
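For comparison, here is the same whole-file substitution as the `perl -0pe` one-liner, sketched in Python (`join_whole_text` is a hypothetical name). Like the Perl version, it needs the entire file in memory as one string.

```python
import re

# A newline immediately followed by '|' marks a continuation line.
CONTINUATION = re.compile(r"\n\|")


def join_whole_text(text: str) -> str:
    # Replace every "\n|" with "|": the newline is deleted, the pipe kept,
    # which is what backspacing in front of the pipe does by hand.
    return CONTINUATION.sub("|", text)
```

Typical use would be something like `print(join_whole_text(open("uglydb").read()), end="")`.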
 
01-17-2008, 04:46 PM #24
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,356

Rep: Reputation: 2751
angrybanana: That's why I designed mine not to load the whole file into memory.
h/w: as per my prev post, I think it's actually newline then pipe.
He's saying how he manually fixed it by backspacing to delete the newline.
 
01-17-2008, 08:14 PM #25
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
Quote:
Originally Posted by chrism01
angrybanana: That's why I designed mine not to load the whole file into memory.
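You're right, my answer wasn't good.
Here's a better/corrected version of my awk code. It reads one line at a time, so only the current record is held in memory:
Code:
awk 'NR > 1 && !/^[|]/ { print rec; rec = "" } { rec = rec $0 } END { if (NR) print rec }' uglydb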
 
01-18-2008, 05:29 AM #26
stevemcb
LQ Newbie
 
Registered: Apr 2006
Distribution: Suse 10.0
Posts: 14

Original Poster
Rep: Reputation: 0
I took a fresh copy of the file and used the "awk 'BEGIN{nxt="";}{curr=$0;getline nxt;if(index(nxt, " |")== 1){print curr nxt;}else{print $0;}}' < inputfile > outputfile" on it.

It went from 308,000 records down to 190,000 records, and a quick visual scan of the results makes me think that it did, indeed, process all the records that started with a pipe. Now I need to go back to the guy who created the file to begin with and verify how many records were in the database to make sure I didn't lose any.

I'll get back to you as soon as I can.

Thanks,
Stevemcb
 
01-18-2008, 05:55 AM #27
h/w
Senior Member
 
Registered: Mar 2003
Location: New York, NY
Distribution: Debian Testing
Posts: 1,286

Rep: Reputation: 46
You could also check using:
Code:
wc -l inputfile
grep -c '^ |' inputfile

The difference between the two should give you the number of records you ought to end up with.
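A sketch of that sanity check in Python, assuming continuation lines start with " |" as in the grep pattern: the expected number of records after joining is the total line count minus the number of continuation lines, provided the first line is not itself a continuation. `expected_records` is a hypothetical name.

```python
def expected_records(lines, marker=" |"):
    # Same idea as `wc -l` minus the count of lines matching '^ |'.
    total = 0
    continuations = 0
    for line in lines:
        total += 1
        if line.startswith(marker):
            continuations += 1
    return total - continuations
```

One small difference from the shell version: `wc -l` counts newline characters, so a last line without a trailing newline is not counted there, whereas iterating over a Python file object does count it.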
 
All times are GMT -5.