Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to
LinuxQuestions.org , a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free.
Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please
contact us . If you need to reset your password,
click here .
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
11-12-2009, 12:42 AM
#1
LQ Newbie
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16
Rep:
split multi line record into multiple files with awk
Hi I have a large file 'NS0923.csv' with data like the following. There are two records in this multi-record sample.
Code:
E60898,4578910,03/06/09,BEN BOYD RD,61,82,,,127,3,,52000.3046.001,3155,4.00,,PLT,1356,1.00,05/06/09,Y,Y,0551
,,,,,,,,,,,,4057,1.00,CLEAN CAR SHARE SIGN,LAB,0551,1.00,,,,
,,,,,,,,,,,,,,,LAB,3065,1.00,,,,
,,,,,,,,,,,,,,,MAT,PSTD,4.00,,,,
E60897,4575328,03/06/09,BEN BOYD LANE,62,78,,,127,3,,52000.3046.001,3155,1.00,,PLT,1356,0.50,05/06/09,Y,Y,0551
,,,,,,,,,,,,,,,LAB,0551,0.50,,,,
,,,,,,,,,,,,,,,LAB,3065,0.50,,,,
,,,,,,,,,,,,,,,MAT,PSTD,1.00,,,,
I need to create three outputs from the above file.
1. I have come up with the following awk script.
Code:
gawk 'BEGIN {FS=OFS=","}
{
if ( $1 ~ /E[0-9]+(?:\.[0-9]*)?/ )
{
print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$19,$20,$21,$22
}
}' NS0923.csv > processed.csv
that produces:
Code:
E60898,4578910,03/06/09,BEN BOYD RD,61,82,,,127,3,,52000.3046.001,05/06/09,Y,Y,0551
E60897,4575328,03/06/09,BEN BOYD LANE,62,78,,,127,3,,52000.3046.001,05/06/09,Y,Y,0551
comments/suggestions welcome
2. I still have to create a file 'transaction.csv' that should retrieve data from $13 - $15 with the identifying column $1.
Required output:
Code:
E60898,4057,1.00,CLEAN CAR SHARE SIGN
3. And finally another file 'quantity.csv'. Retrieving data from $16 - $18 with identifier $1.
Required output:
Code:
E60898,PLT,1356,1.00
E60898,LAB,0551,1.00
E60898,LAB,3065,1.00
E60898,MAT,PSTD,4.00
E60897,PLT,1356,0.50
E60897,LAB,0551,0.50
E60897,LAB,3065,0.50
E60897,MAT,PSTD,1.00
Thanks in advance
Last edited by pcock; 11-12-2009 at 05:25 AM .
Reason: SOLVED
11-12-2009, 01:00 AM
#2
Guru
Registered: Aug 2004
Location: Brisbane
Distribution: Centos 5.7, Solaris 10
Posts: 10,994
The basic principle is simple in Perl
Code:
# split, retrieving only specified fields
$var1 = "asd,,,,z,x1,,c2";
@arr = (split(/,/, $var1))[0, 5];
print "@arr\n";
asd x1
Just specify the field nums you need for each output.
11-12-2009, 01:10 AM
#3
Senior Member
Registered: Aug 2006
Posts: 2,695
ok, so you are not asking a question right? that's how you print your columns in awk. if you want, you can also use a for loop,
Code:
...
for(i=1;i<=12;i++){ print $i }
for(i=19;i<=22;i++){ print $i }
...
11-12-2009, 01:17 AM
#4
LQ Newbie
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16
Original Poster
Rep:
I'm afraid I do. I want to find out how can I produce
Code:
E60898,4057,1.00,CLEAN CAR SHARE SIGN
and
Code:
E60898,PLT,1356,1.00
E60898,LAB,0551,1.00
E60898,LAB,3065,1.00
E60898,MAT,PSTD,4.00
E60897,PLT,1356,0.50
E60897,LAB,0551,0.50
E60897,LAB,3065,0.50
E60897,MAT,PSTD,1.00
11-12-2009, 01:25 AM
#5
Senior Member
Registered: Aug 2006
Posts: 2,695
just use the same method of printing as you had done at 1) ? or does the data actually have a newline?
11-12-2009, 03:28 AM
#6
LQ Newbie
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16
Original Poster
Rep:
I can only use $1 for the first line, how do I use the $1 from the first line for lines 2,3 & 4 say. Then for the next set of records I'll have to do the same. I am hoping that there's a way to do this in awk.
Thanks.
11-12-2009, 03:47 AM
#7
Senior Member
Registered: Aug 2006
Posts: 2,695
you save the first $1 into a variable
Last edited by ghostdog74; 11-12-2009 at 03:48 AM .
11-12-2009, 03:52 AM
#8
LQ Newbie
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16
Original Poster
Rep:
sorry for the confussion.
not all lines start with EXXXX.
yes there are multiple lines of the same record.
11-12-2009, 04:17 AM
#9
Senior Member
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Puppy
Posts: 3,016
Rep:
well, if you RTFM you can see that you can use
the
print directly to a file.
Code:
gawk 'BEGIN {FS=OFS=","}
{
if ( $1 ~ /E[0-9]+(?:\.[0-9]*)?/ )
{
print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$19,$20,$21,$22
print $1,$13,$15 > "transvestite.csv"
print $1,$16,$18 > "quantitty.csv"
}
}' NS0923.csv > processed.csv
simples
11-12-2009, 04:54 AM
#10
LQ Newbie
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16
Original Poster
Rep:
Okay, let me try this again - It's a multiline file.
record1,foo,hello,world
aa,bb,cc
xx,yy,zz
record2,bar,hello,world
dd,ee,ff
uu,vv,ww
I want a way to output this.
record1,aa,bb,cc
record1,xx,yy,zz
record2,dd,ee,ff
record2,uu,vv,ww
11-12-2009, 05:12 AM
#11
Senior Member
Registered: Aug 2006
Posts: 2,695
Code:
$ awk -F"," '/record/{s=$1;next}{print s,$0}' OFS="," file
record1,aa,bb,cc
record1,xx,yy,zz
record2,dd,ee,ff
record2,uu,vv,ww
Thread Tools
Search this Thread
Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
All times are GMT -5. The time now is 01:33 PM .
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know .
Latest Threads
LQ News