LinuxQuestions.org
Old 11-12-2009, 12:42 AM   #1
pcock
LQ Newbie
 
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16

Rep: Reputation: 0
split multi line record into multiple files with awk


Hi, I have a large file 'NS0923.csv' with data like the following. This sample contains two records, each spanning several lines.


Code:
E60898,4578910,03/06/09,BEN BOYD RD,61,82,,,127,3,,52000.3046.001,3155,4.00,,PLT,1356,1.00,05/06/09,Y,Y,0551
,,,,,,,,,,,,4057,1.00,CLEAN CAR SHARE SIGN,LAB,0551,1.00,,,,
,,,,,,,,,,,,,,,LAB,3065,1.00,,,,
,,,,,,,,,,,,,,,MAT,PSTD,4.00,,,,
E60897,4575328,03/06/09,BEN BOYD LANE,62,78,,,127,3,,52000.3046.001,3155,1.00,,PLT,1356,0.50,05/06/09,Y,Y,0551
,,,,,,,,,,,,,,,LAB,0551,0.50,,,,
,,,,,,,,,,,,,,,LAB,3065,0.50,,,,
,,,,,,,,,,,,,,,MAT,PSTD,1.00,,,,
I need to create three outputs from the above file.

1. I have come up with the following awk script.

Code:
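gawk 'BEGIN { FS = OFS = "," }
# Header lines have a record ID (E followed by digits) in $1.
# awk regexes are POSIX EREs, which have no (?:...) groups, so use a plain anchored group.
$1 ~ /^E[0-9]+(\.[0-9]*)?$/ {
    print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$19,$20,$21,$22
}' NS0923.csv > processed.csv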
that produces:

Code:
E60898,4578910,03/06/09,BEN BOYD RD,61,82,,,127,3,,52000.3046.001,05/06/09,Y,Y,0551
E60897,4575328,03/06/09,BEN BOYD LANE,62,78,,,127,3,,52000.3046.001,05/06/09,Y,Y,0551
comments/suggestions welcome

2. I still have to create a file 'transaction.csv' containing fields $13 to $15, prefixed with the identifying column $1.

Required output:

Code:
E60898,4057,1.00,CLEAN CAR SHARE SIGN

3. And finally another file, 'quantity.csv', containing fields $16 to $18, prefixed with the identifier $1.

Required output:

Code:
E60898,PLT,1356,1.00
E60898,LAB,0551,1.00
E60898,LAB,3065,1.00
E60898,MAT,PSTD,4.00
E60897,PLT,1356,0.50
E60897,LAB,0551,0.50
E60897,LAB,3065,0.50
E60897,MAT,PSTD,1.00
Thanks in advance

Last edited by pcock; 11-12-2009 at 05:25 AM. Reason: SOLVED
 
Old 11-12-2009, 01:00 AM   #2
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.6, Centos 5.10
Posts: 16,324

Rep: Reputation: 2041
The basic principle is simple in Perl:
Code:
# split, retrieving only specified fields (list indices are 0-based)
use strict;
use warnings;

my $var1 = "asd,,,,z,x1,,c2";
my @arr  = (split(/,/, $var1))[0, 5];
print "@arr\n";    # prints: asd x1
Just specify the field numbers you need for each output.
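For readers who would rather stay in awk, here is a rough equivalent of the Perl list slice, sketched with awk's built-in split(). It is only an illustration of the same idea; the variable names mirror the Perl snippet. Note that split() fills a 1-based array, so Perl indices 0 and 5 become 1 and 6.

```shell
out=$(awk 'BEGIN {
    var1 = "asd,,,,z,x1,,c2"
    n = split(var1, arr, ",")    # n is the number of fields found (empty ones included)
    print arr[1], arr[6]         # Perl [0, 5] becomes awk [1, 6]
}')
echo "$out"
```

When reading a file with -F"," the same fields are available directly as $1 and $6, so split() is only needed for strings that are not the current input line.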
 
Old 11-12-2009, 01:10 AM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
OK, so you are not asking a question, right? That's how you print your columns in awk. If you want, you can also use a for loop:
Code:
...
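  # printf keeps the fields on one line; a bare print would put each field on its own line
  for (i = 1; i <= 12; i++)  printf "%s%s", $i, OFS
  for (i = 19; i <= 22; i++) printf "%s%s", $i, (i < 22 ? OFS : ORS)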
...
 
Old 11-12-2009, 01:17 AM   #4
pcock
LQ Newbie
 
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16

Original Poster
Rep: Reputation: 0
I'm afraid I am. I want to find out how I can produce

Code:
E60898,4057,1.00,CLEAN CAR SHARE SIGN
and

Code:
E60898,PLT,1356,1.00
E60898,LAB,0551,1.00
E60898,LAB,3065,1.00
E60898,MAT,PSTD,4.00
E60897,PLT,1356,0.50
E60897,LAB,0551,0.50
E60897,LAB,3065,0.50
E60897,MAT,PSTD,1.00
 
Old 11-12-2009, 01:25 AM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Just use the same printing method you used in 1)? Or does the data actually contain a newline?
 
Old 11-12-2009, 03:28 AM   #6
pcock
LQ Newbie
 
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16

Original Poster
Rep: Reputation: 0
I can only use $1 on the first line. How do I reuse the $1 from the first line for lines 2, 3 and 4, say? For the next set of records I'll have to do the same. I am hoping there's a way to do this in awk.
Thanks.
 
Old 11-12-2009, 03:47 AM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
You save the first $1 into a variable and reuse it on the lines that follow.

Last edited by ghostdog74; 11-12-2009 at 03:48 AM.
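A minimal sketch of that idea, applied to the 'quantity.csv' output from post #1. It assumes header lines are exactly those whose first field is E followed by digits, and that continuation lines leave $1 empty. The variable name id and the inline sample file are illustrative only.

```shell
# Sample input: the first record from post #1.
cat > NS0923.csv <<'EOF'
E60898,4578910,03/06/09,BEN BOYD RD,61,82,,,127,3,,52000.3046.001,3155,4.00,,PLT,1356,1.00,05/06/09,Y,Y,0551
,,,,,,,,,,,,4057,1.00,CLEAN CAR SHARE SIGN,LAB,0551,1.00,,,,
,,,,,,,,,,,,,,,LAB,3065,1.00,,,,
,,,,,,,,,,,,,,,MAT,PSTD,4.00,,,,
EOF

awk 'BEGIN { FS = OFS = "," }
$1 ~ /^E[0-9]+$/ { id = $1 }     # header line: remember the record ID (assumed format)
{ print id, $16, $17, $18 }      # every line: emit the saved ID plus fields 16 to 18
' NS0923.csv > quantity.csv

cat quantity.csv
```

Because the first rule has no next, the header line itself also falls through to the second rule, which is what puts the PLT row into the output alongside the LAB and MAT rows.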
 
Old 11-12-2009, 03:52 AM   #8
pcock
LQ Newbie
 
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16

Original Poster
Rep: Reputation: 0
Sorry for the confusion.

Not all lines start with EXXXX.
Yes, there are multiple lines belonging to the same record.
 
Old 11-12-2009, 04:17 AM   #9
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,314

Rep: Reputation: 175
Well, if you RTFM, you can see that print can
write directly to a file.

Code:
gawk   'BEGIN {FS=OFS=","}
{
 if ( $1 ~ /^E[0-9]+(\.[0-9]*)?$/ )
    {
       print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$19,$20,$21,$22
       print $1,$13,$14,$15 > "transaction.csv"
       print $1,$16,$17,$18 > "quantity.csv"
   }

  }' NS0923.csv  > processed.csv
simples
 
Old 11-12-2009, 04:54 AM   #10
pcock
LQ Newbie
 
Registered: Sep 2004
Location: Sydney, Australia
Distribution: Archlinux
Posts: 16

Original Poster
Rep: Reputation: 0
Okay, let me try this again - It's a multiline file.

record1,foo,hello,world
aa,bb,cc
xx,yy,zz
record2,bar,hello,world
dd,ee,ff
uu,vv,ww


I want a way to output this.

record1,aa,bb,cc
record1,xx,yy,zz
record2,dd,ee,ff
record2,uu,vv,ww
 
Old 11-12-2009, 05:12 AM   #11
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Code:
$ awk -F"," '/record/{s=$1;next}{print s,$0}' OFS="," file
record1,aa,bb,cc
record1,xx,yy,zz
record2,dd,ee,ff
record2,uu,vv,ww
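The same technique can be carried back to the original NS0923.csv to write all three files in one pass. This is a sketch under two assumptions that the thread does not confirm: header lines are those whose $1 is E followed by digits, and a transaction row is wanted only when the description field $15 is non-empty (a guess based on the single required-output line in post #1). The variable name id is illustrative.

```shell
# Sample input: the two records from post #1.
cat > NS0923.csv <<'EOF'
E60898,4578910,03/06/09,BEN BOYD RD,61,82,,,127,3,,52000.3046.001,3155,4.00,,PLT,1356,1.00,05/06/09,Y,Y,0551
,,,,,,,,,,,,4057,1.00,CLEAN CAR SHARE SIGN,LAB,0551,1.00,,,,
,,,,,,,,,,,,,,,LAB,3065,1.00,,,,
,,,,,,,,,,,,,,,MAT,PSTD,4.00,,,,
E60897,4575328,03/06/09,BEN BOYD LANE,62,78,,,127,3,,52000.3046.001,3155,1.00,,PLT,1356,0.50,05/06/09,Y,Y,0551
,,,,,,,,,,,,,,,LAB,0551,0.50,,,,
,,,,,,,,,,,,,,,LAB,3065,0.50,,,,
,,,,,,,,,,,,,,,MAT,PSTD,1.00,,,,
EOF

awk 'BEGIN { FS = OFS = "," }
$1 ~ /^E[0-9]+$/ {               # header line: start of a new record (assumed format)
    id = $1
    print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$19,$20,$21,$22 > "processed.csv"
}
# Assumption: only rows that carry a description in $15 belong in transaction.csv.
$15 != "" { print id, $13, $14, $15 > "transaction.csv" }
$16 != "" { print id, $16, $17, $18 > "quantity.csv" }
' NS0923.csv
```

Matching on $1 rather than on a word anywhere in the line (as /record/ does) avoids false header matches when the same text appears in a later field. Inside awk, > truncates each output file once when it is first opened and then keeps appending for the rest of the run.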
 
  



Tags
awk, gawk, line, multi, multiline


All times are GMT -5.
