LinuxQuestions.org - split a file based on column value awk / sed?

cusvenus

06-14-2013 06:42 PM

split a file based on column value awk / sed?

I have a file of data in which one of the columns, say column $4, holds a long integer value. I want to sort on that column and then split the file based on ranges of that value.

For example, the first file would hold values from x = 100 to y = 5000, and the next file values from x = 5001 to y = 10000.

Can someone please help?

Thanks in advance
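A minimal sketch of the range-based split asked about here, assuming comma-separated data with a header line and fixed-width ranges of 5000 (1-5000, 5001-10000, ...). The output names range_<low>-<high>.csv and the variable width are hypothetical choices. Because the ranges are assumed to start at 1, values below 100 would also land in the first file.

```shell
set -e
cd "$(mktemp -d)"   # scratch directory, just for the illustration

# Sample rows from this thread (header on line 1).
cat > test.csv <<'EOF'
t1,e1,l1,r1,t2,s1
137,597,LG1,520000,Group 1-1,true
1370,8,JBC,40000,Group 1-1,false
137,597,LG1,2110000,Group 1-1,true
1370,8,JBC,800000,Group 1-1,false
137,597,LG1,210000,Group 1-1,true
1370,8,JBC,2000,Group 1-1,false
137,597,LG1,20800,Group 1-1,true
1370,8,JBC,2808000,Group 1-1,false
137,597,LG1,20700,Group 1-1,true
1370,8,JBC,2803000,Group 1-1,false
137,597,LG1,20400,Group 1-1,true
1370,8,JBC,28010,Group 1-1,false
EOF

width=5000   # assumed range size: 1-5000, 5001-10000, ...

# Drop the header, sort numerically on field 4 only, then send each row to
# the file for its range. Since the input is sorted, each range file is
# written in one contiguous run and can be closed as soon as we move on.
sed 1d test.csv | sort -t, -k4,4n |
  awk -F, -v w="$width" '{
    b = int(($4 - 1) / w)          # 1..w -> 0, w+1..2w -> 1, ...
    out = "range_" (b * w + 1) "-" ((b + 1) * w) ".csv"
    if (out != prev) { if (prev != "") close(prev); prev = out }
    print > out
  }'

ls range_*.csv
```

With the sample rows this should produce ten files; for instance the 20400, 20700 and 20800 rows all go to range_20001-25000.csv.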

chrism01

06-14-2013 06:49 PM

It'd be a bit easier if you showed a few lines of the file and explained a bit more about how you want it split.
For a numeric sort on field 4, try:

Code:

sort -k4 -n file

http://linux.die.net/man/1/sort

cusvenus

06-14-2013 06:54 PM

Thanks Chris.

Below is the data.

t1,e1,l1,r1,t2,s1
137,597,LG1,520000,Group 1-1,true
1370,8,JBC,40000,Group 1-1,false
137,597,LG1,2110000,Group 1-1,true
1370,8,JBC,800000,Group 1-1,false
137,597,LG1,210000,Group 1-1,true
1370,8,JBC,2000,Group 1-1,false
137,597,LG1,20800,Group 1-1,true
1370,8,JBC,2808000,Group 1-1,false
137,597,LG1,20700,Group 1-1,true
1370,8,JBC,2803000,Group 1-1,false
137,597,LG1,20400,Group 1-1,true
1370,8,JBC,28010,Group 1-1,false

Say I have the 4th column sorted and want to split the output every 5 lines, starting from that number:

sed 1d test.csv | sort -k4 -n

1370,8,JBC,2000,Group 1-1,false
1370,8,JBC,28010,Group 1-1,false
1370,8,JBC,2803000,Group 1-1,false
1370,8,JBC,2808000,Group 1-1,false
1370,8,JBC,40000,Group 1-1,false
1370,8,JBC,800000,Group 1-1,false
137,597,LG1,20400,Group 1-1,true
137,597,LG1,20700,Group 1-1,true
137,597,LG1,20800,Group 1-1,true
137,597,LG1,210000,Group 1-1,true
137,597,LG1,2110000,Group 1-1,true
137,597,LG1,520000,Group 1-1,true
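For comma-separated data like this, sort has to be told the field separator. A minimal sketch of a numeric sort on the 4th field alone, using the sample rows above (the scratch directory and the sorted.csv name are just for illustration):

```shell
set -e
cd "$(mktemp -d)"

cat > test.csv <<'EOF'
t1,e1,l1,r1,t2,s1
137,597,LG1,520000,Group 1-1,true
1370,8,JBC,40000,Group 1-1,false
137,597,LG1,2110000,Group 1-1,true
1370,8,JBC,800000,Group 1-1,false
137,597,LG1,210000,Group 1-1,true
1370,8,JBC,2000,Group 1-1,false
137,597,LG1,20800,Group 1-1,true
1370,8,JBC,2808000,Group 1-1,false
137,597,LG1,20700,Group 1-1,true
1370,8,JBC,2803000,Group 1-1,false
137,597,LG1,20400,Group 1-1,true
1370,8,JBC,28010,Group 1-1,false
EOF

# -t,    : fields are separated by commas (by default sort splits on blanks)
# -k4,4n : the key is field 4 alone, compared numerically
sed 1d test.csv | sort -t, -k4,4n > sorted.csv
cat sorted.csv
```

The 4th column should then run from 2000 up to 2808000 in numeric order.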

grail

06-15-2013 03:13 AM

Split? Into files? Into arrays? With a gap in the output?

Firerat

06-15-2013 03:36 AM

Quote:

Originally Posted by cusvenus (Post 4972151)

Thanks Chris.

Below is the data.

..snip..

say if I have 4th column sorted and want to split based on every 5 lines from that number

So you want to sort column 4 and then have a new file for every 6 lines?

This might be a coincidence, but looking at your sorted data, I see a pattern:

Code:

1370,8,JBC,2000,Group 1-1,false
1370,8,JBC,28010,Group 1-1,false
1370,8,JBC,2803000,Group 1-1,false
1370,8,JBC,2808000,Group 1-1,false
1370,8,JBC,40000,Group 1-1,false
1370,8,JBC,800000,Group 1-1,false
137,597,LG1,20400,Group 1-1,true
137,597,LG1,20700,Group 1-1,true
137,597,LG1,20800,Group 1-1,true
137,597,LG1,210000,Group 1-1,true
137,597,LG1,2110000,Group 1-1,true
137,597,LG1,520000,Group 1-1,true

Would you want new files based on fields 1, 2 and 3?

Edit: this would do that:

Code:

sed 1d test.csv | sort -t, -k4,4n | awk -F, '{print >> ($1 "-" $2 "-" $3 ".csv")}'

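No wait, that data is *not* sorted by column 4. Without the -t, option, sort splits fields on runs of blanks, so those lines have no 4th field and are effectively compared as whole lines.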

Firerat

06-15-2013 04:16 AM

Here it is, correctly sorted:

Code:

sed 1d test.csv | sort -t, -k4,4n | awk 'NR%5==1{if (File) close(File); File="File" (++i) ".csv"}{print > File}'

This will give you files of 5 lines each (the last file gets whatever is left over).

But I'm not certain that is what you want.
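If fixed-size chunks are what's wanted, GNU coreutils split can do the chunking instead of awk. A minimal sketch, assuming GNU split (for -d and --additional-suffix); "File" is only the output prefix, and the numbering starts at 00 rather than 1:

```shell
set -e
cd "$(mktemp -d)"

cat > test.csv <<'EOF'
t1,e1,l1,r1,t2,s1
137,597,LG1,520000,Group 1-1,true
1370,8,JBC,40000,Group 1-1,false
137,597,LG1,2110000,Group 1-1,true
1370,8,JBC,800000,Group 1-1,false
137,597,LG1,210000,Group 1-1,true
1370,8,JBC,2000,Group 1-1,false
137,597,LG1,20800,Group 1-1,true
1370,8,JBC,2808000,Group 1-1,false
137,597,LG1,20700,Group 1-1,true
1370,8,JBC,2803000,Group 1-1,false
137,597,LG1,20400,Group 1-1,true
1370,8,JBC,28010,Group 1-1,false
EOF

# -l 5                     : 5 lines per output file
# -d                       : numeric suffixes (00, 01, ...) instead of aa, ab, ...
# --additional-suffix=.csv : GNU extension, appended after the numeric suffix
# -                        : read standard input; "File" is the output prefix
sed 1d test.csv | sort -t, -k4,4n | split -l 5 -d --additional-suffix=.csv - File

ls File*.csv
```

With the 12 sample rows this should give File00.csv and File01.csv with 5 lines each and File02.csv with the remaining 2.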
