LinuxQuestions.org
Linux - Newbie: This Linux forum is for members who are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

08-09-2013, 05:08 PM   #1
captainentropy
Member
 
Registered: Mar 2010
Location: Berkeley
Distribution: Ubuntu, Mint, CentOS
Posts: 81

Rep: Reputation: 0
Split file upon increments of string value


I have a large file that contains thousands of records. Each record begins with a specific string. I'd like to split the file into smaller files, but not one output file per record; instead, maybe 5 or 10 records per file, or whatever number I choose.

I'm using this right now to split the file:

Code:
awk '/STRING/{n++}{print > ("out" n ".txt")}' input_file.txt
It works fine... if I want a thousand or more files, one per record.

How can I have awk split the file at every 10th instance of "STRING"? I tried using NR as well, but that was a mess.

Note that the records aren't all the same size, so I can't just split based on a fixed number of lines.
 
08-09-2013, 05:44 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Try computing the output file name from the value of n, e.g.:
Code:
awk '/STRING/{n++} n%5{file = sprintf("out%03d.txt",n/5+1)}{print > file }' input_file.txt
The %d specifier in the sprintf format truncates the division to an integer, so n/5 counts as 0 for records 1 to 4, as 1 for records 5 to 9, and so on. Adding one (as in my example) starts the file count from 1.
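To see the truncation on its own, here is a small sketch that evaluates the same `sprintf("out%03d.txt", n/5+1)` expression for a few record counts. `name_for` is a hypothetical helper made up for this illustration, and it deliberately leaves out the `n % 5` condition, so it shows only the raw mapping from record number to file name.

```shell
#!/bin/sh
# Hypothetical helper: print the file name that n/5+1 formatted with %03d
# gives for record number $1. awk's n/5 is floating-point; %d truncates it.
name_for() {
    awk -v n="$1" 'BEGIN { printf "out%03d.txt\n", n/5+1 }'
}

for n in 1 4 5 9 10; do
    printf 'record %s -> %s\n' "$n" "$(name_for "$n")"
done
```

Records 1 to 4 map to out001.txt, but record 5 already maps to out002.txt, which is why the one-liner needs the extra condition to keep the 5th record in the first file.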
In addition, I used the condition
Code:
n % 5
to avoid changing the file name at the 5th, 10th, 15th record and so on, so that every file contains exactly 5 records (otherwise the 5th record would go to the new file). Only the last file may hold fewer, when the total is not a multiple of 5. Hope this helps.
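For comparison, here is a variant sketch (not colucix's code) built on the same counting idea. It makes the records-per-file count a parameter `k`, opens a new file at the 1st, 6th, 11th... marker, calls close() on the previous file so only one output file is open at a time, and skips any lines that appear before the first marker instead of trying to print to an empty file name. The sample data, the `^STRING` anchor, the `k` variable and the `chunk` file prefix are all assumptions made for illustration.

```shell
#!/bin/sh
# Variant sketch: k records per output file, one open output file at a time.
work=$(mktemp -d)
cd "$work"

# Made-up sample: 12 records; record i has i body lines after its marker.
i=1
while [ "$i" -le 12 ]; do
    echo "STRING record $i"
    j=0
    while [ "$j" -lt "$i" ]; do
        echo "  body line $j of record $i"
        j=$((j + 1))
    done
    i=$((i + 1))
done > input_file.txt

awk -v k=5 '
    /^STRING/ {
        if (n % k == 0) {                  # 1st, 6th, 11th ... marker
            if (file != "") close(file)    # finish the previous chunk
            file = sprintf("chunk%03d.txt", ++f)
        }
        n++
    }
    file != "" { print > file }            # lines before the first marker are skipped
' input_file.txt

# Number of records that landed in each chunk
grep -c '^STRING' chunk*.txt
```

With k=5 and the 12 sample records, this should leave 5, 5 and 2 records in chunk001.txt, chunk002.txt and chunk003.txt; passing -v k=10 would give ten records per file instead.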
 
1 member found this post helpful.
08-09-2013, 06:03 PM   #3
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
Would the split command work?
 
08-09-2013, 06:04 PM   #4
captainentropy
Member
 
Registered: Mar 2010
Location: Berkeley
Distribution: Ubuntu, Mint, CentOS
Posts: 81

Original Poster
Rep: Reputation: 0
Thanks colucix, it worked perfectly! I never would have figured that out.
 
08-09-2013, 06:21 PM   #5
captainentropy
Member
 
Registered: Mar 2010
Location: Berkeley
Distribution: Ubuntu, Mint, CentOS
Posts: 81

Original Poster
Rep: Reputation: 0
schneidz, as I understand the man page for split, I can only split into files of equal size (bytes or lines). If my records were all the same length, I would have used it. split was my first thought too.
 
08-09-2013, 11:14 PM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Quote:
If my records were of equal length I would have used that.
I am not sure I follow. If you use the awk, you are splitting on each fifth consecutive line, so could you not tell split to work on 5 lines at a time?
 
08-12-2013, 07:20 PM   #7
captainentropy
Member
 
Registered: Mar 2010
Location: Berkeley
Distribution: Ubuntu, Mint, CentOS
Posts: 81

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by grail
I am not sure I follow. If you use the awk, you are splitting on each fifth consecutive line, so could you not tell split to work on 5 lines at a time?
What I was saying is that split only works by splitting into fixed sizes (e.g. every 5, 10, 67 or however many lines, or every 2 KB, etc.).

My file contains lots of records where each record is a different length. One record might be 5 lines but the next could be 17, or 85, etc. Using
Code:
split -l 5 file.txt prefix
results in each file having 5 lines, which cuts records in the middle wherever the 5-line boundary happens to land. split can't work for this type of file. colucix's code worked perfectly.
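A small demonstration of the point above, using made-up sample data: three records of 2, 7 and 3 lines (marker line included) are run through the same `split -l 5 file.txt prefix` command, and the first line of each piece is printed. A piece that does not start with STRING begins in the middle of a record.

```shell
#!/bin/sh
# Demo: split -l cuts at fixed line counts, ignoring record boundaries.
work=$(mktemp -d)
cd "$work"

# Made-up records of 2, 7 and 3 lines (12 lines in total).
printf '%s\n' 'STRING a' 'a1' \
              'STRING b' 'b1' 'b2' 'b3' 'b4' 'b5' 'b6' \
              'STRING c' 'c1' 'c2' > file.txt

split -l 5 file.txt prefix

# Show where each piece starts.
for f in prefix*; do
    printf '%s starts with: %s\n' "$f" "$(head -n 1 "$f")"
done
```

Here prefixab should start with b3 and prefixac with c1, i.e. two of the three pieces begin mid-record, which is the problem a marker-counting awk script avoids.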
 
  



Tags
awk, split






Similar Threads
Thread Thread Starter Forum Replies Last Post
how can I split a file into many files using a string in awk or sed atjurhs Linux - Newbie 15 06-11-2013 11:45 PM
split file based on number of string apperance mcbenus Programming 10 12-24-2009 06:44 PM
split string on file delimeter Jeroen1000 Programming 7 10-05-2009 08:35 AM
[perl]How to treat string like "a b" as a single string when split? john.daker Programming 21 06-01-2009 05:57 PM
how do I split large file by string? khairil Programming 5 04-28-2008 10:37 PM


All times are GMT -5.
