[SOLVED] Split file upon increments of string value
I have a large file that contains thousands of records. Each record begins with a specific string. I'd like to split the file into many smaller files, but not one output file per record; maybe 5 or 10, or whatever.
I'm using this right now to split the file:
Code:
awk '/STRING/{n++}{print >"out" n ".txt" }' input_file.txt
And it works fine...if I want a thousand or more files.
How can I have awk split the file at every 10th instance of "STRING"? I tried adding an NR variable, but that was a mess.
Note, the records aren't the same size, so I can't just split based on number of lines.
The %d specifier in the sprintf format truncates the result of the division n/5 to an integer, so for the first 4 records the result is 0, for records 5 to 9 the result is 1, and so on. Add one (as in my example) to start the file count from 1.
In addition I used the condition
Code:
n % 5
to skip the file name change at the 5th, 10th, 15th record and so on (where n % 5 is 0), so that every file contains exactly 5 records (otherwise the 5th record would go to the new file). Hope this helps.
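A minimal sketch of a command that fits this description, not necessarily the exact one posted. The variable name per and the generated 12-record sample are assumptions for illustration, and the sketch assumes the first line of the input matches STRING (otherwise no file name is set yet when the first line is printed).

```shell
# Work in a scratch directory so the sample files stay out of the way.
cd "$(mktemp -d)" || exit 1

# Hypothetical input: 12 records of 2 to 4 lines, each starting with STRING.
awk 'BEGIN {
    for (r = 1; r <= 12; r++) {
        print "STRING record " r
        for (l = 1; l <= r % 3 + 1; l++) print "  body line " l
    }
}' > input_file.txt

# per is an illustrative variable: the number of records per output file.
awk -v per=5 '
    /STRING/ { n++ }                                      # count record starts
    n % per  { file = sprintf("out%d.txt", n / per + 1) } # %d drops the fraction
    { print > file }                                      # write every line to the current file
' input_file.txt

# Report how many records landed in each output file.
grep -c '^STRING' out*.txt
```

With the 12-record sample, grep should report 5, 5 and 2 records for out1.txt, out2.txt and out3.txt. Passing per=10 instead would start a new file after every 10th instance of STRING.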
schneidz, as I understand the man page for split, I can only split into files of equal size (bytes or lines). If my records were of equal length I would have used that. Split was my first thought too.
I am not sure I follow. If you use the awk command, you are splitting on each fifth consecutive line, so could you not tell split to work on 5 lines at a time?
What I was saying is that split only works by splitting into discrete sizes (e.g. every 5, 10, 67 or whatever lines, or every 2 KB, etc.).
My file contains lots of records where each record is a different length. One record might be 5 lines but the next could be 17, or 85, etc. Using
Code:
split -l 5 file.txt prefix
results in each file having 5 lines, which cuts records in the middle (or wherever the 5 lines land). Split can't work for this type of file. colucix's code worked perfectly.
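To see the problem on a small scale, here is a rough sketch with a made-up file of 4 records that are 3, 7, 2 and 4 lines long (the record lengths and contents are assumptions for illustration only).

```shell
# Work in a scratch directory so the sample files stay out of the way.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample: 4 records of 3, 7, 2 and 4 lines, each starting with STRING.
awk 'BEGIN {
    split("3 7 2 4", len, " ")
    for (r = 1; r <= 4; r++) {
        print "STRING record " r
        for (l = 2; l <= len[r]; l++) print "  body line " l
    }
}' > file.txt

# Fixed-size split: every piece gets 5 lines, whatever the record boundaries are.
split -l 5 file.txt prefix

# Show the first line of every piece; a piece that does not start with
# STRING begins in the middle of a record.
for f in prefix*; do
    printf '%s: %s\n' "$f" "$(head -n 1 "$f")"
done
```

The second piece, prefixab, starts on a body line of record 2 rather than on a STRING line, which is exactly the mid-record cut described above.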