[SOLVED] awk split file into variable number of files
I need to separate the file into X number of files based on the number of unique values in $2, 8 in this example. I wrote a script that can do this with 8 versions of the following, but I would like a simpler solution if possible. I will have many files with different names to process:
Code:
awk '$2 ~ /chr1\y/{print $0 > "chr1"}' input_file
How can I write this as one line, so that the search value changes automatically and, at each change in $2, the matching records are written to a file named after that value (which will always be chr[whatever], where whatever can be 1-19, X, Y, or M)? I hope my question is clear. Thanks for any help you can offer.
Sorry, I thought my example was clear. The results would be 8 files, and their names would be determined by the value in $2 ("chr1" has two lines of data, "chr9" has six lines of data, etc., but a file could just as easily have 0 lines as 1e6):
@pan64, that just creates an empty file.
@grail, yes, they will be sorted on $2 (sort -Vk2). Would it be possible to use an unsorted file? I'm not planning to (one of the first steps in the script I have so far is to sort), but it might be useful to know.
Thanks!
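On the unsorted-input question, here is a minimal sketch rather than anything posted in the thread. It assumes made-up sample data and that $2 is safe to use as a file name. Appending with ">>" and closing each file right after the write means only one output file is open at a time, so the order of the input lines does not matter:

```shell
# Work in a scratch directory so nothing else is touched.
cd "$(mktemp -d)"

# Made-up, deliberately unsorted sample; column 2 is the chromosome name.
printf '%s\n' 'a chr9 1' 'b chr1 2' 'c chr9 3' 'd chr1 4' > input_file

# ">>" appends, so output files left over from an earlier run would be
# extended rather than replaced; remove them first.
rm -f chr*

# Append each line to the file named by $2, then close that file so awk
# never holds more than one output file open.
awk '{ out = $2; print >> out; close(out) }' input_file
```

Reopening a file for every line is slower than leaving it open, so this form mainly helps when the number of distinct values in $2 could exceed the open-file limit.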
Last edited by captainentropy; 07-06-2012 at 02:28 PM.
@pan64, I'm sorry, I made a mistake - your solution worked perfectly! (I did try it at 3am though, and I made a silly mistake). So simple a solution. And it named the files appropriately.
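For readers following along, a minimal sketch of a one-line split on $2. It is not necessarily the exact command posted earlier in the thread, and the sample data is made up for illustration. The idea is to use the value of $2 itself as the output file name:

```shell
# Work in a scratch directory so the output files do not clutter anything.
cd "$(mktemp -d)"

# Made-up sample; column 2 holds the chromosome name.
printf '%s\n' \
  'a chr1 100' \
  'b chr1 200' \
  'c chr9 300' \
  'd chrX 400' > input_file

# Write every line to a file named after its second field. awk truncates
# each file on the first write and keeps it open, so later lines with the
# same $2 are added to the same file.
# The parentheses around $2 avoid parsing ambiguity in some awk versions.
awk '{ print > ($2) }' input_file

# Show how many lines ended up in each output file.
wc -l chr1 chr9 chrX
```

Because the file name comes straight from the data, the same command handles any number of distinct values in $2 without listing them.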
@bsat, yours works too (as Alchemikos said). I made a modification, though (bold), so that the file name is more explicit about the content (e.g. "chr1", "chr15", etc.):
ok, the solution above worked for the example above but I have a new factor that I can't figure out. The files I'm creating are to be read by a program that requires a pair of "links" to be on two lines (in the previous example there was in reality a second link on each line that I eliminated for clarity). What is to become the second line (the link pair) was created with this:
The data are in the correct places but I can't get rid of or prevent the leading space from appearing on the second line of each link. I can remove them after they're split into separate files no problem, but I'm hoping there's a way I can do this in one step (and I can't figure it out).
Thanks for any help you might be able to offer.
Last edited by captainentropy; 07-10-2012 at 02:09 PM.
Reason: spelling
Thanks, grail! I didn't realize \n obviated the need for the following comma. There's a lot of jiujitsu I'm applying to my data files. I think I can simplify the code I'm writing a bit more, but it's doing what I need now thanks to all the help here.
Just to clarify: when using 'print', items separated by a comma have OFS placed between them. This means that '\n', anything else, or nothing at all could be used in place of the comma. The nice part about the comma is that if you have set a specific OFS and want it between items, e.g. OFS="|", a pipe is placed wherever you enter a comma between items passed to 'print'.
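A small sketch of the behaviour described above, using made-up strings rather than the original data. It compares commas around "\n" (which produce the stray leading space) with plain juxtaposition, and shows a custom OFS:

```shell
# Commas insert OFS (a single space by default) on both sides of "\n",
# so the first line ends with a space and the second starts with one.
with_commas=$(awk 'BEGIN { print "left", "\n", "right" }')

# Juxtaposition concatenates with nothing in between: no stray spaces.
no_commas=$(awk 'BEGIN { print "left" "\n" "right" }')

# With a custom OFS, every comma becomes that separator.
piped=$(awk 'BEGIN { OFS = "|"; print "a", "b", "c" }')

# Bracket each result so leading and trailing spaces are visible.
printf '[%s]\n' "$with_commas" "$no_commas" "$piped"
```

The brackets in the final printf make it easy to see where the space comes from in the comma version.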