How can I split a file into many files using a string in awk or sed?
and the file continues like this for a very long way.
I would like to break up the one long file into many smaller files. I think this could be done by using the beginning and ending strings of each "group" to parse on:
SRIG_NAME:
SRIG_END:
and the name of each of the new text files would be the string that follows
SRIG_NAME:
SRIG_END:
So in my example I'd have:
FILENAME_TIG.txt
BSG_BSG.txt
CMP34_ADY.txt
Can anybody help me?
I know a little awk and sed, so I can follow along.
Thanks so much guys! Tabitha
Oh, and here's what I've already been working with:
awk can redirect output to multiple files based on a value within the file.
A hint.
If a line starts with SRIG_NAME:, then all subsequent data (including that line itself) gets written to a file whose name is the 2nd field of that SRIG_NAME: line (FILENAME_TIG, BSG_BSG, etc.).
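A minimal awk sketch of that hint, assuming the group name is the second whitespace-separated field of each SRIG_NAME: line. The scratch directory and the sample input are hypothetical stand-ins, since the original data isn't shown here.

```shell
#!/bin/bash
# Sketch of the hint: from each "SRIG_NAME:" line onward, write lines to a
# file named after that line's 2nd field. The sample input is hypothetical.
cd "$(mktemp -d)" || exit 1          # scratch directory for the demo

cat > input.txt <<'EOF'
SRIG_NAME: FILENAME_TIG
data line 1
SRIG_END:
SRIG_NAME: BSG_BSG
data line 2
SRIG_END:
EOF

awk '
  /^SRIG_NAME:/ {
    if (out != "") close(out)   # finish the previous group file
    out = $2 ".txt"             # e.g. "FILENAME_TIG.txt"
  }
  out != "" { print > out }     # lines before the first SRIG_NAME: are skipped
' input.txt

ls   # now lists BSG_BSG.txt, FILENAME_TIG.txt and input.txt
```

The close() call keeps only one output file open at a time, which matters with a very long file because some awk implementations limit how many files can be open at once. One side effect: if the same name shows up twice, the later group overwrites the earlier file, since `>` truncates a file again when it is reopened after close().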
OK
Thanks AnanthaP for your reply!
Yep, that's exactly the idea. And when it reads SRIG_END:, it stops writing lines for that "group" and starts looking again for the next SRIG_NAME:.
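One way to add that stop condition in awk is sketched below. It assumes, as before, that the name is the 2nd field of the SRIG_NAME: line; the sample input, including the stray line between groups, is hypothetical.

```shell
#!/bin/bash
# Variant that also honours SRIG_END: lines from SRIG_NAME: through SRIG_END:
# go to <name>.txt, and anything between groups is discarded.
cd "$(mktemp -d)" || exit 1          # scratch directory for the demo

cat > input.txt <<'EOF'
SRIG_NAME: CMP34_ADY
data line 1
SRIG_END:
stray line between groups
SRIG_NAME: BSG_BSG
data line 2
SRIG_END:
EOF

awk '
  /^SRIG_NAME:/ { if (out != "") close(out); out = $2 ".txt" }  # start a group
  out != ""     { print > out }                                 # copy while inside one
  /^SRIG_END:/  { if (out != "") close(out); out = "" }         # stop until next SRIG_NAME:
' input.txt
```

Rule order matters here: the print rule runs before the SRIG_END: rule, so the SRIG_END: line itself still ends up in the group's file.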
The csplit command creates individual files named srigfile_000.txt, etc.
See info csplit for details on how to use it properly.
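For reference, a GNU csplit invocation that produces files named like that might look like the following. This is an assumed reconstruction, not necessarily the exact command from this post; `-z`, `-b` and the `'{*}'` repeat count are GNU coreutils features, and the sample input is hypothetical.

```shell
#!/bin/bash
# Split input.txt before every line that starts with "SRIG_NAME:" (GNU csplit).
cd "$(mktemp -d)" || exit 1          # scratch directory for the demo
printf '%s\n' 'SRIG_NAME: FILENAME_TIG' 'data 1' 'SRIG_END:' \
              'SRIG_NAME: BSG_BSG' 'data 2' 'SRIG_END:' > input.txt

# -s    : don't print the size of each piece
# -z    : drop empty pieces (e.g. the empty one before the first match)
# -f    : output file prefix
# -b    : suffix format, giving srigfile_000.txt, srigfile_001.txt, ...
# '{*}' : repeat the previous pattern as often as it matches
csplit -s -z -f srigfile_ -b '%03d.txt' input.txt '/^SRIG_NAME:/' '{*}'
```

Note that `%03d` only sets a minimum width, so after srigfile_999.txt the names continue as srigfile_1000.txt and so on.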
PS: Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques. Thanks.
Last edited by David the H.; 06-06-2013 at 02:09 PM.
Oh, and here's a simple loop for renaming the files to the desired strings from the text.
Code:
for oname in srigfile_00*; do
read -r _ nname <"$oname"
mv "$oname" "$nname.txt"
done
This should work as long as the new name is the second space-delimited field on the first line of each file. But be sure that there aren't any duplicate names.
Last edited by David the H.; 06-06-2013 at 02:19 PM.
Reason: add a bit more
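A slightly more defensive version of the same loop is sketched below. It assumes the csplit pieces are named srigfile_NNN.txt (so the glob covers all of them, not just the first ten) and, like the loop above, that the new name is everything after the first word of line 1. The sample files are hypothetical; the loop skips a file rather than overwrite an existing target, which catches duplicate names.

```shell
#!/bin/bash
# Demo setup: two hypothetical pieces as csplit might have produced them.
cd "$(mktemp -d)" || exit 1
printf 'SRIG_NAME: FILENAME_TIG\ndata 1\n' > srigfile_000.txt
printf 'SRIG_NAME: BSG_BSG\ndata 2\n'      > srigfile_001.txt

for oldname in srigfile_*.txt; do
    [ -e "$oldname" ] || continue          # glob matched nothing
    read -r _ newname < "$oldname"         # 2nd word onward of line 1
    if [ -z "$newname" ]; then
        echo "skipping $oldname: no name on line 1" >&2
    elif [ -e "$newname.txt" ]; then
        echo "skipping $oldname: $newname.txt already exists" >&2
    else
        mv -- "$oldname" "$newname.txt"
    fi
done
```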
The csplit command worked great, thanks so much! Its output was 1765 files named srigfile_000.txt through srigfile_1765.txt.
The oname loop hasn't been as successful.
If I run it as a bash script with #!/bin/bash (thinking that maybe the path to my bin is somehow messed up), the command line gives me back nothing and the srigfile names don't change.
If I run it without the #!/bin/bash, the command line gives me back ./script: line 2: srigfile_00*: No such file or directory, and the mv command of course says it cannot stat `srigfile_00*'.
So I tried changing around the string srigfile_00*, but that had no effect either; it still can't find the srigfiles, and sometimes it even deleted all the srigfiles, yikes!
I double-checked the fields on the first line of each of the srigfiles created by the csplit command, and they are space-delimited, but I don't think this part of the script is being reached yet?
It's usually a good idea not to run a possibly-destructive command like mv until you've confirmed that it's configured correctly. The easiest way is to just stick an echo at the front of the command, then you'll see a printout of what would actually be executed after the variables are expanded.
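For example, prefixing mv with echo turns the loop into a dry run that only prints the commands it would execute. The two sample files here are hypothetical.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1          # scratch directory for the demo
printf 'SRIG_NAME: FILENAME_TIG\n' > srigfile_000.txt
printf 'SRIG_NAME: BSG_BSG\n'      > srigfile_001.txt

# Dry run: echo prints each mv command instead of running it.
for oname in srigfile_00*; do
    read -r _ nname <"$oname"
    echo mv "$oname" "$nname.txt"
done
# prints:
# mv srigfile_000.txt FILENAME_TIG.txt
# mv srigfile_001.txt BSG_BSG.txt
```

Once the printed commands look right, remove the echo to do the real rename.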
I don't really see what could be wrong with what I posted though. It's just a simple globbing pattern and for loop.
Since you have many more files than what I used for testing, you'll probably need to shorten the glob to something like "srig*". Just keep it long enough to match only the files you want. "printf '%s\n' <glob>" can be used to list out all the files matched by that pattern, one per line.
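A quick way to see the difference is sketched below with twelve hypothetical empty files: srigfile_00* only matches srigfile_000.txt through srigfile_009.txt. Also, when a pattern matches nothing, bash (with default options) passes the pattern through literally, which is where errors like "cannot stat `srigfile_00*'" come from.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1          # scratch directory for the demo
# Create hypothetical empty files srigfile_000.txt .. srigfile_011.txt
for ((i = 0; i < 12; i++)); do
    : > "$(printf 'srigfile_%03d.txt' "$i")"
done

printf '%s\n' srigfile_00*      # srigfile_000.txt .. srigfile_009.txt only
printf '%s\n' srigfile_*.txt    # all twelve files
printf '%s\n' srigfile_99*      # no match: prints the pattern itself, srigfile_99*

set -- srigfile_00*;   echo "srigfile_00* matches $# files"     # 10
set -- srigfile_*.txt; echo "srigfile_*.txt matches $# files"   # 12
```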
The read command inside the loop just takes the first line from each file and splits it into two variables; the first word on the line goes into the throw-away "_", and all the rest into the nname variable, for use as the new filename.
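The splitting can be tried on its own with a here-string (the sample lines are hypothetical):

```shell
#!/bin/bash
read -r _ nname <<< 'SRIG_NAME: BSG_BSG'
echo "$nname"          # prints: BSG_BSG

# With more than two words, everything after the first word lands in nname.
read -r _ nname <<< 'SRIG_NAME: CMP34_ADY extra words'
echo "$nname"          # prints: CMP34_ADY extra words
```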
Check to see that you haven't made any syntax or spelling errors. And of course the loop needs to be run in the same directory as the files, or else it would have to be made more complex. Make sure the new names don't have any illegal filename characters or other conflicts either, as I mentioned before.
I highly doubt there are any problems with your PATH or other low-level issues like that. If you haven't had any problems before, then they aren't likely to be a factor now. It's certainly either a syntax or matching error of some kind.
Also, another thought, could the files have dos-style line-endings in them? If so, you may need to run them through dos2unix or a similar converter first.
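To illustrate why that matters: read keeps a DOS carriage return, so it ends up in the new filename. The sketch below uses a hypothetical CRLF file and strips one trailing \r with bash parameter expansion; running the files through dos2unix first avoids the problem altogether.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1          # scratch directory for the demo
# Hypothetical piece saved with DOS (CRLF) line endings.
printf 'SRIG_NAME: BSG_BSG\r\ndata\r\n' > srigfile_000.txt

read -r _ nname < srigfile_000.txt
printf '%q\n' "$nname"      # shows the name with a trailing \r attached

nname=${nname%$'\r'}        # strip one trailing carriage return
printf '%q\n' "$nname"      # prints: BSG_BSG
```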
Both David's and Firerat's scripts work. David, I apologize: I mistyped "nname" as just "name" in the mv command. The output of echo pointed me to my errors.
You know, it's funny how I can see a script, follow along, and know what it's doing at each step, but still can't write it myself. I get some of it and then get stuck, or I start off down the wrong road.
Those little typos get you every time. In hindsight I probably should've used variable names that were a bit clearer to read, like "oldname" and "newname", instead of the shorter ones. I usually use "fname" and "dname" myself for files and directories, so I was keeping with the same pattern.
@Firerat, Nicely done. Just a couple of quick suggestions. "print" on its own is the same as "print $0", and an "else" would probably be a better choice to connect the two commands, rather than "next".
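Firerat's script isn't reproduced here, so the following is only a generic illustration of those two points, using a hypothetical condition (lines starting with "#"). Both awk programs produce the same output.

```shell
#!/bin/bash
input=$'# header\nline one\n# note\nline two'

# Form 1: "next" skips the remaining rules for matching lines.
out1=$(printf '%s\n' "$input" | awk '/^#/ { print "comment: " $0; next } { print $0 }')

# Form 2: one rule with if/else; a bare "print" is the same as "print $0".
out2=$(printf '%s\n' "$input" | awk '{ if (/^#/) print "comment: " $0; else print }')

if [ "$out1" = "$out2" ]; then echo "same output"; fi
printf '%s\n' "$out2"
```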
I believe you could also reduce your script down to just this (untested, 'cause I'm lazy):