Splitting files by pattern match
I have a series of files which have been concatenated together. Each of the files has a header, something like this:
metasyntactic_variables.txt
Code:
header----- # primary
Code:
$ wondersplit --pattern="^header-----" metasyntactic_variables.txt
xa
Code:
header----- # primary as the hills
Code:
header----- # secondary
Code:
header----- # quux family |
This can be solved with a fairly simple gawk command:
Code:
gawk 'BEGIN{fnum=0; out="outf";} /^header----/ {fnum++;} {print $0 >> out""fnum}' <INPUT-FILE>
Essentially, it scans through the input, writing every line to the file "outf#" and incrementing # every time it finds the regexp ^header----. Hope this helps. |
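For anyone who wants to try the idea on sample data first, here is a minimal sketch. The scratch directory, the sample.txt name and the sample contents are made up for illustration; only the header lines follow the format from the question. It uses plain awk, but gawk should behave the same way.

```shell
# Work in a scratch directory so the outf* files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: three concatenated sections.
printf '%s\n' \
  'header----- # primary' 'foo' 'bar' \
  'header----- # secondary' 'baz' \
  'header----- # quux family' 'quux' 'quuux' > sample.txt

# Same idea as the gawk command above. The parentheses around the
# file name keep the concatenation unambiguous in awks other than gawk.
awk 'BEGIN { fnum = 0; out = "outf" }
     /^header-----/ { fnum++ }
     { print $0 >> (out fnum) }' sample.txt

# List the pieces that were written.
ls outf*
```

Because the redirection is >> (append), running it a second time in the same directory adds to the existing outf* files, so remove them before rerunning.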
I knew that someone would jump in with an awk script...
very nice. Thanks. |
Wow. That ran so fast I thought that it had failed, but I got output.
note to self: learn some awk. |
Just wanted to update this: I stumbled across a much better option today,
the csplit command ("context split"). A quick man csplit will show you how to use it. I feel kinda silly running to gawk when this option was available. |
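As a rough sketch of what that can look like, assuming GNU csplit (the '{*}' repeat count and the -z option are GNU extensions) and a made-up sample file; the part_ prefix is just an illustrative choice:

```shell
# Work in a scratch directory with a hypothetical sample input.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
  'header----- # primary' 'foo' 'bar' \
  'header----- # secondary' 'baz' \
  'header----- # quux family' 'quux' > sample.txt

# -s suppresses the byte counts, -f sets the output file prefix,
# -z drops the empty piece that precedes the first header, and
# '{*}' repeats the pattern for as long as it keeps matching.
csplit -s -z -f part_ sample.txt '/^header-----/' '{*}'

# List the pieces that were written.
ls part_*
```

Each piece starts at a header line and runs up to the line before the next header.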
The awk code is simply
Code:
awk '/header/{++d}{print $0 > ("file_" d)}' file |
Quote:
If I'm not mistaken, closing the files after having written to them fixes this, though.
Code:
awk '/header/{close("file_" d); ++d}{print $0 > ("file_" d)}' file |
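A quick way to check the close() variant on a larger input, sketched under assumptions: the input is generated here with made-up contents, and the section count of 300 is arbitrary. Closing each file before moving on means only one output file is open at a time, however many headers the input has.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate a hypothetical input: 300 sections of two lines each.
awk 'BEGIN {
       for (i = 1; i <= 300; i++) {
         print "header----- # section " i
         print "line " i
       }
     }' > sample.txt

# On each header, close the previous output file, then start the next one.
awk '/^header-----/ { close("file_" d); ++d }
     { print $0 > ("file_" d) }' sample.txt

# Count the pieces that were written.
ls file_* | wc -l
```

Note that reopening a closed file with > truncates it, which is harmless here because each section is written in one contiguous run.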
Well... I'll be damned. I think that I used csplit about 10 years ago, and I totally forgot about it. How did you run across it?
|
I was in a situation without internet and was navigating around info pages looking for the pr -m or paste command to merge two files line by line for awk processing. When I got to the text-processing commands I noticed csplit and decided to check out what it did.
|