Separating a file by two tokens
Hello Linux Gurus:
I have an Open Office Document. The file contains many "start token lines", each of which starts with the word "MODEL", followed by a number of spaces, followed by a number (for example: MODEL 1, MODEL 2, MODEL 3, MODEL 4, MODEL 5, etc.). After each "start token line" there are many lines, ending with an "end token line" that consists only of the word "ENDMDL". I would like to parse the file so that all lines from (and including) the "start token line" to (and including) the "end token line" go into a new output file. In other words, if I ran this on a file with 100 of these "start token line"/"end token line" pairs, I would like to produce 100 files. Any suggestions would be appreciated! I have parsed a file before, but have not tried a double-token approach...
Perl or awk sounds best. Set a flag when you find the MODEL line and set/increment a file-counter variable. Write to the file named by that counter until ENDMDL, at which point you reset the flag. An awk one-liner should be pretty quick.
Thanks!
Thank you. I have it working with:
awk '/MODEL/ {flag=1;next} /ENDMDL/{flag=0} flag {print}' 1KZS.pdb > TEST
However, it is just printing all lines between MODEL-->ENDMDL into the *same* output file. But I would like to separate it so that each MODEL-->ENDMDL is in a new output file. Is there a way my awk command can be tweaked to accomplish this?
counter=0
for line in `yourawkscripthere`; do echo "$line" > "OUTPUTFILENAME$counter"; counter=$((counter+1)); done
That should do what I believe you want to do.
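One caveat with a loop over backquoted output: the shell splits it on whitespace, so each iteration sees one word rather than one line, and every word lands in its own file. Below is a minimal sketch of a line-by-line alternative in plain shell, assuming the file layout described in the question. The `sample.pdb` input and the `part_N.txt` output names are made up for the demonstration.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny stand-in for the real input (hypothetical content, two models).
printf '%s\n' 'HEADER' \
  'MODEL        1' 'ATOM a' 'ENDMDL' \
  'MODEL        2' 'ATOM b' 'ATOM c' 'ENDMDL' \
  'END' > sample.pdb

counter=0
inside=0
while IFS= read -r line; do
  case $line in
    # Start token: pick the next file name and truncate it.
    MODEL*) counter=$((counter + 1)); inside=1; : > "part_$counter.txt" ;;
  esac
  # Copy the line (start and end tokens included) while inside a block.
  if [ "$inside" -eq 1 ]; then
    printf '%s\n' "$line" >> "part_$counter.txt"
  fi
  case $line in
    # End token: stop copying after it has been written.
    ENDMDL*) inside=0 ;;
  esac
done < sample.pdb
```

Reading with `while IFS= read -r` keeps leading spaces and backslashes intact. For large files, awk will likely be much faster than a shell loop.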
I thought you wanted to include the MODEL/ENDMDL - easy enough done.
Similar to @vl23, you can redirect the print in awk. Have a look at this, and note the use of a counter as I suggested above.
Code:
awk '/MODEL/ {flag=1;cntr++;next} /ENDMDL/{flag=0} flag {print > ("file" cntr)}' 1KZS.pdb
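The command above skips the MODEL line (because of `next`) and clears the flag before ENDMDL can be printed, so neither token line reaches the output. Here is a hedged sketch of a variant that keeps both, assuming the tokens sit at the start of their lines as described in the question. The `sample.pdb` input and the `model_N.txt` names are hypothetical and only there to make the example runnable.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny stand-in for 1KZS.pdb (hypothetical content, two models).
cat > sample.pdb <<'EOF'
HEADER    DEMO
MODEL        1
ATOM      1  N   MET A   1
ENDMDL
MODEL        2
ATOM      1  N   MET A   1
ATOM      2  CA  MET A   1
ENDMDL
END
EOF

awk '
  /^MODEL/  { n++; out = "model_" n ".txt"; inside = 1 }  # start token: choose next file
  inside    { print > out }                               # copy lines, tokens included
  /^ENDMDL/ { inside = 0; close(out) }                    # end token: finish this file
' sample.pdb

ls model_*.txt
```

Rule order does the work here: the flag is set before the print rule runs and cleared after it, so both token lines are written. Anchoring the patterns with `^` avoids matching other lines that merely contain the word MODEL, and `close(out)` keeps the number of open files small when there are many models.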