LinuxQuestions.org


SuzuBell 12-06-2013 06:47 PM

Separating a file by two tokens
 
Hello Linux Gurus:

I have an OpenOffice document.

This file contains many "start token lines". Each one starts with the word "MODEL", followed by some spaces, followed by a number (for example: MODEL 1, MODEL 2, MODEL 3, MODEL 4, MODEL 5, etc.).

Each "start token line" is followed by many lines, and the block ends with an "end token line" that consists only of the word "ENDMDL".

I would like to parse the file so that it copies all lines from (and including) the "start token line" through (and including) the "end token line" into a new output file.

In other words, if I ran this on a file with 100 of these start/end token line pairs, I would like to produce 100 files.

Any suggestions would be appreciated! I have parsed files before, but I have not tried a two-token approach.

syg00 12-06-2013 07:22 PM

Perl or awk sounds best: set a flag when you find the MODEL line and set/increment a file-counter variable. Write to the file numbered by that counter until ENDMDL, at which point you reset the flag. An awk one-liner should be pretty quick.
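A minimal sketch of that flag-and-counter idea, written as a commented awk script inside a shell function. It assumes the MODEL and ENDMDL records start at the beginning of the line (as they do in a PDB file), keeps both marker lines as the original question asks, and closes each output file at ENDMDL so files with many models do not hit the open-file limit some awk implementations have. The function name `split_models` and the output names `model_1.pdb`, `model_2.pdb`, ... are hypothetical choices.

```sh
# Hypothetical helper: split_models FILE writes each MODEL...ENDMDL block
# of FILE to its own numbered file (model_1.pdb, model_2.pdb, ...).
split_models() {
    awk '
        /^MODEL/ {               # start token: begin a new block
            flag = 1
            cntr++
            out = "model_" cntr ".pdb"
        }
        flag { print > out }     # copy lines while inside a block, MODEL line included
        /^ENDMDL/ {              # end token: already printed above, now stop and close
            flag = 0
            close(out)
        }
    ' "$1"
}
```

Usage: `split_models 1KZS.pdb`. The print rule sits between the two marker rules on purpose: the MODEL line is printed because the flag is set just before it, and the ENDMDL line is printed because the flag is only cleared after it.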

SuzuBell 12-07-2013 12:00 PM

Thanks!
 
Thank you. I have it working with:

awk '/MODEL/ {flag=1;next} /ENDMDL/{flag=0} flag {print}' 1KZS.pdb > TEST

However, it just prints all lines between MODEL and ENDMDL into the *same* output file. I would like each MODEL-->ENDMDL block to go into its own output file.

Is there a way my awk command can be tweaked to accomplish this?

vl23 12-07-2013 12:46 PM

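counter=0
# Read the awk output one whole line at a time; a plain for loop would split on spaces
yourawkscripthere | while IFS= read -r line; do
    # Write each line to its own numbered file
    printf '%s\n' "$line" > "OUTPUTFILEBASENAME$counter"
    counter=$((counter + 1))
done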

That should do what I believe you want to do.
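Note that this loop writes every line, not every block, to a separate file. If you would rather stay in plain shell, the same counter idea applied per MODEL...ENDMDL block could look like this. It is a sketch in POSIX sh; `split_models_sh` and the `model_N.pdb` names are hypothetical. It reopens the output file for every line, so expect it to be much slower than awk on large files.

```sh
# Hypothetical pure-shell helper: one output file per MODEL...ENDMDL block.
split_models_sh() {
    counter=0
    out=
    # The "|| [ -n ... ]" keeps a last line that has no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            MODEL*)
                counter=$((counter + 1))
                out="model_$counter.pdb"
                : > "$out"          # create or truncate the new block file
                ;;
        esac
        # Append the line only while inside a block (MODEL and ENDMDL included)
        if [ -n "$out" ]; then
            printf '%s\n' "$line" >> "$out"
        fi
        case $line in
            ENDMDL*) out= ;;        # end token: stop writing until the next MODEL
        esac
    done < "$1"
}
```

Usage: `split_models_sh 1KZS.pdb`.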

syg00 12-07-2013 06:38 PM

I thought you wanted to include the MODEL and ENDMDL lines; that is easy enough to do.

Similar to @vl23's approach, you can redirect the print inside awk. Have a look at this, and note the use of a counter as I suggested above:
Code:

awk '/MODEL/ {flag=1;cntr++;next} /ENDMDL/{flag=0;close("file" cntr)} flag {print > ("file" cntr)}' 1KZS.pdb
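In this one-liner, `next` skips the MODEL line and `flag=0` runs before the print rule, so both marker lines are left out of the output files. If they should be kept, as the original question asked, one option is an awk range pattern, which matches everything from a start line through the next end line with both ends included. A sketch under the same assumptions as before (markers at the start of the line); `split_models_range` and the `model_N.pdb` names are hypothetical.

```sh
# Hypothetical helper using an awk range pattern: /^MODEL/,/^ENDMDL/ matches
# every line from a MODEL line through the next ENDMDL line, both included.
split_models_range() {
    awk '
        /^MODEL/            { n++; out = "model_" n ".pdb" }   # number the new block
        /^MODEL/, /^ENDMDL/ { print > out }                     # copy the whole block
        /^ENDMDL/           { close(out) }                      # finished with this file
    ' "$1"
}
```

Usage: `split_models_range 1KZS.pdb`. The counter rule comes first so that `out` already names the new file when the range rule prints the MODEL line.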

