LinuxQuestions.org


BrianK 09-02-2005 05:06 PM

Need help with file parsing
 
I have a file that looks something like this:

blah blah
... lots of stuff ...
blah blah
FrameBegin 1
... more stuff ...
FrameEnd
.. more stuff ...

I need to get the "more stuff" between FrameBegin and FrameEnd; it's several lines. If it's easier to have "FrameBegin" and "FrameEnd" included in the results, that's fine. There are actually 100 FrameBegin/FrameEnd pairs that I will be pulling out into separate files, but if I know how to do one, I can figure out how to do them all. I have a feeling I can do it with something simple like sed or awk, as opposed to writing a C++ program or Perl script, but I'm having trouble.

Does anyone have an idea?

Thanks

druuna 09-02-2005 05:57 PM

Hi,

This could help (awk based):

Code:

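#!/bin/bash

awk '
  # Only act on lines from FrameBegin through FrameEnd (inclusive)
  /^FrameBegin/,/^FrameEnd/ {
    # On a FrameBegin line, build the output file name by
    # concatenating $1 and $2, e.g. "FrameBegin1"
    if ( $1 ~ /^FrameBegin/ ) {
      outputfile = $1 $2
    }
    # Lines that do not start with "Frame" go to the current output file
    if ( $1 !~ /^Frame/ ) {
      print $0 >> outputfile
    }
    # Close each file when its frame ends, so that 100 frames
    # do not leave 100 files open
    if ( $1 ~ /^FrameEnd/ ) {
      close(outputfile)
    }
  }
' "$1"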

Save and execute as: progname infile

I took the liberty of building the output file names from FrameBegin 1, FrameBegin 2, etc., so the files are named FrameBegin1, FrameBegin2 and so on (the number is the part I am unsure about). It's there to give you an idea and can be changed to your liking.

The important part is the /^FrameBegin/,/^FrameEnd/ part. This makes sure that only the frames and what is inside them are targeted.
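To see the range pattern on its own, here is a small sketch. The sample input (two frames with made-up lines) and the variable names are assumptions for illustration only, not taken from the real file. With no action given, awk prints every line from a FrameBegin match through the next FrameEnd match, and the range starts again at the next FrameBegin.

```shell
#!/bin/bash
# Hypothetical sample input with two frames; the contents are made up.
sample=$(mktemp)
printf '%s\n' 'header' \
              'FrameBegin 1' 'a1' 'a2' 'FrameEnd' \
              'between' \
              'FrameBegin 2' 'b1' 'FrameEnd' \
              'footer' > "$sample"

# A range pattern without an action prints each matching line.
# Lines outside the ranges (header, between, footer) are skipped.
frames=$(awk '/^FrameBegin/,/^FrameEnd/' "$sample")
printf '%s\n' "$frames"

rm -f "$sample"
```

Only the two frames, including their FrameBegin and FrameEnd lines, should come out.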

You can do something similar with sed: sed -n '/^FrameBegin/,/^FrameEnd/p' infile.
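A rough sketch of the sed variant, again on a made-up sample file (the contents and variable names are assumptions). The first command keeps the FrameBegin and FrameEnd lines; the second nests a second address inside the range to drop them. Note that the nested form also drops any body line that happens to start with "Frame".

```shell
#!/bin/bash
# Hypothetical sample input with two frames; the contents are made up.
sample=$(mktemp)
printf '%s\n' 'header' \
              'FrameBegin 1' 'a1' 'a2' 'FrameEnd' \
              'between' \
              'FrameBegin 2' 'b1' 'FrameEnd' \
              'footer' > "$sample"

# -n suppresses automatic printing; p prints only lines inside the range.
with_markers=$(sed -n '/^FrameBegin/,/^FrameEnd/p' "$sample")

# Inside the range, print only lines that do NOT start with "Frame".
body_only=$(sed -n '/^FrameBegin/,/^FrameEnd/{/^Frame/!p;}' "$sample")

printf '%s\n' "$with_markers"
printf '%s\n' "$body_only"

rm -f "$sample"
```

Unlike the awk script, this sends everything to standard output rather than to one file per frame.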

Hope this clears things up a bit.

BrianK 09-02-2005 05:58 PM

Quote:

Originally posted by druuna

The important part is the /^FrameBegin/,/^FrameEnd/ part. This makes sure that only the frames and what is inside them are targeted.

You can do something similar with sed: sed -n '/^FrameBegin/,/^FrameEnd/p' infile.

Hope this clears things up a bit.

That was exactly what I was looking for - didn't know about the use of the comma in a regexp.

Thanks for the code too. :D


All times are GMT -5.