LinuxQuestions.org

Renan_S2 10-05-2008 02:49 PM

Extract lines NOT on a block of text from a file
 
Hello, I hope I'm not in the wrong forum.

I have a text file in the format:

Code:


START

blah blah blah, blah blah blah
...
...
...

END

Comments: .....

START

...
...
...
...
...

END

Comments: .....

Now I need a way to extract just the text that is NOT within a "START ... END" block. How would I do this?
I've tried searching for a way to do this with awk/sed, but didn't find one.

Don't know if I've managed to express myself properly...


Thanks.

nadroj 10-05-2008 03:31 PM

i found this link that you should be able to use to achieve what you need: http://student.northpark.edu/pemente/sed/sed1line.txt
if "START" and "END" are 'keywords' in whatever you're doing (that is, they cannot appear exactly as "START" or "END" except to denote the start and end of a 'block') then it will be straightforward. if they can appear in a 'block' but not at the beginning of a line then it will also be straightforward. if they can appear at the beginning of a line within a block then it will be much more difficult (i think).

i never use 'sed', but i am very familiar with regular expressions. i was able to write a simple regular expression, using the link above as a reference, to print what you need. note that sed's '-n' option may or may not be needed: it suppresses sed's automatic printing of every line, so you need it when you print the matching lines explicitly with 'p', but not when you delete lines with 'd'.

so let's think about what you need: you want to print all the text that is not in a block that starts with "START" and ends with "END". this is equivalent to not printing the lines from one that begins with "START" up to one that begins with "END". try to make a sed command that prints the blocks that start with "START" up to lines that begin with "END". if you have that, then you simply negate it, and do not print (rather than do print) these blocks--this is your answer.

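for example, a simple pattern for printing blocks that start with START up to lines that start with END would be, conceptually, ^START.*^END. since sed processes its input one line at a time, in sed this is written as an address range built from two line patterns: /^START/,/^END/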

i'm just trying to explain so you can do it yourself. give it an attempt and if you can't get it i'll post the answer.
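
A minimal sketch of the two steps described above, not a definitive answer. It assumes that START and END always begin a line and never appear at the start of a line inside a block. The sample text and the variable names (input, blocks, rest) are made up for illustration.

```shell
# Illustrative input: two START...END blocks, each followed by a comment line.
input=$(printf '%s\n' 'START' 'inside 1' 'END' 'Comments: first' \
                      'START' 'inside 2' 'END' 'Comments: second')

# Step 1: print only the blocks. -n turns off automatic printing, and
# 'p' prints each line in the range from a line beginning with START
# through the next line beginning with END.
blocks=$(printf '%s\n' "$input" | sed -n '/^START/,/^END/p')

# Step 2: negate it. Delete those same ranges with 'd' and let sed's
# automatic printing output everything else.
rest=$(printf '%s\n' "$input" | sed '/^START/,/^END/d')

printf '%s\n' "$rest"
```

With this sample input, the second command leaves only the two "Comments:" lines.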

Renan_S2 10-05-2008 03:38 PM

This does it, I think:

Code:

sed '/START/,/END/d'

Input:

Code:

START

1
2
3
4

END

Comment: foo

START

5
6
7
8

END

Comment: bar

Output:

Code:

Comment: foo


Comment: bar

Thanks.
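
Since the question also mentioned awk, here is a rough awk counterpart of the sed command above, offered as a sketch rather than a drop-in replacement. The flag name skip and the sample input are assumptions for illustration. The two commands agree on input shaped like the example, but they can differ in edge cases, for instance when START and END occur on the same line.

```shell
# Sample input modeled on the example in this thread (contents are illustrative).
input=$(printf '%s\n' 'START' '1' '2' 'END' 'Comment: foo' \
                      'START' '3' '4' 'END' 'Comment: bar')

# "skip" is a made-up flag name: it is set on a line containing START and
# cleared after a line containing END. Lines are printed only while it is unset.
awk_out=$(printf '%s\n' "$input" | awk '/START/ {skip=1} !skip {print} /END/ {skip=0}')

# The sed command from the post above, for comparison.
sed_out=$(printf '%s\n' "$input" | sed '/START/,/END/d')

printf '%s\n' "$awk_out"
```

On this input both commands leave just the two "Comment:" lines.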

nadroj 10-05-2008 04:14 PM

note that if the text between START and END also contains your keywords ("START", "END"), you will get unexpected output. however, if these are in fact keywords and cannot be used in the blocks, then what you have should be fine.

glad to help
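
To illustrate the caveat above, here is a small sketch of how an unanchored pattern can delete too much, and how anchoring can help. It assumes each marker sits alone on its own line with no trailing whitespace or carriage returns, which is an assumption about the file, not something the thread confirms. The sample text and variable names are made up. Anchoring does not help if a bare START or END line can occur inside a block.

```shell
# Illustrative input: a comment line happens to contain "START" (inside "RESTART").
input=$(printf '%s\n' 'START' 'body' 'END' \
                      'Comments: please RESTART the service' 'more notes' \
                      'START' 'body 2' 'END' 'Comments: done')

# Unanchored: /START/ also matches the RESTART comment line, so a new range
# opens there and runs through the next line containing END.
loose=$(printf '%s\n' "$input" | sed '/START/,/END/d')

# Anchored: ^START$ and ^END$ match only lines that consist of the marker alone.
strict=$(printf '%s\n' "$input" | sed '/^START$/,/^END$/d')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$loose" "$strict"
```

Here the unanchored command keeps only "Comments: done", while the anchored one keeps all three lines that sit outside the blocks.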


All times are GMT -5.