LinuxQuestions.org


mangoo 11-21-2007 08:32 AM

awk - remove lines between AAAA and BBBB
 
I have a long file, which looks similar to the one I pasted below.
I would like to remove all lines between the "-----" and "_____" lines (marked "Remove this text" in the sample).

In other words, I have to use the shortest match and cut out everything between "-----" and "_____" (their length can vary). It's OK if the marker lines get removed, too.

Does anyone have awk ideas for that?



File to be edited:

normal text
don't touch

------------
Remove
this
text
____________________

another normal text
normal text
don't touch


------------------
Remove me
please
__________________________

yet another normal text
normal text
don't touch

radoulov 11-21-2007 09:07 AM

Code:

awk '/^___/{f=0}f{next}/^---/{f=1}1'
or:

Code:

awk '/^(---|___)/{print}/^---/,/^___/{next}1'
If you want to remove the --- and ___ lines as well:

Code:

awk '/^___/{f=0;next}f{next}/^---/{f=1;next}1'
Or just :)

Code:

awk '/^---/,/^___/{next}1'
Or with sed:

Code:

sed '/^---/,/^___/d'

colucix 11-21-2007 09:19 AM

radoulov, I would have written
Code:

awk '/^----/,/^____/{next}{print}'
but the numeric value for "true" is a stroke of genius! :)
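
For anyone who hasn't met the idiom: an awk rule is "pattern { action }", and either part may be left out. A missing action defaults to { print }, and a pattern can be any expression, so a bare 1 is a pattern that is always true. A small check that the two spellings behave the same (the sample input is invented for this illustration):

```shell
# Sample input, made up for this illustration.
sample='keep 1
----
drop
____
keep 2'

# Explicit form: skip the range, print everything else.
explicit=$(printf '%s\n' "$sample" | awk '/^----/,/^____/{next}{print}')

# Short form: the trailing 1 is an always-true pattern with the default action (print).
short=$(printf '%s\n' "$sample" | awk '/^----/,/^____/{next}1')

printf '%s\n' "$short"
[ "$explicit" = "$short" ] && echo "same output"
```

This should print "keep 1" and "keep 2", followed by "same output".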

b0uncer 11-21-2007 09:20 AM

Quote:

In other words, I have to use the shortest match,--
Sorry for being suspicious (ignore me if I'm wrong), but... "have to"? I hope this isn't schoolwork.

I would have picked up sed myself, for a start anyway.

pixellany 11-21-2007 09:27 AM

Why AWK when you can SED ???

sed '/--/,/__/ d' filename > newfilename

Deletes everything starting with "--" up to and including "__". I arbitrarily use two of each character.

If this WAS homework, then shame on me for doing it for you.....

mangoo 11-21-2007 10:43 AM

Quote:

Originally Posted by b0uncer (Post 2966551)
Sorry for being suspicious (ignore me if I'm wrong), but... "have to"? I hope this isn't schoolwork.

I would have picked up sed myself, for a start anyway.

Yes, "have to" (actually, I was wondering whether "have to" was the right expression before I started this thread). And no, it's not schoolwork.

And yes, all the answers are unfortunately *wrong* ;) (perhaps I didn't specify the "test case" clearly enough).
What I want to do is to remove all advertisements from an mbox file of some mailing list.

These advertisements are placed between ----- and _____ - so far, everything clear.

The problem is that it is an mbox file (that is, emails one after another), so there are sometimes nice drawings etc.

And hence I was looking for a way to remove the shortest match (the shortest match is not the longest match; it is also not any match longer than the shortest).

That being said, take a look at this "improved" test case:

1 normal text 1
1 don't touch 1

------------
Remove
this
text
____________________



2 normal text 2
a nice diagram:
--------------------------
| This will be gone, too |
| but should stay |
--------------------------
2 normal text 2
2 don't touch 2


------------------
Remove me
please
__________________________

3 yet another normal text 3
3 normal text 3
3 don't touch 3



With all the solutions suggested in this thread, "normal text 2" would not come out the way we want: the cut would be neither the longest match nor the shortest one between a ----- line and a _______ line.
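
To make the failure concrete, here is a condensed version of the test case above run through one of the range-based one-liners from earlier in the thread. The helper name sample is made up for this illustration:

```shell
# Condensed version of the "improved" test case; sample is a hypothetical helper.
sample() {
cat <<'EOF'
2 normal text 2
a nice diagram:
--------------------------
| but should stay |
--------------------------
2 don't touch 2
------------------
Remove me
__________________________
3 normal text 3
EOF
}

# The range opens at the first line of dashes (the top of the diagram)
# and only closes at the next line of underscores.
sample | awk '/^---/,/^___/{next}1'
```

Only "2 normal text 2", "a nice diagram:" and "3 normal text 3" survive: the diagram and "2 don't touch 2" are swallowed together with the advertisement.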

PAix 11-21-2007 11:25 AM

Sorry, but I have to interject. The answers were not wrong; they were correct, but in retrospect the original question appears to have been wrong. Welcome to the world of scope creep. The original question had a sort of elegance that made it easy meat.

So how exactly will the bit below be told apart from the normal start and end patterns of a block to delete?
Quote:

-------------------------
| This will be gone, too |
| but should stay |
--------------------------
I can't see anything that would make it anything other than potentially dead meat at the moment.

PAix

radoulov 11-21-2007 12:47 PM

Quote:

Originally Posted by mangoo (Post 2966614)
[...]
With all the solutions suggested in this thread, "normal text 2" would not come out the way we want: the cut would be neither the longest match nor the shortest one between a ----- line and a _______ line.

Could you have more than one occurrence of --- something ___ in the same file?
I mean:

Code:

---
a
b
c
___

something else

---
a
b
___

Where it's the second block (the shortest) which is supposed to be removed.

mangoo 11-21-2007 12:58 PM

Quote:

Originally Posted by radoulov (Post 2966734)
Could you have more than one occurrence of --- something ___ in the same file?
I mean:

Code:

---
a
b
c
___

something else

---
a
b
___

Where it's the second block (the shortest) which is supposed to be removed.


It's an mbox file (a file containing many emails), so yes, I have several thousand occurrences.

I was thinking of possible easier solutions:

1) cut the 4 or 5 lines above ^_________
2) reverse the order of the lines in the file, cut from _______ to the next ------- (I think I don't have any tables or drawings which use _______), and then reverse the lines back

But anyway, finding the *really* shortest match seems more interesting and useful (think of HTML / XML tags).
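
One way to read the "really shortest match" is: a block runs from the last line of dashes before a line of underscores up to that line of underscores. Under that assumption, a single-pass sketch can hold back everything since the most recent dash line and decide what to do with it when the next marker turns up. The function name strip_ads, the variables buf and held, and the whole-line patterns ^-+$ and ^_+$ are choices made for this sketch, so treat it as a starting point rather than a finished tool:

```shell
# strip_ads is a hypothetical helper; it reads files given as arguments, or stdin.
strip_ads() {
    awk '
    /^-+$/ {                        # a new candidate start line
        printf "%s", buf            # lines held so far were not an ad: release them
        buf = $0 ORS; held = 1      # start holding again from this dash line
        next
    }
    /^_+$/ && held {                # end marker: drop the held block and this line
        buf = ""; held = 0
        next
    }
    held { buf = buf $0 ORS; next } # inside a candidate block: keep holding
    { print }                       # ordinary text outside any candidate block
    END { printf "%s", buf }        # a candidate still open at end of input is kept
    ' "$@"
}

# Made-up input: a small diagram followed by an ad block.
printf '%s\n' 'a' '---' '| x |' '---' 'b' '----' 'ad' '____' 'c' | strip_ads
```

On this input the diagram and the line b are kept, and only the block from ---- to ____ goes away. The cost is that text following a dash line stays in memory until the next marker, which should not matter for email-sized chunks.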

radoulov 11-21-2007 01:41 PM

You mean something like this:
Code:

tac filename|awk '/^___/,/^---/{next}1'|tac
or:

Code:

tac <(awk '/^___/,/^---/{next}1'<(tac filename))
Code:

tac <(sed '/^___/,/^---/d'<(tac filename))

mangoo 11-21-2007 03:18 PM

Thanks a lot for all your answers.

Here are also some ideas from comp.lang.awk group:

http://groups.google.com/group/comp....a81536e6734e7e

radoulov 11-21-2007 04:24 PM

Another possible solution:

Code:

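awk '
# Pass 1: remember the most recent line made up only of dashes
NR == FNR && /^-+$/ {
        f = FNR
}
# Pass 1: on an underscore line, mark every line from that dash line to here
# (skipped if there was no dash line since the last marked block)
NR == FNR && /^_+$/ && f {
        for(i=f; i<=FNR; i++)
                x[i]
        f = 0
}
# Pass 2: print only the lines that were not marked
NR > FNR && !(FNR in x)
' filename filename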


philip.patlur 07-07-2011 01:32 AM

I have a slightly different problem.

I need to strip out anything that's between =+=+=+= and =+=+=+= in a file.

colucix 07-07-2011 01:55 AM

Code:

sed '/=+=+=+=/,/=+=+=+=/d' file
Is this what you're looking for? If not, please show an example of input and the desired output.
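
With identical start and end markers, sed starts looking for the end of the range on the line after the one that opened it, so the marker lines pair up and are deleted along with what lies between them. In sed's default (basic) regular expressions the + is a literal character; in awk, or with sed -E, it is a quantifier and would have to be escaped. A rough awk equivalent that sidesteps the regex by using a plain substring test (strip_between and the variable names are made up for this sketch):

```shell
# strip_between is a hypothetical helper; it reads files given as arguments, or stdin.
strip_between() {
    awk -v m='=+=+=+=' '
    index($0, m) { inside = !inside; next }  # marker line: toggle the state, drop the line
    !inside                                   # print only lines outside a block
    ' "$@"
}

# Made-up input for illustration.
printf '%s\n' 'keep 1' '=+=+=+=' 'drop' '=+=+=+=' 'keep 2' | strip_between
```

This should print "keep 1" and "keep 2". Like the sed version, it removes the marker lines as well.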


All times are GMT -5.