awk - remove lines between AAAA and BBBB
I have a long file, which looks similar to the one I pasted below.
I would like to remove all lines between the "-----" and "_____" lines, marked "Remove this text" below. In other words, I need the shortest match: cut everything between "-----" and "______" (their lengths can vary). It's OK if the marker lines get removed, too. Does anyone have awk ideas for that?
File to be edited:
normal text
don't touch
------------
Remove this text
____________________
another normal text
normal text
don't touch
------------------
Remove me please
__________________________
yet another normal text
normal text
don't touch |
Code:
awk '/^___/{f=0}f{next}/^---/{f=1}1'
Code:
awk '/^(---|___)/{print}/^---/,/^___/{next}1'
Code:
awk '/^___/{f=0;next}f{next}/^---/{f=1;next}1'
Code:
awk '/^---/,/^___/{next}1'
Code:
sed '/^---/,/^___/d' |
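For anyone comparing the one-liners above, here is a commented sketch of the flag-based variant that also drops the marker lines. It is the third one-liner spread over several lines; the wrapper name `strip_blocks` and the sample text are made up for illustration.

```shell
# strip_blocks is a hypothetical wrapper name; it reads stdin or the files given.
strip_blocks() {
    awk '
        /^___/ { f = 0; next }   # closing marker: leave the block and drop the marker
        f      { next }          # inside a block: drop the line
        /^---/ { f = 1; next }   # opening marker: enter the block and drop the marker
        1                        # any other line is printed unchanged
    ' "$@"
}

# Made-up sample modelled on the file in the question.
printf '%s\n' 'normal text' '------------' 'Remove this text' '__________' 'another normal text' |
    strip_blocks
```

The order of the rules matters: the closing marker is tested before the flag, so the flag is cleared before the "inside a block" rule can swallow the line.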
radoulov, I would have suggested:
Code:
awk '/^----/,/^____/{next}{print}' |
Quote:
I would have picked up sed myself, for a start anyway. |
Why AWK when you can SED ???
sed '/--/,/__/ d' filename > newfilename
Deletes everything from a line containing "--" up to and including the next line containing "__". I arbitrarily use two of each character. If this WAS homework, then shame on me for doing it for you..... |
Quote:
And yes, all the answers are unfortunately *wrong* ;) (perhaps I didn't specify the "test case" clearly enough). What I want to do is remove all advertisements from an mbox file of some mailing list. These advertisements are placed between ----- and _____; so far, everything is clear. The problem is that it is an mbox file (that is, emails one after another), so there are sometimes nice drawings etc. Hence I was looking for a way to remove the shortest match (the shortest match is not the longest match; the shortest match is also not a match longer than the shortest). That being said, take a look at this "improved" test case:
1 normal text 1
1 don't touch 1
------------
Remove this text
____________________
2 normal text 2
a nice diagram:
--------------------------
| This will be gone, too |
| but should stay |
--------------------------
2 normal text 2
2 don't touch 2
------------------
Remove me please
__________________________
3 yet another normal text 3
3 normal text 3
3 don't touch 3
With all the solutions suggested in this thread, "normal text 2" would not come out the way we want: what gets cut is not the longest match, but it is also not the shortest match between any two ----- and _______ lines. |
Sorry, but I have to interject. The answers were not wrong; they were correct, but the original question appears in retrospect to have been wrong. Welcome to the world of scope creep. The original question had a sort of elegance that made it easy meat.
So how exactly will the bit below be differentiated from the normal start and end patterns of a delete candidate?
Quote:
PAix |
Quote:
I mean:
Code:
--- |
Quote:
It's the mbox file (a file containing many emails), so yes, I have several thousand occurrences. I was thinking of possible easier solutions:
1) cut 4 or 5 lines above ^_________
2) reverse all lines in the file (I think I don't have any tables or drawings which use _______) and then reverse the lines back
But anyway, finding the *really* shortest match seems to be more interesting and useful (think of HTML / XML tags). |
You mean something like this:
Code:
tac filename|awk '/^___/,/^---/{next}1'|tac
Code:
tac <(awk '/^___/,/^---/{next}1'<(tac filename))
Code:
tac <(sed '/^___/,/^---/d'<(tac filename)) |
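The tac trick works because, reading the file backwards, the first ----- line after a _____ line is the nearest opening marker, so diagram borders further up are never reached. A single-pass sketch of the same idea, without reversing the file, could buffer lines from each ----- line and throw the buffer away only if a _____ line arrives before the next ----- line. The function name `shortest_strip` is made up, and the sketch assumes that a block which is never closed should be kept.

```shell
# shortest_strip is a hypothetical name; it reads stdin or the files given.
shortest_strip() {
    awk '
        /^---/ {
            # New candidate start: the previous buffer was not an ad, so print it.
            for (i = 1; i <= n; i++) print buf[i]
            n = 0
            buf[++n] = $0
            next
        }
        /^___/ && n > 0 {
            # Closing marker reached: discard the buffered block and the marker.
            n = 0
            next
        }
        n > 0 { buf[++n] = $0; next }   # inside a candidate block: keep buffering
        { print }                       # ordinary line outside any candidate block
        END { for (i = 1; i <= n; i++) print buf[i] }   # unclosed block: keep it
    ' "$@"
}

# Possible usage (file names are placeholders):
# shortest_strip mbox > mbox.clean
```

On the "improved" test case this should keep the diagram, because its second border line flushes the buffer instead of letting the block run on to the next _____ line. The buffer only ever holds one candidate block, so memory use stays small unless a ----- line is followed by a very long stretch without either marker.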
Thanks a lot for all your answers.
Here are also some ideas from the comp.lang.awk group: http://groups.google.com/group/comp....a81536e6734e7e |
Another possible solution:
Code:
awk 'NR == FNR && /^-+$/ { |
I have a slightly different problem.
I need to strip out anything that's between =+=+=+= and =+=+=+= in a file. |
Code:
sed '/=+=+=+=/,/=+=+=+=/d' file |
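The sed command works as written because `+` is an ordinary character in sed's basic regular expressions. In awk, regular expressions are extended ones, where `+` is a quantifier, so the same pattern would need escaping. A sketch that avoids regex quoting altogether matches the marker as a plain string with `index()` and toggles a flag, since the start and end markers are identical. The function name `strip_between_markers` and the variable names are made up for illustration.

```shell
# strip_between_markers is a hypothetical name; it reads stdin or the files given.
strip_between_markers() {
    awk -v marker='=+=+=+=' '
        index($0, marker) { inside = !inside; next }   # marker line: flip the state, drop the line
        !inside                                        # print only lines outside a marked block
    ' "$@"
}

# Made-up sample input.
printf '%s\n' 'keep 1' '=+=+=+=' 'drop me' '=+=+=+=' 'keep 2' | strip_between_markers
```

Like the sed range, this assumes the markers come in pairs; an unpaired marker removes everything up to the end of the file.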