Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
I have a long file that looks similar to the one pasted below.
I would like to remove all lines between a "-----" line and a "_____" line (in the sample, that is the text asking to be removed).
In other words, I have to use the shortest match and cut out everything between "-----" and "______" (the length of these marker lines can vary). It's OK if the marker lines get removed, too.
Does anyone have awk ideas for that?
File to be edited:
normal text
don't touch
------------
Remove
this
text
____________________
another normal text
normal text
don't touch
------------------
Remove me
please
__________________________
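A minimal sketch of the straightforward reading of the question, written in Python rather than awk. It assumes a marker line consists only of hyphens (opener) or underscores (closer), at least three of them, with trailing whitespace ignored; `is_marker` and `strip_blocks_naive` are hypothetical names. It drops everything from an opening line through the next closing line, markers included.

```python
def is_marker(line, ch):
    """True if `line` is `ch` repeated at least 3 times.

    Assumption: trailing whitespace is ignored and 3 is the minimum
    length; the thread only says the marker length can vary.
    """
    s = line.rstrip()
    return len(s) >= 3 and set(s) == {ch}


def strip_blocks_naive(lines):
    """Plain range deletion (hypothetical helper name).

    From a '-----' line through the next '_____' line, everything is
    dropped, both marker lines included.
    """
    out = []
    skipping = False
    for line in lines:
        if not skipping and is_marker(line, "-"):
            skipping = True    # opening marker: start dropping
        elif skipping and is_marker(line, "_"):
            skipping = False   # closing marker: stop dropping
        elif not skipping:
            out.append(line)   # ordinary line outside any block
    return out
```

Called on `text.splitlines()`, this handles the sample above. A range deletion like this, however, starts at the *first* opener it meets, so the top border of a drawing would swallow everything up to the next closing line.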
Sorry for being suspicious (ignore me if I'm wrong), but... "have to"? I hope this isn't schoolwork.
I would have picked sed myself, at least for a start.
Yes, "have to" (actually, before I started this thread I wondered whether "have to" was the right expression). And no, it's not schoolwork.
And yes, unfortunately all the answers are *wrong* (perhaps I didn't specify the test case clearly enough).
What I want to do is remove all advertisements from an mbox file of a mailing list.
These advertisements are placed between ----- and _____ lines; so far, everything is clear.
The problem is that it is an mbox file (i.e., emails one after another), so it sometimes contains nice drawings etc.
Hence I was looking for a way to remove the shortest match (the shortest match is not the longest match; it is also not a match longer than the shortest).
That being said, take a look at this "improved" test case:
1 normal text 1
1 don't touch 1
------------
Remove
this
text
____________________
2 normal text 2
a nice diagram:
--------------------------
| This will be gone, too |
| but should stay |
--------------------------
2 normal text 2
2 don't touch 2
------------------
Remove me
please
__________________________
3 yet another normal text 3
3 normal text 3
3 don't touch 3
With all the solutions suggested in this thread, the "normal text 2" part does not come out the way we would like: they cut neither the longest match nor the shortest one between a ----- line and a _______ line.
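One way to get the really shortest match in a single pass, sketched in Python (not awk) under an assumed marker definition: a line of at least three hyphens opens, a line of at least three underscores closes, trailing whitespace ignored. `is_marker` and `strip_shortest_blocks` are hypothetical names. Lines after each opening line are held back rather than printed. If another opening line turns up before a closing line, the held-back lines were not an advertisement after all (for instance, the top border of a drawing), so they are printed and buffering restarts at the new opener. A closing line discards whatever is held back.

```python
def is_marker(line, ch):
    """True if `line` is `ch` repeated at least 3 times (assumed format)."""
    s = line.rstrip()
    return len(s) >= 3 and set(s) == {ch}


def strip_shortest_blocks(lines):
    """Remove each block from the closest preceding '-----' line
    through a '_____' line, i.e. the shortest match."""
    out = []
    pending = None  # None means: not inside a candidate block
    for line in lines:
        if is_marker(line, "-"):
            if pending is not None:
                out.extend(pending)   # false alarm: keep the held-back lines
            pending = [line]          # a candidate block starts here
        elif is_marker(line, "_"):
            if pending is None:
                out.append(line)      # closing line with no opener: keep it
            pending = None            # otherwise drop the block, markers included
        elif pending is not None:
            pending.append(line)      # inside a candidate block: hold back
        else:
            out.append(line)          # ordinary line
    if pending is not None:
        out.extend(pending)           # unterminated candidate at end of input: keep
    return out
```

On the improved test case this keeps the boxed diagram and the second "2 normal text 2" line and removes only the two advertisements. Only the lines after the most recent opening line are held in memory. It cannot help if an advertisement itself contains a hyphen-only line, because that line would then be taken as the opener.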
Sorry, but I have to interject. The answers were not wrong; they were correct, but in retrospect the original question appears to have been wrong. Welcome to the world of scope creep. The original question had a sort of elegance that made it easy meat.
So how exactly is the bit below to be distinguished from the onset/offset patterns of the normal blocks to delete?
Quote:
-------------------------
| This will be gone, too |
| but should stay |
--------------------------
I can't see anything that would make it anything other than potentially dead meat at the moment.
Could you have more than one occurrence of --- something ___ in the same file?
I mean:
Code:
---
a
b
c
___
something else
---
a
b
___
Where it's the second block (the shorter one) that is supposed to be removed.
It's an mbox file (a file containing many emails), so yes, I have several thousand occurrences.
I was thinking of possible easier solutions:
1) cut the 4 or 5 lines above each ^_________ line
2) reverse the order of all lines in the file (I don't think I have any tables or drawings that use _______), delete from each _______ line to the next ----- line, and then reverse the lines back
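The two workarounds above can be sketched in Python (swapped in for awk), using the same assumed marker definition; `cut_lines_above` and `strip_via_reversal` are hypothetical names. Option 1 only works if every advertisement has the same fixed height: in the improved test case the first one has four lines above its closing marker and the second only three, so no single n fits both. Option 2 relies on the stated assumption that no drawing uses underscore-only lines: after reversing, each closing line comes before its block, so deleting up to the first opening line reached picks the nearest opener.

```python
def is_marker(line, ch):
    """True if `line` is `ch` repeated at least 3 times (assumed format)."""
    s = line.rstrip()
    return len(s) >= 3 and set(s) == {ch}


def cut_lines_above(lines, n):
    """Option 1: on each '_____' line, drop that line and the n lines
    kept just before it. Correct only for fixed-height blocks."""
    out = []
    for line in lines:
        if is_marker(line, "_"):
            del out[max(0, len(out) - n):]  # also safe for n == 0
        else:
            out.append(line)
    return out


def strip_via_reversal(lines):
    """Option 2: reverse, delete from each '_____' line through the next
    '-----' line (the nearest opener in original order), reverse back.
    A closer with no opener above it deletes everything above it."""
    out = []
    skipping = False
    for line in reversed(lines):
        if not skipping and is_marker(line, "_"):
            skipping = True    # closing marker: start dropping
        elif skipping and is_marker(line, "-"):
            skipping = False   # nearest opening marker: stop dropping
        elif not skipping:
            out.append(line)
    out.reverse()
    return out
```

On the improved test case `strip_via_reversal` gives the same result as a true shortest match, while `cut_lines_above(lines, 4)` also removes the "2 don't touch 2" line.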
But anyway, this "find the *really* shortest match" problem seems more interesting and useful (think of HTML/XML tags).
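The HTML/XML comparison points at regular expressions. A sketch with Python's `re` module, assuming the whole mailbox fits in memory as one string with Unix line endings and the same marker format as above (`LAZY`, `BLOCK` and `strip_blocks_regex` are hypothetical names). A lazy `.*?` alone is not enough: the engine commits to the leftmost opening line and only then stops at the nearest closing line, which is exactly the "neither the longest nor the shortest" match described earlier in the thread. Forbidding another opening line inside the body (a so-called tempered token) yields the shortest block.

```python
import re

# Lazy quantifier only: starts at the leftmost opener and stops at the
# nearest closer after it, so a match can still span another opener.
LAZY = re.compile(r"^-{3,}[ \t]*\n.*?^_{3,}[ \t]*(?:\n|\Z)", re.M | re.S)

# Tempered version: each body line must not itself be an opener, so a
# match can only start at the opener closest to its closer.
BLOCK = re.compile(
    r"^-{3,}[ \t]*\n"                 # opening marker line
    r"(?:(?!-{3,}[ \t]*\n).*\n)*?"    # body lines that are not openers
    r"_{3,}[ \t]*(?:\n|\Z)",          # closing marker line
    re.M,
)


def strip_blocks_regex(text):
    """Remove every shortest '-----' ... '_____' block from `text`."""
    return BLOCK.sub("", text)
```

The same idea carries over to tags, e.g. `<b>(?:(?!<b>).)*?</b>` with `re.S`, although nested markup is beyond what a regular expression can track reliably.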