how to look for the shortest match using regex, basically the opposite of .*
hi,
i'm having a problem in the following situation:
suppose, i have a string "Scrapple from the apple."
then, if i use the regular expression "a.*e", it will match "apple from the apple", because .* is greedy: it matches the longest string that still satisfies the regular expression.
and it won't match: "apple" OR "apple from the" even though these also start with "a" and end with an "e".
my problem is that instead of looking for the longest match, i want the shortest match. i've looked at the tutorials, but am still at a loss on how to do this. any help will be much appreciated.
You want the "non-greedy" matching operators. In perl, for example, .+? matches as few characters as possible (but at least one) instead of as many as possible. Beware of using .*? on its own: it will happily match 0 characters and stop.
here is an example of what i'm trying to do.
i'm trying to delete everything between and including <tag1> and </tag1>. but anything that's outside of this should not be deleted.
i'm doing this with a sed script, and the regex is not working.
[HTML]
<html><body><tag1>This is inside tag1. This should be deleted.</tag1>This is the first statement outside of tag1. This should NOT be deleted.<tag1>This is once again inside tag1. This should be deleted as well.</tag1> This is the second statement outside tag1. This should NOT be deleted.</body></html>
[/HTML]
i've tried the following:
in this one the problem is that it deletes the first line outside <tag1> as well.
Code:
$cat test1 | sed 's/<tag1>.*<\/tag1>//'
<html><body>This is the second statement outside tag1. This should not be deleted.</body></html>
in this one the problem is that it does not delete anything:
Code:
$cat test1 | sed 's/<tag1>.+?<\/tag1>//'
<html><body><tag1>This is inside tag1. This should be deleted.</tag1>This is the first statement outside tag1. This should not be deleted.<tag1>This is once again inside tag1. This should be deleted as well.</tag1>This is the second statement outside tag1. This should not be deleted.</body></html>
i've tried several variants of the regex above... but am still at a loss on how to do this... any guidance will be helpful. thanks!
What about using sed -E 's/(^.*<tag1>).*(<\/tag1>.*$)/\1\2/'? (The -E makes sed treat the parentheses as groups, and the slash in </tag1> has to be escaped because / is the delimiter.) That will save everything up to and including <tag1> from the start of the line, and then save everything after and including </tag1>, to the end of the line, chopping the middle. An inelegant solution, I know, but something that may work until something better comes along.
In your first example, the regex is "greedy"--i.e., it goes all the way to the last instance of </tag1>.
In addition to my earlier crude solution (max # of characters), you could also do this:
sed -e 's/\/tag1/TAGONE/' -e 's/<tag1>.*<TAGONE>//' (By replacing only the first instance of "/tag1" you create an unambiguous endpoint for the second sed command.)
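The same "unambiguous endpoint" idea can be stretched to cover every <tag1> pair on a line. This is only a sketch and it assumes GNU sed, where \n means a newline both in the replacement and inside a bracket expression. The function name strip_tag1_sed is made up for the example. Since sed reads one line at a time, a newline can never already be in the line, so it is a safe stand-in for </tag1>.

```shell
# Assumes GNU sed: "\n" in the replacement and inside [^...] means newline.
strip_tag1_sed() {
    # 1) turn every closing tag into a single newline character
    # 2) delete from each <tag1> up to the nearest newline
    sed -e 's/<\/tag1>/\n/g' -e 's/<tag1>[^\n]*\n//g'
}

printf '%s\n' 'a<tag1>x</tag1>b<tag1>y</tag1>c' | strip_tag1_sed
```

Because [^\n]* cannot cross the marker, each match ends at the first closing tag, which is the shortest-match behaviour being asked for. A stray </tag1> with no opening tag would leave a newline behind, so this only suits well-paired input.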
I don't know if this is legal--I have never seen it in Bash. "+" means one or more, and "?" means optional---I don't know what the combo would mean. Maybe try it on something simple.....
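Trying it on something simple could look like this. In sed's default (basic) regex syntax, + and ? are ordinary characters, which would explain why the .+? attempt deleted nothing. The variable names are just for the example.

```shell
# In basic regex syntax "+" and "?" are literal characters,
# so "a+?b" only matches the literal text a+?b.
literal=$(printf '%s\n' 'a+?b' | sed 's/a+?b/X/')

# The tag pattern from the thread wants one character followed by a literal
# "+?" before the closing tag, so it finds nothing to replace here.
unchanged=$(printf '%s\n' '<tag1>x</tag1>y' | sed 's/<tag1>.+?<\/tag1>//')

printf '%s\n%s\n' "$literal" "$unchanged"
```

The first line of output is X and the second is the input unchanged.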
His question just mentioned regular expressions, context-unspecific. In perl (where I do 99% of my personal RegExp work) one can use the '?' operator to turn a greedy operator into a non-greedy one.
i'm doing this with a sed script, and the regex is not working.
Sometimes regex can be a pain, if you don't understand it enough. Until you get to know it better, here's one way without regex (at least not too much)
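A mostly regex-free approach could look like the sketch below. It is an illustration only, not necessarily the code this reply had in mind. The function name strip_tag1 is made up, and it relies only on POSIX shell parameter expansion, which uses glob patterns rather than regular expressions.

```shell
# Runs in a subshell (note the parentheses) so "rest" and "out" do not leak.
strip_tag1() (
    rest=$1
    out=
    while :; do
        case $rest in
            *'<tag1>'*'</tag1>'*)
                # keep the text before the first <tag1>
                out=$out${rest%%'<tag1>'*}
                # drop everything through that <tag1> ...
                rest=${rest#*'<tag1>'}
                # ... and through the first </tag1> that follows it
                rest=${rest#*'</tag1>'}
                ;;
            *)
                break
                ;;
        esac
    done
    printf '%s\n' "$out$rest"
)

strip_tag1 'a<tag1>x</tag1>b<tag1>y</tag1>c'
```

The # form of the expansion removes the shortest matching prefix, so each pass stops at the nearest closing tag. To run it over a file line by line, something like this would do: while IFS= read -r line; do strip_tag1 "$line"; done < test1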