How to look for the shortest match using regex, basically the opposite of .*
hi,
I'm having a problem in the following situation. Suppose I have the string "Scrapple from the apple." If I use the regular expression "a.*e", it matches "apple from the apple", because .* by definition matches the longest string that satisfies the regular expression. It won't match "apple" or "apple from the", even though these also start with "a" and end with "e". My problem is that instead of the longest match, I want the shortest match. I've looked at the tutorials but am still at a loss on how to do this. Any help will be much appreciated.
Here is just one crude method (using sed):
Code:
sed 's/a.\{,3\}e/G/g' filename
This matches the pattern "a" + a maximum of 3 characters + "e", and replaces it with "G" for all occurrences.
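To make the length-limit approach concrete, here is a small sketch run against the string from the question. The variable names are made up for illustration, and `\{0,3\}` is used as the more portable spelling of `\{,3\}`. The `[^e]*` variant is a commonly used alternative that was not suggested above; it only works here because the terminator is a single character.

```shell
line='Scrapple from the apple.'

# Greedy: .* runs from the first "a" to the last "e" on the line.
greedy=$(printf '%s\n' "$line" | sed 's/a.*e/G/')

# Bounded: at most 3 characters may sit between "a" and "e".
bounded=$(printf '%s\n' "$line" | sed 's/a.\{0,3\}e/G/g')

# Negated class: "anything that is not an e", so the match stops at the first "e".
negated=$(printf '%s\n' "$line" | sed 's/a[^e]*e/G/g')

printf '%s\n' "$greedy" "$bounded" "$negated"
```

The bounded form needs a guess at the maximum length, while the negated class does not, but neither helps when the endpoint is a multi-character string such as a closing tag.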
You want the "non-greedy" matching operators. In perl, for example, .+? matches as few characters as it can while still letting the rest of the pattern match, so the match ends at the first possible endpoint. (Beware of using .*? on its own: it will happily match on 0 characters and end.)
Here is an example of what I'm trying to do.
I'm trying to delete everything between and including <tag1> and </tag1>, but anything that's outside of this should not be deleted. I'm doing this with a sed script, and the regex is not working.
[HTML]
<html><body><tag1>This is inside tag1. This should be deleted.</tag1>This is the first statement outside of tag1. This should NOT be deleted.<tag1>This is once again inside tag1. This should be deleted as well.</tag1> This is the second statement outside tag1. This should NOT be deleted.</body></html>
[/HTML]
I've tried the following. In this one the problem is that it deletes the first line outside <tag1> as well.
Code:
$ cat test1 | sed 's/<tag1>.*<\/tag1>//'
Code:
$ cat test1 | sed 's/<tag1>.+?<\/tag1>//'
What about using "s/(^.*<tag1>).*(<\/tag1>.*$)/\1\2/"? That will save everything up to and including <tag1> from the start of the line, and then save everything after and including </tag1>, to the end of the line, chopping the middle. An inelegant solution, I know, but something that may work until something better comes along.
In your first example, the regex is "greedy", i.e. it goes all the way to the last instance of </tag1>.
In addition to my earlier crude solution (max # of characters), you could also do this:
Code:
sed -e 's/\/tag1/TAGONE/' -e 's/<tag1>.*<TAGONE>//'
(By replacing only the first instance of "/tag1" you create an unambiguous endpoint for the second sed command.)
My favorite sed tutorial: http://www.grymoire.com/Unix/Sed.html
This works:
Code:
sed 's/[0-9]\{2\},\+\?[0-9]\{2\}/DDD/g' filename
From all my reading, I had no idea that the \+\? construct would work. It seems that this has a very different meaning from the perl one.
EDIT--PS: Works in grep, too...
Code:
awk 'BEGIN{ FS="</tag1>"}