Quote:
Originally Posted by grob115
Hello,
I need to create a regular expression to match a specific substring but ignore another substring. For example, I want to match a sentence containing the word "myMessage" as long as the sentence doesn't have the word "ignored". Here is an example sentence:
1) Dump: myMessage is to be picked
2) Dump: myMessage is to be ignored
In this case, 1) should be matched and 2) should be ignored. Is this possible with regular expression?
You could do it the ugly way:
Code:
cat test.txt |egrep 'myMessage.*(([^i][^g][^n][^o][^r][^e][^d])|picked)$'
But please notice that this regexp assumes that "picked" is at the end of the line, and it is not an elegant solution. Also notice that "[^i][^g][^n][^o][^r][^e][^d]" can't be used alone (without the "|picked)" part) to filter out "ignored", because it will also reject lines ending in other words, like "index", etc., i.e. any line whose last seven characters have at least one letter in the same position as in "ignored".
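To see that side effect, here is a small sketch that runs the positional pattern alone, without the "|picked" alternative. The sample lines ending in "stored" and "kept" are made up for illustration, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample input; only the first two lines come from the question.
sample_pos=$(mktemp)
printf '%s\n' \
  'Dump: myMessage is to be picked' \
  'Dump: myMessage is to be ignored' \
  'Dump: myMessage is to be stored' \
  'Dump: myMessage is to be kept' > "$sample_pos"

# Positional negation only: the last seven characters of the line must
# differ from "ignored" in every single position.
positional_only=$(grep -E 'myMessage.*[^i][^g][^n][^o][^r][^e][^d]$' "$sample_pos")
echo "$positional_only"
rm -f "$sample_pos"
```

Only the "kept" line should survive: " picked" has an "e" in the sixth position and " stored" has an "o" in the fourth, so both are rejected even though neither contains "ignored".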
The less ugly way would be:
Code:
cat test.txt |egrep 'myMessage.*(([^ignored]{7})|picked)$'
But it will reject any line whose last seven characters contain any of the letters from "ignored" (unless the line ends in "picked"), so it doesn't precisely exclude "ignored".
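A quick sketch of how broad that character class is, again with the "|picked" alternative left out. The "kept" and "was cut" lines are invented for the demonstration.

```shell
# Hypothetical sample input; only the first line comes from the question.
sample_set=$(mktemp)
printf '%s\n' \
  'Dump: myMessage is to be ignored' \
  'Dump: myMessage is to be kept' \
  'Dump: myMessage was cut' > "$sample_set"

# Character class only: none of the last seven characters may be
# one of the letters i, g, n, o, r, e, d.
set_only=$(grep -E 'myMessage.*[^ignored]{7}$' "$sample_set")
echo "$set_only"
rm -f "$sample_set"
```

Here only the "was cut" line should be printed. The "kept" line is dropped just because its last seven characters contain an "e".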
It looks like grep's regular expressions don't have an elegant NOT operation when it comes to words. You can exclude letters from a certain character set, and you can search for one word OR another word (egrep 'myMessage.*(picked|ignored)' will match lines with picked or ignored, but nothing else), but there is no operator inside the pattern itself for excluding a word or pattern.
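One workaround that stays outside the pattern is to chain two greps, using the -v option (invert match) on the second one. This is only a sketch: the temporary file stands in for test.txt, and the sample lines are the two from the question.

```shell
# Assumed sample input, mirroring the two lines from the question.
sample=$(mktemp)
printf '%s\n' \
  'Dump: myMessage is to be picked' \
  'Dump: myMessage is to be ignored' > "$sample"

# Stage 1 keeps lines containing myMessage;
# stage 2 drops every line that contains ignored (-v inverts the match).
result=$(grep 'myMessage' "$sample" | grep -v 'ignored')
echo "$result"
rm -f "$sample"
```

This should print only the "picked" line, and it does not care where in the line "ignored" appears. If your grep is GNU grep built with PCRE support, a negative lookahead such as grep -P '^(?!.*ignored).*myMessage' may be another option, but that is not available everywhere.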