When you use grep without -E, it uses basic regular expressions. In basic regex, the characters ?, +, {, |, (, and ) are considered literal. In GNU grep, prefixing these characters with a backslash enables their special meanings.
When you use -E, it uses extended regular expressions, and the above characters are considered special by default. Backslashing them now disables their special meanings so that they become literal.
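To make the difference concrete, here's a small sketch using "+" as the test character. The sample input lines are made up for illustration, and the backslash behavior shown is what GNU grep does:

```shell
# Two made-up sample lines: one has a literal plus sign, one doesn't.
input=$(printf 'a+b\naab\n')

# Basic regex: "+" is literal, so only the line containing "a+" matches.
bre=$(printf '%s\n' "$input" | grep 'a+')

# Extended regex: "+" means "one or more", so any line with an "a" matches.
ere=$(printf '%s\n' "$input" | grep -E 'a+')

# Extended regex with a backslash: "+" is literal again.
ere_literal=$(printf '%s\n' "$input" | grep -E 'a\+')

printf 'basic:             %s\n' "$bre"
printf 'extended:          %s\n' "$ere"
printf 'extended, escaped: %s\n' "$ere_literal"
```

The first and third commands should print only "a+b", while the second prints both lines.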
So in a nutshell, use -E if you need a lot of fancy regular expression features, and don't use it if you need to match a lot of those characters literally.
See the grep man and info pages for more on the differences between basic and extended regex.
sed works the same way with its -r option, BTW.
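Here's a rough sed version of the same idea, using grouping parentheses. The sample string is made up, and I've written -E here, which GNU sed accepts as a synonym for -r:

```shell
# Made-up sample text.
text='aaab'

# Basic regex: parentheses must be backslashed to act as a group.
bre=$(printf '%s\n' "$text" | sed 's/\(a*\)b/<\1>/')

# Extended regex: plain parentheses already act as a group.
ere=$(printf '%s\n' "$text" | sed -E 's/(a*)b/<\1>/')

printf 'basic:    %s\n' "$bre"
printf 'extended: %s\n' "$ere"
```

Both commands should print "<aaa>".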
Incidentally, I personally prefer to surround characters that need to be literal in "[]" bracket expressions, rather than using backslashes. It's cleaner and more portable overall.
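A quick sketch of why the bracket form is handy: the same pattern means the same thing with or without -E. The input lines are made up for the demonstration:

```shell
# Made-up sample lines: only the first contains a literal "1+1".
input=$(printf '1+1=2\n11=2\n')

# "[+]" is a literal plus in basic regex...
bre=$(printf '%s\n' "$input" | grep '1[+]1')

# ...and in extended regex too.
ere=$(printf '%s\n' "$input" | grep -E '1[+]1')

# For contrast: a bare "+" with -E means "one or more 1s, then a 1".
bare=$(printf '%s\n' "$input" | grep -E '1+1')

printf 'basic, bracketed:    %s\n' "$bre"
printf 'extended, bracketed: %s\n' "$ere"
printf 'extended, bare plus: %s\n' "$bare"
```

The two bracketed commands should print "1+1=2", while the bare one prints "11=2" instead.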
In any case, your real problem isn't with grep, it's with the greediness of regex tokens like "*". They always capture the longest possible match. This means that '(.*)' will reach all the way to the final closing parenthesis in the line.
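You can see the greediness for yourself with a line that has two parenthesized groups. The sample line here is made up:

```shell
# Made-up sample line with two separate parenthesized groups.
line='f(a) + g(b)'

# Basic regex, so "(" and ")" are literal. The greedy ".*" runs from
# the first "(" all the way to the last ")" on the line.
greedy=$(printf '%s\n' "$line" | grep -o '(.*)')

printf 'greedy match: %s\n' "$greedy"
```

This should print a single match, "(a) + g(b)", rather than "(a)" and "(b)" separately.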
The usual way to counter that is to use a negated bracket expression. Match everything that's not that character, until you find one that is. Like this:
Code:
grep -o '([^)]*)'
grep -Eo '[(][^)]+[)]'
The "+" in the second one ensures that the parentheses must actually contain something in order to match. Use "*" if you want to match empty ones.
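To illustrate the "*" versus "+" difference, here's a sketch run against a made-up line that contains one empty pair of parentheses:

```shell
# Made-up sample line: the first group is empty, the second is not.
line='f() + g(b)'

# "*" allows zero characters between the parentheses.
star=$(printf '%s\n' "$line" | grep -o '([^)]*)')

# "+" requires at least one character between them.
plus=$(printf '%s\n' "$line" | grep -Eo '[(][^)]+[)]')

printf 'with *:\n%s\n' "$star"
printf 'with +:\n%s\n' "$plus"
```

The "*" version should print "()" and "(b)" on separate lines, while the "+" version prints only "(b)".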
Finally, as you appear to have discovered, Perl-compatible regular expressions allow you to disable greediness by appending a "?" to the greedy token. So if you use the -P option, then your expression could look like this:
Code:
grep -Po '[(].*?[)]'
Note finally that "-P" and the backslashed "\?", "\+", and "\|" in basic regex are GNU extensions (backslashed parentheses and braces are standard POSIX basic regex). They likely won't be available to you if you ever need to use a non-GNU version of grep.
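If portability matters, one possible approach is to probe for -P first and fall back to the bracket-expression form when it isn't there. This is only a sketch: the sample line is made up, and the probe just checks whether a trivial -P match succeeds:

```shell
# Made-up sample line.
line='f(a) + g(b)'

# Probe: does this grep accept -P? A grep without PCRE support
# exits with an error here, so we take the fallback branch.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # Non-greedy match, Perl-compatible regex only.
    matches=$(printf '%s\n' "$line" | grep -Po '[(].*?[)]')
else
    # Portable fallback: negated bracket expression in basic regex.
    matches=$(printf '%s\n' "$line" | grep -o '([^)]*)')
fi

printf '%s\n' "$matches"
```

Either branch should print "(a)" and "(b)" on separate lines.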