Old 07-28-2012, 01:19 PM   #1
porphyry5
Member
 
Registered: Jul 2010
Location: oregon usa
Distribution: Slackware 14.1, Arch, Lubuntu 18.04 OpenSUSE Leap 15.x
Posts: 518

Rep: Reputation: 24
grep shortest matches to regex


Thank you but never mind, found the solution. I needed
Code:
grep -Po "\((.*?)\)" <<< "$d"
Suppose
Code:
d="(grep this) don't grep this (but do grep this)"
and I want grep to return just the parts in parentheses, just one part to a line using "grep -o", i.e. as
Code:
(grep this)
(but do grep this)
What regex do I need to use here? With "grep -o", "grep -Eo" and "grep -Po" I have tried umpteen variations of '\(.*\)', '\(.?\)' and '\(.+\)', both with and without \ quoting, and with and without all possible variants of \{-}. I get back either nothing, or the entire line, or each single character on its own line.

Last edited by porphyry5; 07-28-2012 at 01:47 PM. Reason: Found solution
 
Old 07-29-2012, 05:02 AM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
When you use grep without -E, it uses basic regular expressions. In basic regex, the characters ?, +, {, |, (, and ) are considered literal. In GNU grep, prefixing these characters with a backslash enables their special meanings.

When you use -E, then it uses extended regular expressions, and the above characters are considered special by default. Backslashing them now disables their special meanings so that they become literal.

So in a nutshell, use -E if you need to use a lot of fancy regular expression features, and don't use it if you need to use a lot of literal characters like that.
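To make the basic/extended difference concrete, here is a small sketch that assumes GNU grep and reuses the sample line from the first post. The variable names (d, bre, ere, grouped) are just for illustration.

```shell
# Sample line from the question, double-quoted so the apostrophe is safe.
d="(grep this) don't grep this (but do grep this)"

# Basic regex (no -E): bare parentheses are literal characters.
bre=$(printf '%s\n' "$d" | grep -o '(grep this)')

# Extended regex (-E): parentheses group, so escape them to make them literal.
ere=$(printf '%s\n' "$d" | grep -Eo '\(grep this\)')

# Extended regex with bare parentheses: they only group, so the
# matched text contains no parentheses at all (first match shown).
grouped=$(printf '%s\n' "$d" | grep -Eo '(grep this)' | head -n 1)

printf 'BRE: %s\nERE: %s\nERE grouped: %s\n' "$bre" "$ere" "$grouped"
```

The first two commands should print the same parenthesized text, while the third drops the parentheses because they were treated as a group.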

See the grep man and info pages for more on the differences between basic and extended regex. sed works the same way with its -r option, BTW.

Incidentally, I personally prefer to surround characters that need to be literal in "[]" bracket expressions, rather than using backslashes. It's cleaner and more portable overall.
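As a rough illustration of the portability point, the same bracketed pattern can be handed to grep with or without -E and should behave identically. This is only a sketch; the variable names (d, pat, bre, ere) are made up for the example.

```shell
d="(grep this) don't grep this (but do grep this)"

# [(] and [)] are literal parentheses in both basic and extended regex,
# so one pattern serves both modes with no backslashes to flip.
pat='[(]grep this[)]'

bre=$(printf '%s\n' "$d" | grep -o "$pat")
ere=$(printf '%s\n' "$d" | grep -Eo "$pat")

printf 'without -E: %s\nwith -E:    %s\n' "$bre" "$ere"
```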


In any case, your real problem isn't with grep, it's with the greediness of regex quantifiers like "*". They always capture the longest possible match. This means that '(.*)' will reach all the way to the final closing parenthesis in the line.
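A quick way to see the greediness, sketched with the sample line from the first post (the variable names are only for illustration):

```shell
d="(grep this) don't grep this (but do grep this)"

# Greedy: .* stretches to the LAST closing parenthesis on the line,
# so grep -o reports one match that happens to be the whole line.
greedy=$(printf '%s\n' "$d" | grep -o '(.*)')

printf '%s\n' "$greedy"
```

Because this particular line starts with "(" and ends with ")", the single greedy match is the entire line, which is the "entire line" result described in the question.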

The usual way to counter that is to use a negated bracket expression. Match everything that's not the closing character, until you find one that is. Like this:

Code:
grep -o '([^)]*)'
grep -Eo '[(][^)]+[)]'
The "+" in the second one ensures that the parentheses must actually contain something in order to match. Use "*" if you want to match empty ones.
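To show the "*" versus "+" difference, here is a sketch that adds an empty pair of parentheses to the front of the sample line. The string s and the variable names are assumptions made for the example.

```shell
# Sample line with an extra empty "()" in front.
s="() (grep this) don't grep this (but do grep this)"

# "*" allows zero characters between the parentheses, so "()" is reported too.
star=$(printf '%s\n' "$s" | grep -o '([^)]*)')

# "+" requires at least one character, so the empty pair is skipped.
plus=$(printf '%s\n' "$s" | grep -Eo '[(][^)]+[)]')

printf 'with *:\n%s\n\nwith +:\n%s\n' "$star" "$plus"
```

One thing to keep in mind with this approach: [^)] also matches an opening parenthesis, so nested parentheses will not be split the way you might expect.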

Finally, as you appear to have discovered, perl-compatible regular expressions allow you to disable greediness by appending a "?" to the greedy quantifier. So if you use the -P option, then your expression could look like this:

Code:
grep -Po '[(].*?[)]'
Note finally that "-P" and the backslashed forms \?, \+ and \| in basic regex are GNU extensions (\( \) and \{ \} are standard POSIX basic regex). The extensions likely won't be available to you if you ever need to use a non-GNU version of grep.
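If a script has to run where -P may be missing, one possible approach is to probe for it and fall back to the bracket-expression form. This is only a sketch; the probe and the variable names are my own choices, not anything grep prescribes.

```shell
d="(grep this) don't grep this (but do grep this)"

# Probe for PCRE support. A grep without -P exits with an error here.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # Lazy quantifier: .*? stops at the first closing parenthesis.
    matches=$(printf '%s\n' "$d" | grep -Po '[(].*?[)]')
else
    # Fallback without -P: negated bracket expression.
    matches=$(printf '%s\n' "$d" | grep -o '([^)]*)')
fi

printf '%s\n' "$matches"
```

Either branch should print the two parenthesized parts on separate lines for this input.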
 
Old 07-29-2012, 07:25 PM   #3
porphyry5
Member
 
Registered: Jul 2010
Location: oregon usa
Distribution: Slackware 14.1, Arch, Lubuntu 18.04 OpenSUSE Leap 15.x
Posts: 518

Original Poster
Rep: Reputation: 24
Quote:
Originally Posted by David the H.
Code:
grep -o '([^)]*)'
Thank you very much, that is an eye-opening way of approaching the greedy/non-greedy issue, a much nicer solution, and
Code:
grep -o '([^)]\+)'
to ensure there is content in the parentheses. I prefer to use basic regexes whenever possible; that way I don't get confused by the differences in nomenclature.
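Since \+ in basic regex is a GNU extension, a version meant to run on other greps could use the POSIX interval \{1,\} instead, which means "one or more". A small sketch comparing the two, assuming GNU grep for the first command (the string s and the variable names are made up for the example):

```shell
# Sample line with an extra empty "()" in front.
s="() (grep this) don't grep this (but do grep this)"

# GNU extension: \+ means "one or more" in basic regex.
gnu=$(printf '%s\n' "$s" | grep -o '([^)]\+)')

# POSIX basic regex: the interval \{1,\} says the same thing.
posix=$(printf '%s\n' "$s" | grep -o '([^)]\{1,\})')

printf 'with \\+:\n%s\n\nwith \\{1,\\}:\n%s\n' "$gnu" "$posix"
```

Both commands should skip the empty "()" and print the two non-empty parts.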
 
All times are GMT -5.