LinuxQuestions.org
Old 10-18-2008, 08:28 AM   #1
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,563
Blog Entries: 1

Rep: Reputation: 1024
\{a,b\} regular expressions


Hi all,
How can I match exactly 2 occurrences of zero? It's supposed to be
0\{2\}, but as you can see it doesn't work as it is supposed to. There seems
to be no difference between 0\{2,\}, which should find 2 or more occurrences of zero, and 0\{2\}, which is supposed to find exactly 2 occurrences of zero.
What am I doing wrong? All the examples find at least 2 zeros. I tried using single quotes (the same results) and no quotes (doesn't work).

Code:
[xtd8865@centbox reg_exp]$ cat wl | grep "0\{2,\}"
100
1000
10000
100000
[xtd8865@centbox reg_exp]$ cat wl | grep "0\{2\}"
100
1000
10000
100000
[xtd8865@centbox reg_exp]$ cat wl | grep "0\{2,3\}"
100
1000
10000
100000
thanks
 
Old 10-18-2008, 08:41 AM   #2
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
I'll use egrep to dispense with the backslashes.

egrep '([^0]|^)00([^0]|$)' wl

This will also match a line that is exactly 00 (i.e. ^00$).
 
Old 10-18-2008, 08:47 AM   #3
openSauce
Member
 
Registered: Oct 2007
Distribution: Fedora, openSUSE
Posts: 252

Rep: Reputation: 39
That looks like correct behaviour to me. All those lines do contain substrings which are exactly two zeroes - your regex doesn't say that such a substring can't be followed by more zeroes.

'[^0]00[^0]' will match strings of two zeroes with any character other than 0 before and after them.

What you want is probably '(^|[^0])00($|[^0])', which will also match strings of two zeroes at the beginning or end of a line. But note that to use the '|' character with grep, you need to turn on extended mode by calling egrep or using the -E switch. Or you can still use grep's basic mode, but escape the |, ( and ) with backslashes (\|, \( and \) in GNU grep).

Code:
egrep '(^|[^0])00([^0]|$)' temp.txt
100
100b
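For anyone who wants to try both modes side by side, here is a minimal sketch. The sample lines are made up (the real temp.txt isn't shown in this thread), and the basic-mode form relies on GNU grep accepting \|, \( and \), which other grep implementations may not.

```shell
#!/bin/sh
# Hypothetical sample input; the real temp.txt is not shown in the thread.
sample='10
100
1000
100b
a00
00'

# Extended mode (grep -E, same as egrep): | ( ) are operators as written.
ere=$(printf '%s\n' "$sample" | grep -E '(^|[^0])00([^0]|$)')

# Basic mode: GNU grep accepts \| \( \) here; other greps may not.
bre=$(printf '%s\n' "$sample" | grep '\(^\|[^0]\)00\([^0]\|$\)')

printf 'extended mode:\n%s\n' "$ere"
printf 'basic mode:\n%s\n' "$bre"
```

Both commands should select the same lines: the ones where a run of zeroes is exactly two long.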
 
Old 10-18-2008, 09:16 AM   #4
sycamorex
Original Poster
thanks guys for your examples. They work fine.

Quote:
That looks like correct behaviour to me. All those lines do contain substrings which are exactly two zeroes - your regex doesn't say that such a substring can't be followed by more zeroes.
Quote:
\{m,n\} Matches the preceding element at least m and not more than n times (from wikipedia)
man grep:
Quote:
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{n,m} The preceding item is matched at least n times, but not more than m times.
In my examples everything works as if it were {n,}.
If they just find a substring of n occurrences and do not care what follows, then what is the point of introducing 3 syntactic variations of the same thing?

thanks
 
Old 10-18-2008, 09:24 AM   #5
openSauce
In other situations, it would make a difference which one you used.

Code:
$ egrep 'ab{2}a' file
abba

$ egrep 'ab{2,}a' file
abba
abbba
abbbba

$ egrep 'ab{2,3}a' file
abba
abbba
 
Old 10-18-2008, 09:40 AM   #6
sycamorex
Original Poster
hmmm, it's getting a bit funny.
Is there anything peculiar in my situation that it won't work as it's supposed to?
Code:
[xtd8865@centbox reg_exp]$ cat wl | egrep '0{2,3}'
100
1000
10000
100000
[xtd8865@centbox reg_exp]$ cat wl | egrep '0{2}'
100
1000
10000
100000
[xtd8865@centbox reg_exp]$ cat wl | egrep '0{2,}'
100
1000
10000
100000
[xtd8865@centbox reg_exp]$
It's just a text file containing the following lines:
Code:
10
100
1000
10000
100000
 
Old 10-18-2008, 10:36 AM   #7
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by sycamorex
hmmm, it's getting a bit funny.
Is there anything peculiar in my situation that it won't work as it's supposed to?
Code:
[xtd8865@centbox reg_exp]$ cat wl | egrep '0{2,3}'
If you want to match "00" on a line by itself, try
egrep '^0{2,3}$'.

If you want to match text that ends with "00", use egrep '0{2,3}$'.

If you want to match "00" in a word, try something like
egrep '[^0]+0{2,3}[^0]+'

Last edited by ErV; 10-18-2008 at 10:38 AM.
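A quick way to compare the three patterns above is to run them on the same input. This is only a sketch, and the test lines are invented for illustration (they are not the contents of wl).

```shell
#!/bin/sh
# Hypothetical test lines, not from the original wl file.
sample='00
100
10000
a00b
x0000y'

# The whole line is two or three zeroes.
whole=$(printf '%s\n' "$sample" | grep -E '^0{2,3}$')

# The line ends in two or three zeroes; nothing constrains what precedes them.
tail_only=$(printf '%s\n' "$sample" | grep -E '0{2,3}$')

# Two or three zeroes with at least one non-zero character on each side.
inner=$(printf '%s\n' "$sample" | grep -E '[^0]+0{2,3}[^0]+')

printf 'whole line:\n%s\n' "$whole"
printf 'end of line:\n%s\n' "$tail_only"
printf 'inside a word:\n%s\n' "$inner"
```

Note that the end-of-line pattern still selects 10000, because the $ anchor only restricts what follows the zeroes, not what comes before them.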
 
Old 10-18-2008, 12:42 PM   #8
openSauce
Quote:
Originally Posted by sycamorex
hmmm, it's getting a bit funny.
Is there anything peculiar in my situation that it won't work as it's supposed to?
But it is working as it's supposed to! What makes you think 0{2} doesn't match "10000"? It does; in fact it matches 3 times, starting at the 2nd, 3rd, and 4th characters.

It might help to realise that:
  • 0{2} is equivalent to 00
  • 0{2,3} is equivalent to 000?, and also equivalent to (00)|(000)
  • 0{2,} is equivalent to 000*

Does that make sense now? Do you see why my examples with abbba etc. did what you expected, and yours didn't?

Regexes can take a bit of getting used to
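One way to see the equivalences listed above is to ask grep which part of the line it actually matched. The sketch below assumes a grep with the -o option (present in GNU and BSD grep, but not required by POSIX). Keep in mind that -o reports non-overlapping matches from left to right, so for 0{2} it prints two matches rather than the three overlapping positions.

```shell
#!/bin/sh
# -o prints each matched substring on its own line (GNU and BSD grep; not POSIX).
two=$(echo 10000 | grep -oE '0{2}')
two_three=$(echo 10000 | grep -oE '0{2,3}')
two_plus=$(echo 10000 | grep -oE '0{2,}')

# The hand-written equivalents from the list above.
alt_two=$(echo 10000 | grep -oE '00')
alt_two_three=$(echo 10000 | grep -oE '000?')
alt_two_plus=$(echo 10000 | grep -oE '000*')

printf '0{2}   matched:\n%s\n' "$two"
printf '0{2,3} matched:\n%s\n' "$two_three"
printf '0{2,}  matched:\n%s\n' "$two_plus"
```

All three patterns select the line 10000, but they consume different amounts of it: two zeroes at a time, three zeroes, and all four zeroes respectively.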
 
Old 10-18-2008, 01:59 PM   #9
sycamorex
Original Poster
Fair enough. Thanks for the explanation. It has made the thing clearer to me.
 
Old 10-18-2008, 06:26 PM   #10
jschiwal
The "00" pattern is in "0000000" so you will have a match. In order to match exactly 2 zeroes, you need to take the neighbors that you allow into account. [] is used to produce a set of characters for a match. If the first character is ^, then the meaning is negated, and you want to match any character not in the set. [^0] means any character but "0". Outside the square brackets, ^ means the beginning of a line. $ means the end of a line. [^0]0{2,3}[^0] matches a non-zero character followed by 2 or 3 zeroes, followed by a non-zero character. That will match 4006 or 220005 but not 10000.
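As a rough check of the explanation above, the sketch below runs that pattern on the three numbers mentioned, plus 100 (added here for illustration). The second pattern is an adaptation, not something posted in this thread in that exact form: it combines {2,3} with the ^ and $ alternatives suggested earlier so that zeroes at the start or end of a line also count.

```shell
#!/bin/sh
# 4006, 220005 and 10000 come from the post above; 100 is added for illustration.
sample='4006
220005
10000
100'

# Requires an actual non-zero character on both sides of the zeroes.
strict=$(printf '%s\n' "$sample" | grep -E '[^0]0{2,3}[^0]')

# Adaptation: also accept the start or end of the line as a neighbour.
anchored=$(printf '%s\n' "$sample" | grep -E '(^|[^0])0{2,3}([^0]|$)')

printf 'strict:\n%s\n' "$strict"
printf 'anchored:\n%s\n' "$anchored"
```

The strict form skips 100 because nothing follows the zeroes; the anchored form picks it up, and neither selects 10000.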
 
Old 10-18-2008, 06:38 PM   #11
sycamorex
Original Poster
I get it now. Initially I thought that writing {2,3} would automatically ensure that the pattern of 2-3 zeroes is not followed by any other zero (or another match of 2-3 zeroes). You live, you learn.
 
  

