What's the difference between \d, [:digit:], and [0-9] in regular expressions?

915086731 · 08-29-2011, 03:02 AM

Hello,

Code:

[river@localhost ate]$ [[ "123" =~ \d ]] && echo "ok" || echo "error";
error
[river@localhost ate]$ [[ "123" =~ [:digit:] ]] && echo "ok" || echo "error";
error
[river@localhost ate]$ [[ "123" =~ [0-9] ]] && echo "ok" || echo "error";
ok
[river@localhost ate]$

It seems that \d, [:digit:], and [0-9] are not the same. According to the regular expression reference, all three have the same meaning: they represent a digit. So why don't they all work on Linux?

Code:

[river@localhost ate]$ [[ "123" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error";
error

I am very puzzled by the above. "123" should match \b[0-9]{3}\b, so why doesn't it?
Thanks!

grail · 08-29-2011, 04:36 AM

Where did you read that all of these should work in bash? Also, I think you might want to look up character classes (your [:digit:] example) to see their proper use.

915086731 · 08-29-2011, 08:18 PM

Thanks. [0-9] works, but why doesn't "\b[0-9]{3}\b" work?

Code:

[river@localhost ate]$ [[ "123" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error";
error

grail · 08-29-2011, 08:40 PM

I find the best way to use regexes in bash is to assign them to a variable first. I believe this saves you from worrying about escape sequences. Hence:

Code:

reg='\b[0-9]{3}\b'

[[ "123" =~ $reg ]] && echo "ok" || echo "error"

kurumi · 08-29-2011, 08:46 PM

Quote:

Originally Posted by 915086731

Thanks. [0-9] works, but why doesn't "\b[0-9]{3}\b" work?

Code:

[river@localhost ate]$ [[ "123" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error";
error

Are you using Fedora?

Diantre · 08-30-2011, 01:21 AM

Quote:

Originally Posted by 915086731

Code:

[[ "123" =~ \d ]] && echo "ok" || echo "error";

It seems Bash doesn't understand \d as other regex engines do. In Bash, \d is a literal "d", not a decimal digit. I checked the man pages and couldn't find any reference to \d being interpreted as a digit. I could be wrong on that, though...

Quote:

Originally Posted by 915086731

Code:

[[ "123" =~ [:digit:] ]] && echo "ok" || echo "error";

You have to put [:digit:] inside a bracket expression, i.e. within a second pair of square brackets:

Code:

[[ "123" =~ [[:digit:]] ]] && echo "ok" || echo "error";
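A minimal sketch of the point above, assuming Bash 3.2 or later. The helper names has_digit and all_digits are made up for illustration and do not come from the thread. It shows [[:digit:]] and [0-9] used as bracket expressions, plus anchors for the case where the whole string must be digits:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Succeeds if the argument contains at least one digit.
# [:digit:] must sit inside a bracket expression, hence the double brackets.
has_digit() {
    [[ $1 =~ [[:digit:]] ]]
}

# Succeeds only if the whole argument consists of digits.
# The regex is stored in a variable and expanded unquoted.
all_digits() {
    local re='^[0-9]+$'
    [[ $1 =~ $re ]]
}

has_digit "abc1"  && echo "abc1: contains a digit"
all_digits "123"  && echo "123: all digits"
all_digits "12a"  || echo "12a: not all digits"
```

Without the ^ and $ anchors, =~ succeeds as soon as any part of the string matches.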

Quote:

Originally Posted by 915086731

Code:

[[ "123" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error";

Here Bash interprets \b as a literal "b". If you try this it will output "ok":

Code:

[[ "b123b" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error";

I'm guessing that you're trying to use \b as a word boundary assertion. Word boundaries work fine in grep with either \b or \< and \>. In PCRE \b also works fine as a word boundary. But I can't get it to work in Bash, so I'm beginning to think that either Bash doesn't support \b or I'm doing something wrong. Probably the latter.
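For what it's worth, \b is not part of POSIX extended regular expressions, so whether it works depends on the regex library Bash is linked against. One possible workaround, sketched here under that assumption, is to spell the boundary out with plain ERE constructs. The helper name has_3digit_word is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: emulate a word boundary without \b.
# A "word" character is taken to be an alphanumeric or an underscore.
has_3digit_word() {
    # Three digits, preceded by start-of-string or a non-word character,
    # and followed by end-of-string or a non-word character.
    local re='(^|[^[:alnum:]_])[0-9]{3}($|[^[:alnum:]_])'
    [[ $1 =~ $re ]]
}

for s in "123" "abc 123 def" "b123b" "1234"; do
    if has_3digit_word "$s"; then
        echo "$s: match"
    else
        echo "$s: no match"
    fi
done
```

Unlike a true \b assertion, this pattern consumes the neighbouring character, so BASH_REMATCH[0] may include a leading or trailing delimiter.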

Have a look at the manpages for further information: grep(1), regex(7), bash(1) and pcre(3).

Hope that helps.

915086731 · 08-30-2011, 08:11 PM

Thanks very much! That is the best answer.

gnashley · 08-31-2011, 02:54 AM

Single quotes around the expression, fellows:

Code:

[[ "123" =~ '\b[0-9]{3}\b' ]] && echo "ok" || echo "error";
ok

Diantre · 08-31-2011, 01:16 PM

Thanks gnashley, but even with single quotes I still get "error" with Bash 4.1.

MTK358 · 08-31-2011, 01:27 PM

I thought that you shouldn't put quotes around the regexp when using bash's "=~" syntax.

Diantre · 08-31-2011, 04:08 PM

You can quote any part of the pattern to force a string match, according to the Bash manpage. But I think you're right, MTK358: with single quotes, it seems Bash is matching a literal string.

Code:

$ [[ "123" =~ '\b[0-9]{3}\b' ]] && echo "ok" || echo "error";
error

$ [[ "\b[0-9]{3}\b" =~ '\b[0-9]{3}\b' ]] && echo "ok" || echo "error";
ok

$ [[ "123" =~ '[0-9]{3}' ]] && echo "ok" || echo "error";
error

$ [[ "123" =~ [0-9]{3} ]] && echo "ok" || echo "error";
ok

grail · 08-31-2011, 07:56 PM

@Diantre - Did you try my solution from post #4?

Diantre · 08-31-2011, 10:48 PM

Quote:

Originally Posted by grail

@Diantre - Did you try my solution from post #4?

Yes, thanks! Your solution works fine. It's just that I'm baffled why it doesn't work the other way, with the regex inside the test, that's all. Putting the regex in a variable, as you suggest, does the trick.

grail · 09-01-2011, 01:17 AM

If you look on Greg's Wiki at the following page: http://mywiki.wooledge.org/BashGuide/Patterns

You will find the following:

Quote:

Be aware that regex parsing in BASH has changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap your regex pattern in quotes but this has changed in 3.2. Since then, regex should always be unquoted. You should protect any special characters by escaping it using a backslash. The best way to always be compatible is to put your regex in a variable and expand that variable in [[ without quotes.
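A short sketch of what that advice means in practice, assuming Bash 3.2 or later with default compatibility settings. The variable name re is arbitrary:

```shell
#!/usr/bin/env bash
# Keep the regex in a variable so that no shell quoting is needed inside [[ ]].
re='[0-9]{3}'

# Unquoted expansion: the value is interpreted as a regular expression.
[[ "123" =~ $re ]] && echo "unquoted: regex match"

# Quoted expansion: the value is matched as a literal string instead.
[[ "123" =~ "$re" ]] || echo "quoted: no match against 123"
[[ "x[0-9]{3}x" =~ "$re" ]] && echo "quoted: literal substring match"
```

The single quotes in the assignment only protect the pattern from the shell while it is stored. What matters is that the expansion on the right-hand side of =~ stays unquoted.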

Diantre · 09-01-2011, 02:24 AM

Quote:

Originally Posted by grail

If you look on Greg's Wiki at the following page...

Ahhh! Ok, so that explains it! Thanks for the heads-up grail.