LinuxQuestions.org - [SOLVED] regex on \d versus [0-9]


regex on \d versus [0-9]

# This egrep regex, using \d for a digit, produces no output:

Code:

$ ip addr | egrep -i "^\d{1}"

# This egrep regex, using [0-9] for a digit, produces output:

Code:

$ ip addr | egrep -i "^[0-9]{1}"

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1

2: enp3s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000

3: wlp2s0b1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000

4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000

5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000

What did I miss in the first egrep regex pattern?
Thank you.

\d is Perl regex notation, which egrep does not understand. Try grep -P instead:

Code:

ip addr | grep -P "^\d{1}"
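To see the difference side by side, here is a minimal sketch that counts matching lines for each way of writing "a digit". The sample text is cut down from the ip addr output above, with one indented inet line added purely for illustration. The variable names are made up for this example, and the -P line assumes a GNU grep built with PCRE support.

```shell
# Sample input: two interface header lines (shortened from the output above)
# plus one indented address line (illustrative) that must not match.
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet 127.0.0.1/8 scope host lo
2: enp3s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500'

# POSIX ERE does not define \d; GNU grep finds no digit-led lines here
# (newer versions also print a warning, hence the 2>/dev/null).
ere_backslash_d=$(printf '%s\n' "$sample" | grep -Ec '^\d' 2>/dev/null)

# Two portable ways to say "a digit" in ERE.
ere_range=$(printf '%s\n' "$sample" | grep -Ec '^[0-9]')
ere_class=$(printf '%s\n' "$sample" | grep -Ec '^[[:digit:]]')

# Perl-compatible mode; empty if this grep was built without PCRE.
pcre_d=$(printf '%s\n' "$sample" | grep -Pc '^\d' 2>/dev/null)

printf 'ERE \\d=%s  [0-9]=%s  [[:digit:]]=%s  PCRE \\d=%s\n' \
    "$ere_backslash_d" "$ere_range" "$ere_class" "$pcre_d"
```

With GNU grep the two bracket forms should each count 2 lines, while the ERE \d form counts none, which matches what the original poster saw.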

Thank you.

Is there a list of extended regex special characters in a man page somewhere, besides "Pattern Matching" in man bash?

Not a man page but a web page, here: https://www.gnu.org/software/finduti...ar-Expressions

HTH

If you're talking about general POSIX regex and POSIX extended regex, then there is a manual page for that. There is also a manual page for Perl regular expressions.

Code:

man 7 regex

man perlre

Both should be on your system already. Perl pattern matching is well worth becoming familiar with: not only is it very common (often known as PCRE, or Perl-Compatible Regular Expressions), it is also very, very useful.

Quote:

Originally Posted by Turbocapitalist (Post 5683199)

If you're talking about general POSIX regex and POSIX extended regex, then there is a manual page for that. There is also a manual page for Perl regular expressions.

Code:

man 7 regex

man perlre

On CentOS 7, minimal install, I could not find it.

Code:

# man -k regex

regexp_table (5)    - format of Postfix regular expression tables

Tie::Hash::NamedCapture (3pm) - Named regexp capture buffers

Which package provides man 7 regex?

You can always reach it here: https://linux.die.net/man/7/regex
The man page mentioned is part of the package manpages:
http://packages.ubuntu.com/search?su...rchon=contents

Quote:

Originally Posted by pan64 (Post 5683587)

You can always reach it here: https://linux.die.net/man/7/regex
The man page mentioned is part of the package manpages:
http://packages.ubuntu.com/search?su...rchon=contents

Code:

[root@Centos7-1024ram-minimal ~]# yum list all | grep -i regex

ant-apache-regexp.noarch                1.9.2-9.el7                    base    

boost-regex.i686                        1.53.0-26.el7                  base    

boost-regex.x86_64                      1.53.0-26.el7                  base    

perl-PPIx-Regexp.noarch                0.034-3.el7                    base    

perl-XML-RegExp.noarch                  0.04-2.el7                    base    

regexp.noarch                          1.5-13.el7                    base    

regexp-javadoc.noarch                  1.5-13.el7                    base

No regex.7.* package in yum repositories.

regex.7.gz is the man page itself; manpages is the name of the package that contains it, but that is on Ubuntu. It looks like the package on CentOS is named man-pages.

Quote:

Originally Posted by pan64 (Post 5684055)

regex.7.gz is the man page itself; manpages is the name of the package that contains it, but that is on Ubuntu. It looks like the package on CentOS is named man-pages.

Thank you.
Where did you find this information locally, on your computer?

The difference ... is ... Unicode. :)

Here's a paragraph that might be relevant, from Perl's perldoc perlre: (emphasis mine)

Quote:

Unlike most locales, which are specific to a language and country pair, Unicode classifies all the characters that are letters somewhere in the world as "\w". For example, your locale might not think that "LATIN SMALL LETTER ETH" is a letter (unless you happen to speak Icelandic), but Unicode does.

Similarly, all the characters that are decimal digits somewhere in the world will match "\d"; this is hundreds, not 10, possible matches. And some of those digits look like some of the 10 ASCII digits, but mean a different number, so a human could easily think a number is a different quantity than it really is. For example, "BENGALI DIGIT FOUR" (U+09EA) looks very much like an "ASCII DIGIT EIGHT" (U+0038). And, "\d+", may match strings of digits that are a mixture from different writing systems, creating a security issue. [...] The "/a" modifier can be used to force "\d" to match just the ASCII 0 through 9.

Perl's implementation of regular expressions is a de facto standard, duplicated by most other languages, so this discussion should be directly relevant.

See also a discussion of so-called "POSIX bracket expressions" (N.B. not "character classes" ...) e.g. here. Particularly, in this case, [:digit:].

Quote:

Originally Posted by sundialsvcs (Post 5684233)

The difference ... is ... Unicode. :)

Here's a paragraph that might be relevant, from Perl's perldoc perlre: (emphasis mine)

Perl's implementation of regular expressions is a de facto standard, duplicated by most other languages, so this discussion should be directly relevant.

See also a discussion of so-called "POSIX bracket expressions" (N.B. not "character classes" ...) e.g. here. Particularly, in this case, [:digit:].

Since Perl is a de facto standard, do all commands allow Perl regex?
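As far as I know the answer is no: grep only accepts Perl syntax with -P, and sed has no Perl mode at all. A sketch of the portable route, using the POSIX class [[:digit:]], which both tools accept in extended (-E) mode. The sample line is taken from the ip addr output earlier in the thread, and the variable names are only illustrative.

```shell
# One interface header line from the earlier ip addr output (shortened).
line='3: wlp2s0b1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500'

# grep -E: print only the leading run of digits (-o is a common extension).
index_via_grep=$(printf '%s\n' "$line" | grep -Eo '^[[:digit:]]+')

# sed -E: capture the leading digits and drop the rest of the line.
index_via_sed=$(printf '%s\n' "$line" | sed -E 's/^([[:digit:]]+):.*/\1/')

printf 'grep: %s  sed: %s\n' "$index_via_grep" "$index_via_sed"
```

Both commands extract 3 from the sample line without relying on \d.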

Quote:

Originally Posted by fanoflq (Post 5684231)

Where did you find this information locally, on your computer?

For Ubuntu and Debian there is a package search page (the one I linked), which will easily tell you the name of the package. But obviously you can also look around on your own PC.
For CentOS you can try here: https://www.centos.org/docs/5/html/y...-packages.html, but it is probably better to try here: http://rpm.pbone.net/index.php3/stat/2/simple/1
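For looking around on your own PC, something like the following may help. It is only a sketch: man_page_owner is a made-up helper name, and it assumes man supports -w (print the page's file location) and that either rpm or dpkg is the package manager.

```shell
# Hypothetical helper: print the package that owns an installed man page.
# Usage: man_page_owner SECTION NAME   (for example: man_page_owner 7 regex)
man_page_owner() {
    # Locate the page file; fail if man is missing or the page is not installed.
    page=$(man -w "$1" "$2" 2>/dev/null) || return 1
    [ -n "$page" ] || return 1

    if command -v rpm >/dev/null 2>&1; then
        rpm -qf "$page"      # RPM-based systems such as CentOS
    elif command -v dpkg >/dev/null 2>&1; then
        dpkg -S "$page"      # Debian and Ubuntu
    else
        return 1             # no known package manager found
    fi
}

# If the page is not installed at all (the minimal CentOS case above),
# the repositories can be asked which package ships a matching file:
#   yum provides '*/man7/regex.7*'
```

Calling man_page_owner 7 regex on a system where the page is installed should name the owning package. The function returns a non-zero status when the page cannot be found.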