LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
Old 03-06-2016, 01:28 AM   #1
fanoflq
Member
 
Registered: Nov 2015
Posts: 397

Rep: Reputation: Disabled
differences between PATTERN and regex


I looked at locate command.
Man page:
locate [OPTION]... PATTERN...

If --regex is not specified, PATTERNs can contain globbing characters. If any PATTERN contains no globbing characters,
locate behaves as if the pattern were *PATTERN*.

-r, --regexp REGEXP
Search for a basic regexp REGEXP. No PATTERNs are allowed if this option is used, but this option can be specified multiple times.

Can you give examples that differentiate between regex and a PATTERN?
I thought regex is a PATTERN you want to match.

Last edited by fanoflq; 03-06-2016 at 01:41 AM.
 
Old 03-06-2016, 01:49 AM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,264
Blog Entries: 24

Rep: Reputation: 4195
Quote:
Originally Posted by fanoflq View Post
Can you give examples that differentiate between regex and a PATTERN?
I thought regex is a PATTERN you want to match.
No, a "regex", or regular expression is a very specific type of pattern.

A PATTERN, as commonly found in man pages and other documentation, can sometimes be a regular expression, or it can be any other type of pattern recognized by a particular application.

As used in the locate man page, a PATTERN is any string of characters and may include globbing characters (another form of pattern), but...

Code:
-r, --regexp REGEXP
    Search for a basic regexp REGEXP. No PATTERNs are allowed if this option is used
...makes it plain that in this case a PATTERN and a regex are to be considered as two different and mutually exclusive things.

In general, regex, regexp, regular expression always mean specifically regular expressions. PATTERN, and pattern mean whatever the person writing the man page decides that it means, but most often it will include globbing patterns (see man 7 glob).
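
To make the distinction concrete, here is a minimal shell sketch. It does not call locate, since every system's database differs; instead it assumes that the shell's case statement is a fair stand-in for glob matching and that grep is a fair stand-in for basic regexp matching. The helper names (glob_match, regex_match, report) and the sample path are made up for illustration.

```shell
# Hypothetical helpers for illustration; they are not part of locate.

# glob_match PATTERN STRING: succeed if the whole STRING matches the glob.
glob_match() {
    case $2 in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# regex_match REGEXP STRING: succeed if the basic regexp matches anywhere in STRING.
regex_match() {
    printf '%s\n' "$2" | grep -q -e "$1"
}

# report KIND PATTERN STRING: say whether PATTERN, read as KIND, matches STRING.
report() {
    if "${1}_match" "$2" "$3"; then
        printf '%s %s -> match\n' "$1" "$2"
    else
        printf '%s %s -> no match\n' "$1" "$2"
    fi
}

name='/home/user1/readme.md'   # made-up sample path

report glob  '*.md'  "$name"   # glob: anything, then a literal .md
report regex '\.md$' "$name"   # regex: a literal dot, then md, at the end
report glob  'md$'   "$name"   # read as a glob, $ is just an ordinary character
```

The same text can therefore succeed as one kind of pattern and fail as the other, which is why the man page keeps PATTERN and REGEXP apart.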

Last edited by astrogeek; 03-06-2016 at 01:51 AM.
 
1 members found this post helpful.
Old 03-06-2016, 01:55 AM   #3
Tonus
Senior Member
 
Registered: Jan 2007
Location: Paris, France
Distribution: Slackware-15.0
Posts: 1,405
Blog Entries: 3

Rep: Reputation: 514
differences between PATTERN and regex

As I understand it, here a pattern means a text string. And with -r, no plain text string is accepted, only a regexp.
 
Old 03-06-2016, 02:11 AM   #4
fanoflq
Member
 
Registered: Nov 2015
Posts: 397

Original Poster
Rep: Reputation: Disabled
astrogeek:

I did not see any reference to glob for PATTERN in the locate man page.
How did you know where to look?

By the way, I notice ? in glob is not the same as ? in regex. Correct?
Seems like a lot of inconsistency!

Thanks.
 
Old 03-06-2016, 02:17 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
You have to be very clear on the differences between globbing and regex when using -r with locate.
Maybe the info page would be better - it allows you to link to the regex page.

Edit: was a bit slow replying and our posts overlapped. Do a search on "globbing vs regex".

Last edited by syg00; 03-06-2016 at 02:20 AM.
 
Old 03-06-2016, 02:23 AM   #6
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,264
Blog Entries: 24

Rep: Reputation: 4195
Quote:
Originally Posted by fanoflq View Post
astrogeek:

I did not see any reference to glob for PATTERN in the locate man page.
How did you know where to look?

By the way, I notice ? in glob is not the same as ? in regex. Correct?
Seems like a lot of inconsistency!
Although the man page does not provide a reference for globbing patterns, it does say...

Code:
If --regex is not specified, PATTERNs can contain globbing characters.
"Globbing" is just one of those common terms in the *nix world that everyone assumes everyone else knows, so they do not always explain. I suspected that you may not yet know what it was so I mentioned the man 7 glob page... now you know!

It isn't actually an inconsistency; they are alternate ways of expressing patterns. In this case, locate supports two types - regex and globbing - and each of those assigns a slightly different meaning to the same characters.

Globbing is most common in shell and filesystem contexts, whereas regular expressions are a more universal and more powerful way of specifying textual patterns.
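
As a rough illustration of "different meaning for the same characters", the sketch below feeds the identical pattern text to a glob matcher and to a regex matcher. It assumes the shell's case statement for the glob side and grep -E -x (extended regexp, whole-line match) for the regex side, so both have to cover the entire string. The helper names and test strings are made up.

```shell
# Hypothetical helpers; both require the pattern to cover the whole string.
glob_full()  { case $2 in $1) return 0 ;; *) return 1 ;; esac; }
regex_full() { printf '%s\n' "$2" | grep -E -x -q -e "$1"; }

# glob  ab?c : a, b, any one character, c
# regex ab?c : a, an optional b, c
# glob  ab*c : a, b, any run of characters, c
# regex ab*c : a, zero or more b, c
for s in ac abc abbc abxc; do
    for p in 'ab?c' 'ab*c'; do
        g=no; glob_full  "$p" "$s" && g=yes
        r=no; regex_full "$p" "$s" && r=yes
        printf '%-5s %-5s glob=%-3s regex=%s\n' "$p" "$s" "$g" "$r"
    done
done
```

For ab?c the two readings never agree on these four strings: the glob accepts only abbc and abxc, while the regex accepts only ac and abc.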

Last edited by astrogeek; 03-06-2016 at 02:36 AM.
 
Old 03-06-2016, 02:46 AM   #7
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
As astrogeek points out, the man page for glob explains what the metacharacters like '*' mean and the fact that globbing is not the same as regexes.

Just to make life interesting, I'd also add that different tools like awk, grep, perl etc have slightly different regex engines.
If you really want to learn about regexes, you can't go past http://regex.info/book.html
 
Old 03-06-2016, 04:52 AM   #8
fanoflq
Member
 
Registered: Nov 2015
Posts: 397

Original Poster
Rep: Reputation: Disabled
Regex:
locate -r '.md$'
produces files ending with '.' plus md
example result is readme.md

To get the same result, these globs do not work here (zero results):
locate ?md
locate ?.md
but this works: locate md (meaning it returns something...)

Likewise, this regex does not work (zero results):
locate -r '.?md$'

What did I miss for those that did not work?

Thanks.

Last edited by fanoflq; 03-06-2016 at 04:54 AM.
 
Old 03-06-2016, 05:18 AM   #9
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
locate md will return all the files containing md anywhere inside, like: /home/pan/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/share/locale/he/LC_MESSAGES
locate ?md will do something similar, but in this case you added a glob. If you carefully check the man page of locate you will see the difference:
Code:
 If --regex is not specified, PATTERNs can contain globbing characters.  If any PATTERN contains no globbing char‐
       acters, locate behaves as if the pattern were *PATTERN*.
So in the first example locate used *md*, in the second ?md
locate ?.md is similar again (missed * from the beginning)
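locate -r '.?md$' uses a basic regexp, where an unescaped ? is a literal question mark rather than "optional", so it only finds names ending in some character followed by ?md. A leading .* is not needed, since a regexp can match anywhere in the name.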
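
One more detail on the -r case, sketched with grep rather than locate. The man page says -r takes a basic regexp, and plain grep uses the same flavour, so it should be a reasonable stand-in; grep -E shows the extended flavour for comparison. The three names are made up, including one that really contains a question mark.

```shell
# Made-up names standing in for entries in the locate database.
names='/home/user1/readme.md
/home/user1/md
/home/user1/odd?md'

# Basic regexp, the flavour the man page gives for -r:
# an unescaped ? is a literal question mark.
basic=$(printf '%s\n' "$names" | grep '.?md$')

# Extended regexp (grep -E): ? makes the preceding . optional.
extended=$(printf '%s\n' "$names" | grep -E '.?md$')

# An escaped dot matches only a real dot; no leading .* is required,
# because a regexp may match anywhere in the line.
dotted=$(printf '%s\n' "$names" | grep '\.md$')

printf 'basic:\n%s\n\nextended:\n%s\n\nescaped dot:\n%s\n' \
    "$basic" "$extended" "$dotted"
```

Here the basic form finds only the name containing a real ?, the extended form finds all three, and '\.md$' finds only readme.md. An unescaped . matches any character, so '.md$' would also accept names that merely end in md after some other character.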
 
Old 03-06-2016, 12:32 PM   #10
fanoflq
Member
 
Registered: Nov 2015
Posts: 397

Original Poster
Rep: Reputation: Disabled
pan64:

Thank you.

Quote:
locate ?md will do something similar, but in this case you added a glob
From man 7 glob:
A '?' (not between brackets) matches any single character.

so I expected it to match the substring "amd" in:
/home/pan/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/share/locale/he/LC_MESSAGES

but it is not matching anything.
What did I miss?
 
Old 03-06-2016, 12:38 PM   #11
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
You missed this: if no glob was specified, * will be added at the beginning and at the end. If a glob was specified, nothing will be added. So ?md has to match not a substring but a full name. With only md, *md* will be applied.
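
The rule can be sketched in a few lines of shell. This is only a model of the man page sentence, not locate's actual code: it assumes that *, ? and [ are the characters that count as globbing characters, and it uses the shell's case statement for the matching. The function names and the sample path are made up.

```shell
# effective_pattern PATTERN: print the glob that the man page rule implies.
effective_pattern() {
    case $1 in
        *'*'* | *'?'* | *'['*) printf '%s\n' "$1" ;;   # has a glob character: used as-is
        *)                     printf '*%s*\n' "$1" ;; # plain text: treated as *PATTERN*
    esac
}

# matches PATTERN PATH: the effective glob must cover the whole PATH.
matches() {
    eff=$(effective_pattern "$1")
    case $2 in
        $eff) return 0 ;;
        *)    return 1 ;;
    esac
}

path='/usr/share/amd64/readme'   # made-up sample path

matches 'md'    "$path" && echo 'md    -> searched as *md*, matches'
matches '?md'   "$path" || echo '?md   -> searched as ?md, no match: it must cover the whole path'
matches '*?md*' "$path" && echo '*?md* -> matches once the stars are written explicitly'
```

So adding a single ? switches off the implicit stars, and the pattern then has to describe the entire name.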
 
1 members found this post helpful.
Old 03-06-2016, 01:49 PM   #12
fanoflq
Member
 
Registered: Nov 2015
Posts: 397

Original Poster
Rep: Reputation: Disabled
pan64:

Quote:
if no glob was specified, * will be added at the beginning and at the end. If a glob was specified, nothing will be added. So ?md has to match not a substring but a full name
I was missing some fundamental interpretation on glob here.

But I think I understand what glob does now. Thank you.

From man 7 glob:
NOTES
Regular expressions
Note that wildcard patterns are not regular expressions, although they are a bit similar.
First of all, they match filenames, rather than text, and secondly, the conventions are not the same: for example, in a regular expression '*' means zero or more copies of the preceding thing.

When using something like ?md, glob will ONLY match a pathname of EXACTLY three characters: any single character followed by 'm' and 'd'. It is not a substring match when using ?, and it is not regex!

--------------------- --------------------
One more question. Again, from man 7 glob page:
Pathnames
Globbing is applied on each of the components of a pathname separately. A '/' in a pathname cannot be matched by a '?' or '*' wildcard, or by a range like "[.-0]". A range cannot contain an explicit '/' character; this would lead to a syntax error.

This globbing appears to contradict the above Pathname globbing description.
~ $ locate ??????user1
/home/user1

since the forward slashes, /, were matched too.
The wildcard characters '??????' match '/home/'.
And this match of '/' contradicts the statement:
"A '/' in a pathname cannot be matched by a '?' "

So again, I must be missing some understanding here.

Thank you for your patience!

Last edited by fanoflq; 03-06-2016 at 01:52 PM.
 
Old 03-06-2016, 02:53 PM   #13
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,264
Blog Entries: 24

Rep: Reputation: 4195
Quote:
Originally Posted by fanoflq View Post
pan64:

I was missing some fundamental interpretation on glob here.

But I think I understand what glob does now. Thank you.

...
Yes, that looks correct! You are getting the hang of it now!

Quote:
Originally Posted by fanoflq View Post
One more question. Again, from man 7 glob page:
Pathnames
Globbing is applied on each of the components of a pathname separately. A '/' in a pathname cannot be matched by a '?' or '*' wildcard, or by a range like "[.-0]". A range cannot contain an explicit '/' character; this would lead to a syntax error.

This globbing appears to contradict the above Pathname globbing description.
~ $ locate ??????user1
/home/user1

since the forward slashes, /, were matched too.
The wildcard characters '??????' match '/home/'.
And this match of '/' contradicts the statement:
"A '/' in a pathname cannot be matched by a '?' "

So again, I must be missing some understanding here.
That is an interesting case that appears to contradict the man page, but does not (I think), because the contexts differ.

For example:

Code:
ls ?home?user1
ls /?????user1

Do not match, but...

ls /????/user1

...does
...as they are globbing a pathname on the filesystem, and work as indicated in the man page, but...

Code:
locate ??????user1
...does produce a match because it is globbing a text string in the locate database, not a filesystem pathname.

(At least that is my first best guess, but perhaps someone more knowledgeable can provide a better explanation.)

*** UPDATE ***

Yes, the above seems to be correct.

What the man page is telling you is that when this seems not to work for pathnames, as in your example, it is because a pathname does not actually exist as a text string in the filesystem - it is composed of separate components in different locations (directory names) and the separator characters '/' do not actually appear in those strings but are generated by the process that assembles the components for display.

So, as the man page says, globbing must be "applied on each of the components of a pathname" separately.

But it works just as expected on contiguous text strings.

In other words, the man page is being very precise and anticipating a common apparent, but non-existent, contradiction. Man pages do that a lot, which is why they usually call for a very precise and literal reading.
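
The two contexts can be compared side by side in a throwaway directory. This is a sketch under assumptions: it relies on the usual sh/bash behaviour of leaving an unmatched wildcard as typed, and it uses a case statement as a stand-in for matching against a stored path string (which is how I understand locate to work). The temporary tree and variable names are made up.

```shell
# Build a throwaway tree containing home/user1 so nothing real is touched.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/home/user1"
cd "$tmp" || exit 1

# Shell pathname expansion: wildcards never match '/', so each component
# needs its own pattern. An unmatched pattern is left as typed.
set -- ????/user1;  per_component=$*
set -- ?????user1;  across_slash=$*
echo "????/user1 expands to: $per_component"
echo "?????user1 expands to: $across_slash"

# Plain string matching: '/' is just another character in the text.
string_match=no
case /home/user1 in
    ??????user1) string_match=yes ;;
esac
echo "??????user1 against the string /home/user1: $string_match"

# Clean up the temporary tree.
cd / && rm -rf "$tmp"
```

The first pattern expands to home/user1, the second stays unexpanded because ? will not stand in for the slash, and the string comparison succeeds because no pathname components are involved.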

Last edited by astrogeek; 03-06-2016 at 03:14 PM.
 
1 members found this post helpful.
Old 03-06-2016, 03:55 PM   #14
fanoflq
Member
 
Registered: Nov 2015
Posts: 397

Original Poster
Rep: Reputation: Disabled
Thank you all....
I have learnt a whole lot....
 
Old 03-06-2016, 04:24 PM   #15
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,264
Blog Entries: 24

Rep: Reputation: 4195
You are welcome, and good luck!
 
  

