I looked at the locate command.
Man page:
locate [OPTION]... PATTERN...
If --regex is not specified, PATTERNs can contain globbing characters. If any PATTERN contains no globbing characters,
locate behaves as if the pattern were *PATTERN*.
-r, --regexp REGEXP
Search for a basic regexp REGEXP. No PATTERNs are allowed if this option is used, but this option can be specified multiple times.
Can you give examples that differentiate between regex and a PATTERN?
I thought regex is a PATTERN you want to match.
No, a "regex", or regular expression is a very specific type of pattern.
A PATTERN as commonly found in man pages and other documentation can sometimes be a regular expression, or it can be any other type of pattern recognized by a particular application.
As used in the locate man page, a PATTERN is any string of characters and may include globbing characters (another form of pattern), but...
Code:
-r, --regexp REGEXP
Search for a basic regexp REGEXP. No PATTERNs are allowed if this option is used
...makes it plain that in this case a PATTERN and a regex are to be considered as two different and mutually exclusive things.
In general, regex, regexp, regular expression always mean specifically regular expressions. PATTERN, and pattern mean whatever the person writing the man page decides that it means, but most often it will include globbing patterns (see man 7 glob).
You have to be very clear on the differences between globbing and regex when using -r with locate.
Maybe the info page would be better - it allows you to link to the regex page.
Edit: was a bit slow replying and our posts overlapped. Do a search on "globbing vs regex".
I did not see any reference in locate man page to glob about PATTERN.
How did you know where to look?
By the way, I notice ? in glob is not the same as ? in regex. Correct?
Seems like a lot of inconsistency!
Although the man page does not provide a reference for globbing patterns, it does say...
Code:
If --regex is not specified, PATTERNs can contain globbing characters.
"Globbing" is just one of those common terms in the *nix world that everyone assumes everyone else knows, so they do not always explain. I suspected that you may not yet know what it was so I mentioned the man 7 glob page... now you know!
It isn't actually any inconsistency, it is alternate ways of expressing patterns. In this case, locate supports two types - regex and globbing - and each of those assigns slightly different meaning to the same characters.
Globbing is most common in shell and filesystem contexts, whereas regular expressions are a more universal and more powerful way of specifying textual patterns.
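To make the "same characters, different meanings" point concrete, here is a minimal sketch in plain POSIX shell. It does not call locate at all: glob_match and regex_match are hypothetical helpers written only for this illustration, built on the shell's case pattern matching and on grep's basic regular expressions, and the sample strings are made up.

```shell
# Hypothetical helper: succeed if the whole string $2 matches the glob $1.
glob_match() {
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

# Hypothetical helper: succeed if the basic regex $1 matches anywhere in $2.
regex_match() {
  printf '%s\n' "$2" | grep -q -- "$1"
}

# Glob: '?' is any one character, '*' is any run of characters.
glob_match 'a?c' 'abc'   && echo "glob  a?c  matches abc"
glob_match 'a*c' 'abbbc' && echo "glob  a*c  matches abbbc"

# Regex: '.' is any one character, '*' repeats the item before it.
regex_match 'a.c' 'abc'  && echo "regex a.c  matches abc"
regex_match 'ab*c' 'ac'  && echo "regex ab*c matches ac (zero copies of b)"

# The same text 'ab*c' read as a glob needs a literal 'b', so 'ac' fails.
glob_match 'ab*c' 'ac'   || echo "glob  ab*c does not match ac"
```

Note that the glob helper matches the whole string while the regex helper searches inside it, which is one more difference to keep in mind.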
As astrogeek points out, the man page for glob explains what the metacharacters like '*' mean and the fact that globbing is not the same as regexes.
Just to make life interesting, I'd also add that different tools like awk, grep, perl etc have slightly different regex engines.
If you really want to learn about regexes, you can't go past http://regex.info/book.html
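One small illustration of engines differing, assuming a POSIX-style grep is installed: the same pattern text gives different results in grep's basic mode and in its extended mode (grep -E), because the two treat an unescaped '?' differently. The two sample lines are made up for the demonstration.

```shell
# Basic regex (grep's default): an unescaped '?' is an ordinary character,
# so '.?md$' needs a literal "?md" at the end. Only 'file?md' qualifies.
bre_count=$(printf '%s\n' 'readme.md' 'file?md' | grep -c '.?md$' || true)

# Extended regex (grep -E): '?' makes the preceding item optional,
# so '.?md$' matches any line ending in "md". Both lines qualify.
ere_count=$(printf '%s\n' 'readme.md' 'file?md' | grep -Ec '.?md$' || true)

echo "basic:    $bre_count matching line(s)"
echo "extended: $ere_count matching line(s)"
```

This is worth remembering with locate, since its -r option is documented as taking a basic regexp.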
Regex:
locate -r '.md$'
returns paths ending in any one character followed by md (an unescaped '.' in a regex matches any character; '\.md$' would require a literal dot)
example result is readme.md
To get the same result, these globs do not work here (zero results):
locate ?md
locate ?.md
but this works: locate md (meaning it returns something...)
Likewise, this regex does not work (zero results):
locate -r '.?md$'
locate md will return all the files containing md anywhere in the path, like:
/home/pan/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/share/locale/he/LC_MESSAGES
locate ?md looks similar, but in this case you added a glob character, and that changes how the pattern is applied. If you carefully check the man page of locate you will see the difference:
Code:
If --regex is not specified, PATTERNs can contain globbing characters. If any PATTERN contains no globbing characters,
locate behaves as if the pattern were *PATTERN*.
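So in the first example locate used *md*, and in the second it used ?md exactly as typed.
locate ?.md is similar again (the * at the beginning is missing).
locate -r '.?md$' uses a regexp. A regexp is not anchored at the start, so no leading .* is needed; the likely problem is that -r takes a basic regexp, where an unescaped ? is a literal question mark, so this only finds paths ending in some character followed by "?md".
What you missed: if no glob character is specified, * is added at the beginning and at the end. If a glob character is specified, nothing is added. So ?md has to match the full name, not a substring. With plain md, *md* is applied.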
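The wrapping rule can be sketched in plain POSIX shell. This is only an illustration under stated assumptions, not how locate is implemented: locate_like is a hypothetical function, it treats '*', '?' and '[' as the globbing characters, it matches against sample strings instead of the real locate database, and the sample paths are invented.

```shell
# Hypothetical stand-in for the man page rule, applied to one sample path.
# $1 = pattern, $2 = path string; succeeds when the pattern matches.
locate_like() {
  ll_pattern=$1
  ll_path=$2
  case "$ll_pattern" in
    *\**|*\?*|*\[*) ;;                   # has a glob character: use it as typed
    *) ll_pattern="*$ll_pattern*" ;;     # no glob character: behave like *PATTERN*
  esac
  case "$ll_path" in
    $ll_pattern) return 0 ;;             # the glob must cover the whole string
    *)           return 1 ;;
  esac
}

# Plain 'md' becomes '*md*', so it matches anywhere in the path.
locate_like 'md' '/usr/share/amd64/readme' && echo "md      matches /usr/share/amd64/readme"

# '?md' is used as typed, so only a three-character string can match.
locate_like '?md' '/usr/share/amd64/readme' || echo "?md     does not match /usr/share/amd64/readme"
locate_like '?md' '/md' && echo "?md     matches /md"

# Adding the '*' yourself restores the "ends with" behaviour.
locate_like '*.md' '/home/user1/readme.md' && echo "*.md    matches /home/user1/readme.md"
```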
I was missing some fundamental interpretation on glob here.
But I think I understand what glob does now. Thank you.
From man 7 glob:
NOTES
Regular expressions
Note that wildcard patterns are not regular expressions, although they are a bit similar.
First of all, they match filenames, rather than text, and secondly, the conventions are not the same: for example, in a regular expression '*' means zero or more copies of the preceding thing.
With something like ?md, the glob will ONLY match a pathname of EXACTLY three characters: any single character (the '?'), then 'm', then 'd'. With ?, it is not a substring match, and it is not a regex!
--------------------- --------------------
One more question. Again, from man 7 glob page:
Pathnames
Globbing is applied on each of the components of a pathname separately. A '/' in a pathname cannot be matched by a '?' or '*' wildcard, or by a range like "[.-0]". A range cannot contain an explicit '/' character; this would lead to a syntax error.
This globbing appears to contradict the above Pathname globbing description.
~ $ locate ??????user1
/home/user1
since the forward slashes, /, were matched too.
The wildcard characters '??????' matches '/home/'.
And this match of '/' contradicts the statement:
"A '/' in a pathname cannot be matched by a '?' "
So again, I must be missing some understanding here.
Yes, that looks correct! You are getting the hang of it now!
That is an interesting case that appears to contradict, but does not (I think) because of the contexts.
For example:
Code:
ls ?home?user1
ls /?????user1
Do not match, but...
ls /????/user1
...does
...as they are globbing a pathname on the filesystem, and work as indicated in the man page, but...
Code:
locate ??????user1
...does produce a match because it is globbing a text string in the locate database, not a filesystem pathname.
(At least that is my first best guess, but perhaps someone more knowledgeable can provide a better explanation.)
*** UPDATE ***
Yes, the above seems to be correct.
What the man page is telling you is why a '?' or '*' will not match '/' when globbing real pathnames: a pathname does not actually exist as a single text string in the filesystem. It is composed of separate components in different locations (directory names), and the separator characters '/' do not actually appear in those strings but are generated by the process that assembles the components for display.
So, as the man page says, globbing must be "applied on each of the components of a pathname" separately.
But it works just as expected on contiguous text strings.
In other words, the man page is being very precise and anticipating a common apparent, but non-existent, contradiction. They do that a lot, which is why it usually takes a very precise and literal reading method.
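The two contexts can be compared side by side in plain POSIX shell. This is a minimal sketch, not a test of locate itself: it assumes a default shell configuration (unmatched globs are left as typed), it builds a throwaway directory with mktemp, and it uses the shell's case statement as a stand-in for matching wildcards against a text string.

```shell
# Build a throwaway tree: <tmp>/home/user1 (names chosen only for illustration).
tmp=$(mktemp -d)
mkdir -p "$tmp/home/user1"

# Pathname expansion: each component is globbed separately,
# so '?' can never stand in for '/'.
set -- "$tmp"/????/user1
per_component=$1        # expands to <tmp>/home/user1

set -- "$tmp"/?????user1
across_slash=$1         # nothing matches, so the pattern is left as typed

# Plain string matching: the same wildcards applied to text,
# where '/' is just another character.
case "/home/user1" in
  ??????user1) as_text="match" ;;
  *)           as_text="no match" ;;
esac

echo "????/user1 expanded to: ${per_component#"$tmp"/}"
echo "?????user1 expanded to: ${across_slash#"$tmp"/}"
echo "??????user1 against the text /home/user1: $as_text"

rm -rf "$tmp"
```

The first two results follow the man 7 glob rule for pathnames, and the third shows the behaviour seen with locate ??????user1.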