LinuxQuestions.org
Old 12-06-2006, 06:15 AM   #1
m4a1rifle
LQ Newbie
 
Registered: Dec 2006
Posts: 5

Rep: Reputation: 0
Regular expression help. "?"


Hi, I'm new to Linux. While learning about Unix regular expressions I came across this question:

file?.txt

I need to give 3 examples that would match the filename pattern. I managed to think of 2,
which are: filee.txt and file.txt

Is there a third match at all?
 
Old 12-06-2006, 06:59 AM   #2
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,373

Rep: Reputation: 1962
? means any single character. I could give you at least 50 other examples using letters alone...
 
Old 12-06-2006, 08:19 AM   #3
m4a1rifle
LQ Newbie
 
Registered: Dec 2006
Posts: 5

Original Poster
Rep: Reputation: 0
Doesn't ? in a regexp represent zero or one occurrences of the regular expression pattern that precedes it?
 
Old 12-06-2006, 10:36 AM   #4
kees-jan
Member
 
Registered: Sep 2004
Distribution: Debian, Ubuntu, BeatrIX, OpenWRT
Posts: 273

Rep: Reputation: 30
acid_kewpie: You would be right if he asked about (shell-type) globbing, instead of regular expressions.

m4a1rifle: Yes, there are a whole bunch of extra matches. Try reading up on regular expressions some more, instead of asking homework questions ;-)
 
Old 12-06-2006, 10:57 AM   #5
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 62
Quote:
Originally Posted by m4a1rifle
Doesn't ? in a regexp represent zero or one occurrences of the regular expression pattern that precedes it?
Yes, but when you specify patterns on the command line they are not regular expression patterns - they are much simpler shell "glob" patterns, where ? means any single character. See the "Pattern Matching" section of the bash manual page for more information.
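
To see the two meanings of ? side by side, here is a minimal sketch, assuming bash, a POSIX grep, and a scratch directory; the four file names are made up for illustration. The regex is anchored with ^ and $ and run through grep -E so that ? acts as the "zero or one" operator.

```shell
#!/usr/bin/env bash
export LC_ALL=C                      # byte-order sorting, so the results are predictable
demo_dir=$(mktemp -d)                # scratch directory (illustrative only)
cd "$demo_dir" || exit 1
touch fil.txt file.txt file1.txt filea.txt

# Shell glob: ? stands for exactly one character.
glob_matches=$(echo file?.txt)

# Extended regex: "e?" is an optional e, and the unescaped "." is any one character.
regex_matches=$(ls | grep -E '^file?.txt$' | paste -s -d ' ' -)

echo "glob : $glob_matches"
echo "regex: $regex_matches"
```

In this directory the glob picks file1.txt and filea.txt, while the regex picks fil.txt and file.txt, so the same-looking pattern selects two disjoint sets of names.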
 
Old 12-06-2006, 11:15 AM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Quote:
Originally Posted by kees-jan
acid_kewpie: You would be right if he asked about (shell-type) globbing, instead of regular expressions.

m4a1rifle: Yes, there are a whole bunch of extra matches. Try reading up on regular expressions some more, instead of asking homework questions ;-)
He is allowed to ask homework questions.....
This one is right on the edge of what many here find acceptable---he did show some evidence of having actually worked the problem.
The only ones I will consistently refuse to answer are the obvious cut and paste from the instructor's handout......or when the OP never posts after the initial question.
 
Old 12-06-2006, 11:19 AM   #7
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Quote:
Originally Posted by matthewg42
Yes, but when you specify patterns on the command line they are not regular expression patterns - they are much simpler shell "glob" patterns, where ? means any single character. See the "Pattern Matching" section of the bash manual page for more information.
That's a new one on me!!!! If it was in any of my books, it went right thru the holes in my sieve (which my wife claims are getting larger...)

Suppose I do this:
Code:
 ls |grep "^l"
(list all files beginning with the letter "l")
Am I using regexes or just bash syntax?
 
Old 12-06-2006, 12:27 PM   #8
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 62
Quote:
Originally Posted by pixellany
Suppose I do this:
Code:
 ls |grep "^l"
(list all files beginning with the letter "l")
Am I using regexes or just bash syntax?
The first argument to grep is a regex.

I think some explanation of how and when the shell expands patterns is in order.

One of the big misconceptions people have is that when you do
Code:
ls *.txt
, that ls is run and gets passed the parameter *.txt. This is incorrect. When you hit return, the shell interprets the command, first expanding the pattern *.txt to the list of files which match it, then passing that list to ls as arguments. So if you have two files which match the pattern in your current working directory, one.txt and two.txt, ls will get two arguments, one.txt and two.txt.
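
One way to watch the expansion happen is to replace ls with something that only reports its arguments. This is a rough sketch, assuming bash and a scratch directory; show_args is a hypothetical helper written for this example, not a standard command.

```shell
#!/usr/bin/env bash
export LC_ALL=C
demo_dir=$(mktemp -d)                # scratch directory (illustrative only)
cd "$demo_dir" || exit 1
touch one.txt two.txt

# Hypothetical helper: prints how many arguments it received, then each one in brackets.
show_args() {
    printf '%d argument(s):' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

# The shell expands *.txt before show_args starts, so the helper never sees the pattern.
expanded=$(show_args *.txt)
echo "$expanded"
```

The helper reports two arguments, one.txt and two.txt, which is exactly what ls would have received.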

You can prevent the shell from pre-expanding the pattern by quoting it or escaping the * with a backslash character, \. For example, both of these commands will not expand the pattern - they will pass the string *.txt to the ls command:
Code:
ls \*.txt
ls '*.txt'
In this case, unless you actually have a file called *.txt (unlikely since * is a special character, but not impossible), ls will complain:
Code:
ls: *.txt: No such file or directory
Important note: there is one more case in which the program will see the literal string *.txt. If no files match the glob pattern, the shell passes the original pattern through unchanged.
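
The three cases where the program receives the literal pattern can be checked with a small sketch. It assumes bash with its default options (nullglob and failglob unset, which would change the no-match case); show_args is a hypothetical helper defined only for this example.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)                # scratch directory (illustrative only)
cd "$demo_dir" || exit 1
touch one.txt two.txt

# Hypothetical helper: prints how many arguments it received, then each one in brackets.
show_args() {
    printf '%d argument(s):' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

quoted=$(show_args '*.txt')          # single quotes stop the expansion
escaped=$(show_args \*.txt)          # so does a backslash before the *

# In a directory with no .txt files the unquoted pattern has nothing to expand to.
mkdir empty
cd empty || exit 1
unmatched=$(show_args *.txt)

echo "$quoted"
echo "$escaped"
echo "$unmatched"
```

All three calls hand the helper a single argument, the string *.txt itself.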

Note that the shell has no way to know what a program expects as arguments. For programs which want the pattern, and not the pre-expanded list, you must quote the glob characters to prevent the shell from pre-expanding them.

The command grep takes a regular expression pattern as its first non-option argument. Regular expressions use some characters which are also used in shell glob patterns, so if you use them, you need to quote or escape those characters to prevent the shell trying to interpret them.

For example, if we are in a directory with the files antelopes.txt apples.txt ardvaarks.txt in it, and you issue the command:
Code:
ls |grep a*
...the behaviour will be as follows:
  1. The shell will expand the glob pattern a* to the list of files which match this pattern: antelopes.txt apples.txt ardvaarks.txt
  2. The shell will start a grep process with this expanded pattern list as arguments: grep antelopes.txt apples.txt ardvaarks.txt
  3. The shell will run ls, and pass the output to the input of the grep process.
If you don't know that the shell is pre-expanding the pattern before passing the arguments to grep, you might think it should show a list of the three files. Not so. grep's behaviour is to process standard input only if no files are specified on the command line, but it gets three arguments: antelopes.txt apples.txt ardvaarks.txt. It will treat the first one as a regular expression pattern, and the subsequent ones as files in which to search for that pattern.

What's particularly confusing about this is that the behaviour is different depending on which directory you are in. If you were in a directory containing the files bananas batter balloon, you would see all three files as the output of the command:
Code:
ls |grep a*
This is because there are no filenames which match the glob pattern a*, so the literal string a* is passed as the only argument to grep, which treats it as a regular expression which means "match any lines containing 0 or more a characters", i.e. any line at all.
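
Both directories can be reproduced in a scratch area to compare the results. This is only a sketch, assuming bash with default globbing options; the directory names first and second are invented here, and the files are empty.

```shell
#!/usr/bin/env bash
export LC_ALL=C
demo_dir=$(mktemp -d)                # scratch area (illustrative only)
mkdir "$demo_dir/first" "$demo_dir/second"
touch "$demo_dir/first/antelopes.txt" "$demo_dir/first/apples.txt" "$demo_dir/first/ardvaarks.txt"
touch "$demo_dir/second/bananas" "$demo_dir/second/batter" "$demo_dir/second/balloon"

# Here a* expands to three names: grep looks for the regex "antelopes.txt" inside the
# two other (empty) files and never reads what ls sends it. "|| true" only hides
# grep's "no match" exit status.
cd "$demo_dir/first" || exit 1
in_first=$(ls | grep a* || true)

# Here nothing matches a*, so grep gets the literal regex a*, which matches every line.
cd "$demo_dir/second" || exit 1
in_second=$(ls | grep a* | paste -s -d ' ' -)

echo "first : '$in_first'"
echo "second: '$in_second'"
```

The same command line prints nothing in the first directory and every file name in the second.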

If all this has left you feeling uncertain, don't worry too much about it for now. You will need to understand this to use the shell properly - especially if you're going to write shell scripts which might be used for something important - but there is a practice you can adopt to avoid these problems: quote patterns for programs which expect patterns. This means grep, find (with the -name option), sed, awk etc.
Code:
ls |grep 'a*'
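
The same quoting habit matters for find with -name. A small sketch of the difference, assuming bash and a scratch directory with one .txt file at the top level and one in a subdirectory (both names are made up):

```shell
#!/usr/bin/env bash
export LC_ALL=C
demo_dir=$(mktemp -d)                # scratch directory (illustrative only)
cd "$demo_dir" || exit 1
mkdir sub
touch top.txt sub/nested.txt

# Quoted: find receives the pattern itself and applies it in every directory it visits.
quoted=$(find . -name '*.txt' | sort | paste -s -d ' ' -)

# Unquoted: the shell rewrites this to "find . -name top.txt" before find starts.
unquoted=$(find . -name *.txt | sort | paste -s -d ' ' -)

echo "quoted  : $quoted"
echo "unquoted: $unquoted"
```

With the quotes find reports both files; without them it only reports top.txt, because the pattern was already replaced by the one name that matched in the current directory.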
I hope that helps and wasn't too boring. By the way, if your books don't mention shell patterns, throw them away. The best way to learn is to try, fail, read the man page, fail again, read the man page, fail again, read the man page again, succeed and drink a celebratory cup of tea.

Get used to the format of the manual pages. The bash manual page is full of useful information, although it suffers from being dauntingly large, so it's probably wise to learn from a bunch of tutorials and use the man page as a reference. For smaller utilities, though, the man page is the best source of information. The structure of man pages is such that the most important and concise information is at the top, and it gets more detailed as you move down the page. Find out what "man -k" does, and apropos too.

Enjoy
 
Old 12-06-2006, 02:15 PM   #9
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Matthew;

Fabulous explanation!! I will return to the books and see what I missed. Meanwhile, it is puzzling that "?" has a different meaning in shell expansion vs in a regex. It almost sounds like something that Microsoft would do....
 
Old 12-06-2006, 02:30 PM   #10
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 62
Quote:
Originally Posted by pixellany
Matthew;

Fabulous explanation!! I will return to the books and see what I missed. Meanwhile, it is puzzling that "?" has a different meaning in shell expansion vs in a regex. It almost sounds like something that Microsoft would do....
Thanks

I'm not sure about it being a good or bad thing. Consistency is nice, but then grep isn't the shell - it's an external command. Regular expressions actually have multiple types - standard, extended, perl-style. There are probably more. I think perl 6 introduces a totally new version with a radically different syntax, though I didn't get round to checking that out yet.

There may well be shells that use regular expressions instead of glob patterns, although I'm unaware of any without searching. On the other side, it would be quite possible to write a new grep (or option in grep) which matched using shell-style glob patterns. I was once told that the name "grep" is a contraction of "get regular expression pattern". Sounds plausible. Maybe a glob pattern extraction tool would be called "fag" - find a glob. Hmm, perhaps not!
 
Old 12-06-2006, 11:34 PM   #11
m4a1rifle
LQ Newbie
 
Registered: Dec 2006
Posts: 5

Original Poster
Rep: Reputation: 0
Very insightful indeed. Thanks a lot.
By the way, isn't "globbing" the same as "wildcard patterns"?

So only "grep" is capable of using regular expressions, while a command like "ls" will see it as a wildcard pattern even if I quote it?

Last edited by m4a1rifle; 12-06-2006 at 11:43 PM.
 
Old 12-06-2006, 11:59 PM   #12
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 62
Quote:
Originally Posted by m4a1rifle
Very insightful indeed. Thanks a lot.
By the way, isn't "globbing" the same as "wildcard patterns"?
Yeah, I think the term "globbing" is more or less interchangeable with "wildcard patterns". Perhaps the wildcard name is from the DOS world, I'm not sure of the term's etymology. In *nix OSes, these patterns are implemented with the glob() C library function. I'm not sure if the name of the function preceded the term's use in other contexts or not.

Quote:
Originally Posted by m4a1rifle
So only "grep" is capable of using regular expressions, while a command like "ls" will see it as a wildcard pattern even if I quote it?
grep only knows about regular expressions. There are a few variants of grep. fgrep doesn't even know regular expressions - it searches only for fixed strings. egrep knows about the extended regular expression syntax. Regular grep knows about the normal (basic) regular expressions.
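
The three variants can be compared on the same input. In POSIX grep, fgrep and egrep correspond to the -F and -E options, which is what this sketch uses; the three sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# Hypothetical sample input: three lines that differ only in the middle.
sample() { printf '%s\n' 'a.c' 'abc' 'ac'; }

# Fixed string (fgrep): the dot is just a dot.
fixed=$(sample | grep -F 'a.c' | paste -s -d ' ' -)

# Basic regex (plain grep): the dot matches any one character.
basic=$(sample | grep 'a.c' | paste -s -d ' ' -)

# Extended regex (egrep): ? makes the preceding b optional.
extended=$(sample | grep -E 'ab?c' | paste -s -d ' ' -)

echo "fixed   : $fixed"
echo "basic   : $basic"
echo "extended: $extended"
```

The fixed-string search finds only a.c, the basic regex finds a.c and abc, and the extended regex finds abc and ac.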

If you quote a glob pattern, ls will just see a literal string. ls doesn't know how to expand anything - it just lists the files named by the list of arguments it gets, or everything in the current working directory if it doesn't get any arguments. It's the shell which does the expanding of the patterns before it passes the arguments to whatever program is being invoked.

For example, the echo command is utterly simple. All it does is take the list of arguments passed on the command line and print them. For example:
Code:
echo hello world
Here echo gets two arguments, "hello" and "world", which it prints to the terminal, and you see the output:
Code:
hello world
If you pass it a pattern, the shell expands the list and echo sees only the names of the files as arguments - it never sees the pattern, and it wouldn't know what to do with it if it did see it:
Code:
echo *
...will print the list of [non-hidden] files in the current working directory. The only need for ls is that it is smarter than echo - it'll format the list in columns, and can show more information than just the file names (size etc), given the proper options. echo just prints the strings it is passed on the command line. Nothing more.
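
A quick way to convince yourself that echo * and ls produce the same list, and that hidden files are left out of both, is a sketch like the following. It assumes bash with the dotglob option unset (the default) and an ls that supports -A; the file names are invented.

```shell
#!/usr/bin/env bash
export LC_ALL=C
demo_dir=$(mktemp -d)                # scratch directory (illustrative only)
cd "$demo_dir" || exit 1
touch notes.txt report.txt .hidden

# The shell expands * to the non-hidden names; echo just prints what it is given.
via_echo=$(echo *)

# ls with no arguments lists the same names itself.
via_ls=$(ls | paste -s -d ' ' -)

# ls -A also shows names starting with a dot (except . and ..).
with_hidden=$(ls -A | paste -s -d ' ' -)

echo "echo * : $via_echo"
echo "ls     : $via_ls"
echo "ls -A  : $with_hidden"
```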

This is the conceptual hurdle - the shell expands the patterns and passes the expanded list of file names to the program which is being invoked. Many people are confused by this because it's the opposite way round to how DOS programs work - in DOS the programs see the pattern, and then are all expected to know how to expand it. Of course some of them do it slightly differently to others, which is one of the reasons why DOS was such a headache to work with.
 
Old 12-07-2006, 06:16 AM   #13
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 718

Rep: Reputation: 72
Hi.

Good explanations matthewg42. I also like the term meta-characters for the special characters. One could consider the knowledge of regular expressions and filename expressions to be the diploma for graduation from beginning to intermediate *nix user.

Quote:
Originally Posted by matthewg42
I was once told that the name "grep" is a contraction of "get regular expression pattern".
In the early days of UNIX the editor ed was often used to look at files. The use of regular expressions helped cut down on the output -- CRTs were not used, one used hard-copy, slow-printing TTYs -- so anything that helped to limit output was important.

One of the basic commands in ed is "p" -- print -- vi has the vestiges of ed in that the "ex" commands, including "p" are still available, e.g. 2,7p to display lines 2 through 7. That format is used in sed and other places. You can tell ed to look over all the text by using "g" -- "global". So to display all of the lines matching a pattern -- a regular expression -- became an often-used sequence: "globally look for a regular expression and print the matches", g/re/p.

To make life easier, a new non-interactive command was written -- what to call it? -- g/re/p -> grep.
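
ed may not be installed on a modern system, but sed -n accepts the same address-plus-p form, so the idea can still be tried out. A minimal sketch, assuming a POSIX sed and grep; the four input lines are made up.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# Hypothetical sample input.
sample() { printf '%s\n' 'alpha' 'beta' 'gamma' 'delta'; }

# Line-range form: print lines 2 through 3 only.
by_range=$(sample | sed -n '2,3p' | paste -s -d ' ' -)

# Regex form, the sed spelling of g/re/p: print every line matching the expression.
by_sed=$(sample | sed -n '/l/p' | paste -s -d ' ' -)

# The stand-alone command that grew out of it.
by_grep=$(sample | grep 'l' | paste -s -d ' ' -)

echo "2,3p  : $by_range"
echo "/l/p  : $by_sed"
echo "grep l: $by_grep"
```

The last two commands print the same lines, alpha and delta.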

I've always liked that story whether it's true or not ... cheers, makyo
 
Old 12-07-2006, 06:18 PM   #14
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,225

Rep: Reputation: 2021
Here's a useful site:
http://www.regular-expressions.info/
As mentioned/implied above, many tools, e.g. sed, grep, bash, Perl, PHP etc., use different internal regex engines, so what each meta-char means is tool dependent (although there are a lot of common ones, e.g. most take '*' to mean zero or more).
The definitive regex engine comparisons book is: http://www.amazon.co.uk/Mastering-Re...e=UTF8&s=books
 
Old 12-08-2006, 08:28 AM   #15
m4a1rifle
LQ Newbie
 
Registered: Dec 2006
Posts: 5

Original Poster
Rep: Reputation: 0
deleted post

Last edited by m4a1rifle; 12-08-2006 at 09:19 AM.
 
Tags
bash, color, expression, globbing, grep, regular, scripting, shell

