LinuxQuestions.org
Old 09-13-2010, 06:49 PM   #1
trist007
Member
 
Registered: May 2008
Distribution: Slackware
Posts: 974

Rep: Reputation: 56
A question about regex


In a folder with files and directories I execute
Code:
ls -l | grep ^d
to display the lines whose first char is a d, which are the directories. However,
why is it that when I use
Code:
ls -l | grep ^d*
does it display lines that do NOT include a 'd' as the first char of the line?

Yet this command shows all the lines with 'd' as the first char, which doesn't make sense to me.
Code:
ls -l | grep ^dZ*
The 'Z' is not even on any of the lines. Is it because the asterisk looks for zero or more Z's? Is that the same reason 'grep ^d*' grabs zero or more d's, in which case any character would return TRUE for the match?
 
Old 09-13-2010, 07:09 PM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241Reputation: 241Reputation: 241
Asterisk means 0 or more of the preceding character, so ^d* means look for 0 or more leading d's, and a line with zero leading d's still matches. You should leave out the asterisk if you just want to get directory entries.
Code:
ls -l | grep ^d
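
To make the "0 or more" point concrete, here is a minimal sketch that counts matching lines for the three patterns from the first post. The sample text is a hypothetical stand-in for real `ls -l` output, and the patterns are single-quoted so the shell leaves them alone.

```shell
# Hypothetical sample standing in for real `ls -l` output.
sample='total 40
drwxr-xr-x 2 user user 4096 Sep  2 21:36 Desktop
-rw-r--r-- 1 user user  120 Sep  2 21:40 notes.txt
drwxr-xr-x 3 user user 4096 Sep 14 19:56 Documents'

# ^d   : the line must start with a d          -> directories only
dirs_only=$(printf '%s\n' "$sample" | grep -c '^d')
# ^d*  : start of line, then zero or more d's  -> every line
star=$(printf '%s\n' "$sample" | grep -c '^d*')
# ^dZ* : a d, then zero or more Z's            -> same lines as ^d
dz_star=$(printf '%s\n' "$sample" | grep -c '^dZ*')

printf '^d: %s  ^d*: %s  ^dZ*: %s\n' "$dirs_only" "$star" "$dz_star"
```

With this sample, ^d and ^dZ* both count 2 lines, while ^d* counts all 4.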
 
Old 09-14-2010, 02:24 AM   #3
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446Reputation: 446Reputation: 446Reputation: 446Reputation: 446
Hi,

if you want to avoid the pipe when displaying directories you can also use
Code:
ls -d */
ls -ld */
If you want to include hidden (.dir) directories then you might have to turn on dotglob first
Code:
shopt -s dotglob
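
A rough sketch of how the two behave, run in a scratch directory. The directory and file names (alpha, beta, .hidden, file.txt) are made up for the demonstration, and the dotglob part is guarded because shopt only exists in bash.

```shell
# Scratch directory with hypothetical names, created only for this demo.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/alpha" "$demo_dir/beta" "$demo_dir/.hidden"
touch "$demo_dir/file.txt"
cd "$demo_dir" || exit 1

# The trailing slash limits the glob to directories; dot-directories are skipped by default.
visible=$(ls -d */ | tr '\n' ' ')
printf 'default: %s\n' "$visible"

# dotglob is a bash option, so only try it when running under bash.
if [ -n "${BASH_VERSION:-}" ]; then
    shopt -s dotglob
    with_hidden=$(ls -d */ | tr '\n' ' ')
    shopt -u dotglob
    printf 'dotglob: %s\n' "$with_hidden"
fi

# Clean up the scratch directory.
cd "$OLDPWD" || exit 1
rm -rf "$demo_dir"
```

The default listing contains only alpha/ and beta/; under bash with dotglob on, .hidden/ should show up as well.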
 
Old 09-14-2010, 07:15 AM   #4
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 713Reputation: 713Reputation: 713Reputation: 713Reputation: 713Reputation: 713Reputation: 713
In regexes, asterisk means zero or more of the preceding character, not zero or more of any characters, like it does in wildcards.

But then, shouldn't it have matched everything? I'm not sure.

I think that the problem might also be that you didn't quote the regex in single quotes, and bash might be mangling the regex before grep sees it.

Last edited by MTK358; 09-14-2010 at 07:17 AM.
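
A small sketch of that difference, using the same text d* once as a shell wildcard (in a case statement) and once as a regex (in grep). The test line 'total 40' is only an example value.

```shell
# Example line that does not start with a d.
line='total 40'

# Shell wildcard: d* means "a d followed by anything", so this line does not match.
case $line in
    d*) glob_result='match' ;;
    *)  glob_result='no match' ;;
esac

# Regex: d* means "zero or more d's", which even the empty string satisfies.
if printf '%s\n' "$line" | grep -q 'd*'; then
    regex_result='match'
else
    regex_result='no match'
fi

printf 'wildcard d*: %s\nregex d*: %s\n' "$glob_result" "$regex_result"
```

The wildcard reports no match, while the regex reports a match, because zero d's is enough for it.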
 
Old 09-14-2010, 06:50 PM   #5
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 369Reputation: 369Reputation: 369Reputation: 369
A little off-topic, but a suggestion that may help in the future:

Whenever you use a regular expression at the command line, I highly recommend enclosing the regular expression in single quotes (').

Why? The asterisk, period, question mark, and other punctuation mean different things to the shell than they do in a regular expression.

The shell can (and will) intercept the characters of your regular expression for filename expansion/globbing--which will mutilate your regular expression before it gets to the program you're launching and can cause very unexpected results.

If you enclose the regular expression in single quotes, the shell will not perform expansion/globbing. The regular expression will be passed to the program unmodified (with the single quotes removed).

It didn't happen in this case because there were no filenames that matched the wildcard. The shell tried to expand the asterisk, but because there were no matches, the shell passed the expression to grep unmodified.

I only post this suggestion because the tension between shell expansion and regular expressions caused me grief for a long, long time.
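
Here is a minimal sketch of that interception, assuming a POSIX-style shell with default globbing settings (no nullglob or failglob). The file name ^dfoo is invented purely so that something matches the unquoted pattern.

```shell
# Work in an empty scratch directory so the result does not depend on existing files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Nothing matches the glob ^d*, so the word is passed through exactly as typed.
no_match=$(printf '%s\n' ^d*)

# Add a (hypothetical) file whose name matches; now the shell rewrites the argument.
touch '^dfoo'
with_match=$(printf '%s\n' ^d*)

# Single quotes switch globbing off, so the pattern survives whatever files exist.
quoted=$(printf '%s\n' '^d*')

printf 'unquoted, no matching file: %s\n' "$no_match"
printf 'unquoted, matching file:    %s\n' "$with_match"
printf 'single-quoted:              %s\n' "$quoted"

# Clean up the scratch directory.
cd "$OLDPWD" || exit 1
rm -rf "$demo_dir"
```

Once the matching file exists, grep would have received ^dfoo instead of ^d*, which is a different regular expression altogether.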
 
Old 09-14-2010, 07:20 PM   #6
Reisswolf
Member
 
Registered: Jun 2007
Posts: 67

Rep: Reputation: 15
Code:
[aroy@localhost ~]$ ls -l | grep ^d
drwxr-xr-x. 2 aroy aroy 4096 Sep  2 21:36 Desktop
drwxr-xr-x. 3 aroy aroy 4096 Sep 14 19:56 Documents
drwxr-xr-x. 4 aroy aroy 4096 Sep 12 03:18 Downloads
drwxr-xr-x. 2 aroy aroy 4096 Aug 22 13:51 Music
drwxr-xr-x. 5 aroy aroy 4096 Sep 14 18:42 Pictures
drwxrwxr-x. 7 aroy aroy 4096 Aug  4 14:02 Programming
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Public
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Templates
drwxrwxr-x. 2 aroy aroy 4096 Aug  5 03:07 Temporary
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Videos
[aroy@localhost ~]$ ls -l | grep ^d*
total 40
drwxr-xr-x. 2 aroy aroy 4096 Sep  2 21:36 Desktop
drwxr-xr-x. 3 aroy aroy 4096 Sep 14 19:56 Documents
drwxr-xr-x. 4 aroy aroy 4096 Sep 12 03:18 Downloads
drwxr-xr-x. 2 aroy aroy 4096 Aug 22 13:51 Music
drwxr-xr-x. 5 aroy aroy 4096 Sep 14 18:42 Pictures
drwxrwxr-x. 7 aroy aroy 4096 Aug  4 14:02 Programming
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Public
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Templates
drwxrwxr-x. 2 aroy aroy 4096 Aug  5 03:07 Temporary
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Videos
[aroy@localhost ~]$
Hmm, seems like I'm getting the expected output with the asterisk.
 
Old 09-14-2010, 07:28 PM   #7
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,978
Blog Entries: 11

Rep: Reputation: 879Reputation: 879Reputation: 879Reputation: 879Reputation: 879Reputation: 879Reputation: 879
Quote:
Originally Posted by Reisswolf View Post
Code:
[aroy@localhost ~]$ ls -l | grep ^d
drwxr-xr-x. 2 aroy aroy 4096 Sep  2 21:36 Desktop
drwxr-xr-x. 3 aroy aroy 4096 Sep 14 19:56 Documents
drwxr-xr-x. 4 aroy aroy 4096 Sep 12 03:18 Downloads
drwxr-xr-x. 2 aroy aroy 4096 Aug 22 13:51 Music
drwxr-xr-x. 5 aroy aroy 4096 Sep 14 18:42 Pictures
drwxrwxr-x. 7 aroy aroy 4096 Aug  4 14:02 Programming
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Public
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Templates
drwxrwxr-x. 2 aroy aroy 4096 Aug  5 03:07 Temporary
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Videos
[aroy@localhost ~]$ ls -l | grep ^d*
total 40
drwxr-xr-x. 2 aroy aroy 4096 Sep  2 21:36 Desktop
drwxr-xr-x. 3 aroy aroy 4096 Sep 14 19:56 Documents
drwxr-xr-x. 4 aroy aroy 4096 Sep 12 03:18 Downloads
drwxr-xr-x. 2 aroy aroy 4096 Aug 22 13:51 Music
drwxr-xr-x. 5 aroy aroy 4096 Sep 14 18:42 Pictures
drwxrwxr-x. 7 aroy aroy 4096 Aug  4 14:02 Programming
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Public
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Templates
drwxrwxr-x. 2 aroy aroy 4096 Aug  5 03:07 Temporary
drwxr-xr-x. 2 aroy aroy 4096 Jul 28 18:49 Videos
[aroy@localhost ~]$
Hmm, seems like I'm getting the expected output with the asterisk.
And what does a plain
Code:
ls -l
show?
I'll hazard a guess and say "the same".


Cheers,
Tink
 
Old 09-15-2010, 02:46 PM   #8
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 369Reputation: 369Reputation: 369Reputation: 369
Quote:
Hmm, seems like I'm getting the expected output with the asterisk.
Actually, you're not. Look closely--specifically the first line of your output in each example. The example with the asterisk has this in its output:

Code:
total 40
whereas the other command doesn't. That output line certainly does not start with a 'd' and shouldn't be displayed if the regular expression is working properly.

EDIT: And by "properly," I mean producing the result that the OP expected (which was a mix-up between how the asterisk works in the shell and how it works in a regular expression)

That line of output gets through grep because the asterisk is being interpreted by grep as "zero or more of the preceding item." That basically turns the regular expression into "match any line that starts with a 'd' or not" -- which will match every line of output.

Last edited by Dark_Helmet; 09-15-2010 at 03:02 PM. Reason: Clarification
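
One way to see what ^d* actually matched on each line is to let sed put brackets around the matched text. This is only a sketch: show_match is a helper name made up for the demo, and the input lines are trimmed-down examples.

```shell
# Hypothetical helper: wrap whatever ^d* matched in brackets (& is the matched text).
show_match() {
    printf '%s\n' "$1" | sed 's/^d*/[&]/'
}

total_line=$(show_match 'total 40')
dir_line=$(show_match 'drwxr-xr-x 2 user user 4096 Desktop')

printf '%s\n%s\n' "$total_line" "$dir_line"
```

The "total 40" line comes back as "[]total 40": the match is empty, but it is still a match, so grep prints the line.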
 
Old 09-15-2010, 06:24 PM   #9
Reisswolf
Member
 
Registered: Jun 2007
Posts: 67

Rep: Reputation: 15
Just to clarify: the original poster asked:

Quote:
does it display lines that do NOT include a 'd' as the first char of the line.
I took that to mean that in his case, the output excluded lines that began with "d."

That is why I wrote that I was getting the expected output. By that I meant that I was also getting all the lines that began with "d." No mystery here.

I'm reasonably good with regular expressions. I know perfectly well the difference between

$ ls -l | grep ^d*

and

ls -l | grep ^d.*

The second one will return all the lines that begin with "d," followed by zero or more other characters.

I guess an "also" right before "display" in the quotation above would have made the question clearer to me.
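
For anyone who wants to check what ^d.* requires after the d, a quick sketch on made-up input that includes a line consisting of a single d:

```shell
# Hypothetical three-line sample; the first line is just the letter d.
sample='d
drwxr-xr-x
total 40'

# ^d.* : a d followed by zero or more characters, so the bare "d" line matches too.
dot_star=$(printf '%s\n' "$sample" | grep -c '^d.*')
# ^d.  : a d followed by at least one more character, so "d" alone is excluded.
dot_only=$(printf '%s\n' "$sample" | grep -c '^d.')

printf '^d.*: %s  ^d.: %s\n' "$dot_star" "$dot_only"
```

Here ^d.* counts 2 lines and ^d. counts 1, so the pattern that demands an extra character is ^d. rather than ^d.*.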
 
Old 09-15-2010, 07:08 PM   #10
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 713Reputation: 713Reputation: 713Reputation: 713Reputation: 713Reputation: 713Reputation: 713
Again, ALWAYS SINGLE QUOTE REGULAR EXPRESSIONS!
 
Old 09-15-2010, 08:14 PM   #11
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241Reputation: 241Reputation: 241
Quote:
Originally Posted by MTK358 View Post
Again, ALWAYS SINGLE QUOTE REGULAR EXPRESSIONS!
Depends on the situation. If the regex is built dynamically, double quotes are necessary for interpolation.
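
A short sketch of the dynamic case. The variable name file_type and the sample lines are assumptions made for illustration only.

```shell
# Hypothetical variable holding the file-type character to look for.
file_type='d'

# Hypothetical sample standing in for real `ls -l` output.
sample='total 40
drwxr-xr-x 2 user user 4096 Desktop
-rw-r--r-- 1 user user  120 notes.txt'

# Double quotes: the variable is expanded, but no filename globbing takes place.
pattern_dq="^${file_type}"
# Single quotes: the text is kept literally, so the variable is never substituted.
pattern_sq='^${file_type}'

matches=$(printf '%s\n' "$sample" | grep -c "$pattern_dq")

printf 'double-quoted pattern: %s\n' "$pattern_dq"
printf 'single-quoted pattern: %s\n' "$pattern_sq"
printf 'lines matching the double-quoted pattern: %s\n' "$matches"
```

Double quotes still protect the pattern from globbing, so they are a reasonable fallback whenever part of the regex comes from a variable.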
 
Old 09-16-2010, 01:30 AM   #12
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,978
Blog Entries: 11

Rep: Reputation: 879Reputation: 879Reputation: 879Reputation: 879Reputation: 879Reputation: 879Reputation: 879
Quote:
Originally Posted by Reisswolf View Post
Just to clarify: the original poster asked:



I took that to mean that in his case, the output excluded lines that began with "d."

That is why I wrote that I was getting the expected output. By that I meant that I was also getting all the lines that began with "d." No mystery here.

I'm reasonably good with regular expressions. I know perfectly well the difference between

$ ls -l | grep ^d*

and

ls -l | grep ^d.*

The second one will return all the lines that begin with "d," followed by zero or more other characters.

I guess an "also" right before "display" in the quotation above would have made the question clearer to me.
You didn't actually quote correctly, intentionally or not.
Quote:
However why is it that when I use
Code:
ls -l | grep ^d*
does it display lines that do NOT include a 'd' as the first char of the line.
To me, though not a native speaker, this indicates
that he expects to ONLY see lines beginning with a
"d", which means that he hasn't grasped the regex
syntax (yet) to mean that 0 "d"s matches, in other
words ANY line will.


Cheers,
Tink
 
Old 09-16-2010, 01:20 PM   #13
Reisswolf
Member
 
Registered: Jun 2007
Posts: 67

Rep: Reputation: 15
Quote:
To me, though not a native speaker...
Don't worry! I live in New York, so neither am I!
 
  





All times are GMT -5.
