LinuxQuestions.org
Old 12-13-2016, 09:39 AM   #1
vincix
Senior Member
 
Registered: Feb 2011
Distribution: Ubuntu, Centos
Posts: 1,240

Rep: Reputation: 103
grep ' book.* ' file


grep ' book.* ' file.txt (red is the match):

Code:
books book great books book
 bookssss with space
 bookq           too many spaces
A simple book without punctuation
(keep this for consistency: there are about 11 spaces between 'bookq' and 'too many' - I don't understand why it doesn't show them)

So, my question is: how come the word that follows the 'book' string is also matched? Shouldn't * match zero or more of the character that precedes it, i.e. the period, which is either nothing (if you have simply 'book') or, let's say, 'q' (as in the third case)?

I'm not sure how * works in this case. Does it mean that it can be followed by anything? Then where should the match stop?

Last edited by vincix; 12-13-2016 at 09:51 AM.
 
Old 12-13-2016, 09:48 AM   #2
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,786

Rep: Reputation: 2216
Quote:
Originally Posted by vincix View Post
grep ' book.* ' file.txt (red is the match):

books book great books book
bookssss with space
bookq too many spaces
A simple book without punctuation

(there are about 11 spaces between 'bookq' and 'too many' - I don't understand why it doesn't show them)
Where formatting is important, wrap your text in [CODE] ... [/CODE] tags. The "#" icon in the tools will do that.
Quote:
So, my question is, how come the word that follows the 'book' string is also matched? Shouldn't * match none or however many characters that precede it, i.e. the period, which is either nothing (if you have simply 'book') or, let's say, 'q' (like in the third case).
A "." in a regex matches any character, and ".*" matches any number of any characters. The match is greedy, and will match as many characters as it can while still allowing the overall expression to match. In your case, the only other requirement is a space character, so the ".*" will include everything up to (but not including) the last space character in the line.
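A way to see exactly what the expression consumed, without relying on colour highlighting, is to print only the matched text. The following is a minimal sketch that assumes a grep with the `-o` option (GNU and BSD grep have it; POSIX does not require it). The sample line comes from the first post, and the brackets are only there to make the spaces visible.

```shell
# Sample line taken from the first post.
line='A simple book without punctuation'

# -o prints only the part of the line that matched (a GNU/BSD extension).
match=$(printf '%s\n' "$line" | grep -o ' book.* ')

# Brackets make the leading and trailing spaces visible.
printf '[%s]\n' "$match"
# prints: [ book without ]
```

The match starts at the space before "book" and ends at the last space in the line, because the greedy ".*" takes as much as it can while still leaving one space for the final literal space in the expression.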
 
Old 12-13-2016, 09:54 AM   #3
vincix
Senior Member
 
Registered: Feb 2011
Distribution: Ubuntu, Centos
Posts: 1,240

Original Poster
Rep: Reputation: 103
Why isn't the last space included, given that there is a space after book.*?
 
Old 12-13-2016, 10:02 AM   #4
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,786

Rep: Reputation: 2216
Quote:
Originally Posted by vincix View Post
Why isn't the last space included, given that there is a space after book.*?
The space is included in the match. It is matched by the literal space in the expression, not by the ".*". If the ".*" did include that space, then the overall match would fail because there would be nothing to match the literal space at the end of the expression.
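To separate what the ".*" consumed from what the literal space consumed, one option is to put the ".*" in a capture group and print the group. This sketch uses `sed` with a basic-regex group `\(.*\)`; the square brackets in the replacement are purely illustrative markers.

```shell
line='A simple book without punctuation'

# \(.*\) captures only what ".*" consumed; the trailing literal
# space in the pattern sits outside the group.
result=$(printf '%s\n' "$line" | sed 's/ book\(.*\) /[\1]/')

printf '%s\n' "$result"
# prints: A simple[ without]punctuation
```

The whole match " book without " is replaced, but the group holds only " without": the final space was matched by the literal space in the expression, not by the ".*".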
 
Old 12-13-2016, 12:35 PM   #5
vincix
Senior Member
 
Registered: Feb 2011
Distribution: Ubuntu, Centos
Posts: 1,240

Original Poster
Rep: Reputation: 103
".*" could also mean nothing, could it not? i.e. could mean a space (which is not included in ".*"). So in this line "A simple book without punctuation", why is " without" also included (i.e. space + without)? And I suppose the space after "without" is also included, isn't it?
 
Old 12-13-2016, 12:45 PM   #6
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
Code:
. means any character
* means zero or more of the preceding character
so
Code:
.* means any number of any characters
and

Code:
book.* means book(any number of any characters up to the newline, and yes, space is a character)
Replies 2 and 4 go over this. If that is not sufficient, could you explain your question further?

A wonderful resource for testing out and learning by doing is https://regexone.com/
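As a small illustration that "*" repeats only the item directly in front of it, here is a sketch with made-up input words (they are not from the thread). In 'bo*k' the star applies to the letter "o" alone, so zero, one or many "o" characters are accepted.

```shell
# "*" repeats the item right before it: here the single letter "o".
# -c counts the input lines that match.
count=$(printf '%s\n' bk bok book bak | grep -c 'bo*k')

printf '%s\n' "$count"
# prints: 3   (bk, bok and book match; bak does not)
```

Replacing the "o" with "." gives 'b.*k', where the repeated item is "any character", which is why ".*" can run across spaces and whole words.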

Last edited by szboardstretcher; 12-13-2016 at 12:50 PM.
 
Old 12-13-2016, 12:54 PM   #7
vincix
Senior Member
 
Registered: Feb 2011
Distribution: Ubuntu, Centos
Posts: 1,240

Original Poster
Rep: Reputation: 103
So you said "the ".*" will include everything up to (but not including) the last space character in the line."
But then you say that the space is included in the match. And I asked you about that last space character in the line. So that's why I feel that your explanation only partially cleared things for me.

If there's a space after .*, then that space is going to be matched, isn't it? Will that be the last space character in the line?
 
Old 12-13-2016, 12:57 PM   #8
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
Let's pretend that the '_' character is a space so we can see what I mean:

Quote:
this_is_a_sentence_that_is_long_with_spaces_at_the_end________
grep 'sentence' will match ONLY the word sentence
'sentence'

grep 'sentence.' will match the word sentence AND one additional character (the space)
'sentence_'

grep 'sentence.*' will match the word sentence AND any number of characters up to the new line
'sentence_that_is_long_with_spaces_at_the_end________'
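The three cases above can be reproduced directly on the underscore version of the line, since "." matches an underscore just as it matches a space. This sketch assumes a grep that supports `-o` (GNU or BSD) to print only the matched text.

```shell
line='this_is_a_sentence_that_is_long_with_spaces_at_the_end________'

# "sentence" plus exactly one more character.
one=$(printf '%s\n' "$line" | grep -o 'sentence.')

# "sentence" plus everything up to the end of the line.
all=$(printf '%s\n' "$line" | grep -o 'sentence.*')

printf '%s\n%s\n' "$one" "$all"
# prints:
# sentence_
# sentence_that_is_long_with_spaces_at_the_end________
```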

Last edited by szboardstretcher; 12-13-2016 at 01:00 PM. Reason: make more readable
 
Old 12-13-2016, 01:13 PM   #9
vincix
Senior Member
 
Registered: Feb 2011
Distribution: Ubuntu, Centos
Posts: 1,240

Original Poster
Rep: Reputation: 103
Well, that's exactly it. You didn't illustrate the difference between " book.* " and "book.* ". I mean, I can understand that ".*" matches up to the end of the line, that's not the problem. But I think it is trickier to understand the space after "book.*"

In your example, grep 'sentence.*' is going to be equivalent to grep 'sentence.* ', is it not? Or, to be more accurate, the latter is going to match everything only up to the first space after the word "end" - which, of course, we don't see.

So for instance:
grep ' book.* ' file.txt
A book with a space at the end of the line_
A book without a space at the end of the_line


Whereas grep ' book.*' file.txt highlights everything in both cases. (which is by now clear)

My conclusion is that " book.* " stops at the last space of the line, but it also includes it.
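That conclusion can be checked with the two lines above, written here with real spaces instead of underscores. This is a sketch that assumes a grep with `-o` (GNU or BSD); the brackets only make the trailing space visible.

```shell
with_space='A book with a space at the end of the line '
without_space='A book without a space at the end of the line'

m1=$(printf '%s\n' "$with_space" | grep -o ' book.* ')
m2=$(printf '%s\n' "$without_space" | grep -o ' book.* ')

printf '[%s]\n[%s]\n' "$m1" "$m2"
# prints:
# [ book with a space at the end of the line ]
# [ book without a space at the end of the ]
```

In both cases the match ends with the last space on the line, and that space is part of the match.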

Last edited by vincix; 12-13-2016 at 01:15 PM.
 
Old 12-13-2016, 01:39 PM   #10
vincix
Senior Member
 
Registered: Feb 2011
Distribution: Ubuntu, Centos
Posts: 1,240

Original Poster
Rep: Reputation: 103
What I find frustrating is that the expression doesn't stop at the first space (including it). That's how I'd have seen it, and that's why I feel it's rather unintuitive.
 
Old 12-13-2016, 01:50 PM   #11
c0wb0y
Member
 
Registered: Jan 2012
Location: Inside the oven
Distribution: Windows
Posts: 421

Rep: Reputation: 74
Code:
.* = gobbles up everything from here until the end of the line (or multiline).
A* = gobbles up all consecutive 'A's starting here. If none are found, it matches nothing and moves on.
'test *' = match the word 'test' optionally followed by space(s). Stop when no more spaces can be gobbled up.
Just like they said, space is included in the match.
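A short sketch of the last two rules, using made-up input strings and assuming a grep with `-o` (GNU or BSD). The star only repeats the single character in front of it, so the match stops as soon as a different character appears.

```shell
# 'test *' : "test" followed by zero or more spaces.
m=$(printf '%s\n' 'test   123' | grep -o 'test *')
printf '[%s]\n' "$m"
# prints: [test   ]

# 'BA*' : "B" followed by zero or more "A" characters.
a=$(printf '%s\n' 'BAAAC' | grep -o 'BA*')
printf '[%s]\n' "$a"
# prints: [BAAA]
```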
 
Old 12-13-2016, 01:56 PM   #12
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
Also, you can escape spaces:

grep 'sentence\ ' will match only instances where sentence has a space after it. (Inside single quotes the backslash is not needed; grep 'sentence ' does the same.)
 
Old 12-13-2016, 04:33 PM   #13
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,786

Rep: Reputation: 2216
For grep, the space character has no special significance. It's just another character. Perhaps it's easier to think about the character "x" instead of space. The expression
Code:
'book.*x'
will match the string "book" and any number of subsequent characters up to the last occurrence of "x" in the line. The ".*" will match everything up to but not including that final "x", and then the "x" in the expression matches itself. (And if there is no "x", the match will fail.)
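Here is a minimal sketch of that behaviour with made-up input lines, assuming a grep with `-o` (GNU or BSD) for the first command. The second command uses `-q`, which prints nothing and only reports through the exit status whether a match was found.

```shell
# The match runs from "book" to the LAST "x" on the line.
m=$(printf '%s\n' 'book ax bx c' | grep -o 'book.*x')
printf '[%s]\n' "$m"
# prints: [book ax bx]

# With no "x" anywhere after "book", the whole expression fails.
if printf '%s\n' 'book a b c' | grep -q 'book.*x'; then
    found=yes
else
    found=no
fi
printf '%s\n' "$found"
# prints: no
```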

If you want to match anything except a space, you have to write the expression that way:
Code:
grep 'book[^ ]*'
That will match "book", "book.", "books", "bookkeeper", "bookend", "book_index[s]->pagenum", etc., but will not include a space character or anything that follows.
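To see the difference on a line from the first post, the sketch below prints only the matched text, assuming a grep with `-o` (GNU or BSD). The second expression adds the surrounding spaces back, which gives the "stop at the first space" behaviour asked about earlier in the thread; the third input line is a made-up example built around one of the strings listed above.

```shell
line='A simple book without punctuation'

# [^ ]* takes any run of non-space characters, so it cannot cross a space.
word=$(printf '%s\n' "$line" | grep -o 'book[^ ]*')

# With literal spaces on both sides, the match ends at the FIRST space after "book".
first=$(printf '%s\n' "$line" | grep -o ' book[^ ]* ')

# Punctuation is not a space, so it is still included.
code=$(printf '%s\n' 'see book_index[s]->pagenum here' | grep -o 'book[^ ]*')

printf '[%s]\n[%s]\n[%s]\n' "$word" "$first" "$code"
# prints:
# [book]
# [ book ]
# [book_index[s]->pagenum]
```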
 
Old 12-14-2016, 01:31 AM   #14
vincix
Senior Member
 
Registered: Feb 2011
Distribution: Ubuntu, Centos
Posts: 1,240

Original Poster
Rep: Reputation: 103
Yes, indeed, it's clear. Thanks for summing it up.
 
Old 12-14-2016, 01:57 AM   #15
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,132

Rep: Reputation: 7375
I see you are trying to understand how regexps work in general. There are a lot of resources on the net to test/check/try/practice, but I would like to suggest a few:
http://www.regexpal.com/ (you can use your mouse for explanation)
http://www.regexr.com/
http://www.myregexp.com/
 
  

