books book great books book
bookssss with space
bookq too many spaces
A simple book without punctuation
(There are about 11 spaces between 'bookq' and 'too many'. I don't understand why they are not displayed.)
So, my question is, how come the word that follows the 'book' string is also matched? Shouldn't * match none or however many characters that precede it, i.e. the period, which is either nothing (if you have simply 'book') or, let's say, 'q' (like in the third case).
I'm not sure how * works in this case. Does it mean that it can be followed by anything? Then where should the match stop?
Where formatting is important, wrap your text in [CODE] ... [/CODE] tags. The "#" icon in the tools will do that.
Quote:
So, my question is, how come the word that follows the 'book' string is also matched? Shouldn't * match none or however many characters that precede it, i.e. the period, which is either nothing (if you have simply 'book') or, let's say, 'q' (like in the third case).
A "." in a regex matches any character, and ".*" matches any number of any characters. The match is greedy, and will match as many characters as it can while still allowing the overall expression to match. In your case, the only other requirement is a space character, so the ".*" will include everything up to (but not including) the last space character in the line.
Why isn't the last space included, given that there is a space after book.*?
The space is included in the match. It is matched by the literal space in the expression, not by the ".*". If the ".*" did include that space, then the overall match would fail because there would be nothing to match the literal space at the end of the expression.
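One way to see this for yourself is to have grep print only the matched text. The sketch below assumes a grep that supports -o (GNU and BSD grep both do); show_match is just a hypothetical helper name, and the square brackets are added by sed so that the trailing space becomes visible.

```shell
# Print only the part of the line that the pattern matched,
# wrapped in [ ] so leading/trailing spaces can be seen.
show_match() {
    # $1 = pattern, $2 = input line
    printf '%s\n' "$2" | grep -o -- "$1" | sed 's/.*/[&]/'
}

show_match 'book.* ' 'A simple book without punctuation'
# prints: [book without ]

show_match 'book.* ' 'bookq too many spaces'
# prints: [bookq too many ]
```

In both cases the ".*" takes everything up to the last space, and the literal space in the pattern then matches that last space, so it appears inside the brackets.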
".*" could also mean nothing, could it not? i.e. could mean a space (which is not included in ".*"). So in this line "A simple book without punctuation", why is " without" also included (i.e. space + without)? And I suppose the space after "without" is also included, isn't it?
So you said "the ".*" will include everything up to (but not including) the last space character in the line."
But then you say that the space is included in the match. And I asked you about that last space character in the line. So that's why I feel that your explanation only partially cleared things for me.
If there's a space after .*, then that space is going to be matched, isn't it? Will that be the last space character in the line?
Well, that's exactly it. You didn't illustrate the difference between " book.* " and "book.* ". I mean, I can understand that ".*" matches up to the end of the line, that's not the problem. But I think it is trickier to understand the space after "book.*"
In your example, grep '*sentence.*' is going to be equivalent to grep 'sentence.* ', is it not? Or, to be more accurate, the latter is only going to match everything up only to the first space after the word "end" - which, of course, we don't see.
So for instance:
grep ' book.* ' file.txt
A book with a space at the end of the line_
A book without a space at the end of the_line
Whereas grep ' book.*' file.txt highlights everything in both cases. (which is by now clear)
My conclusion is that " book.* " stops at the last space of the line, but it also includes it.
What I find frustrating is that the expression doesn't stop at the first space (and including it). That's how I'd have seen it and that's why I feel it's rather unintuitive.
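The two-line experiment above can be reproduced without relying on colour highlighting. This is only a sketch, assuming grep -o is available; trailing_space_demo is a made-up function name, and sed adds the brackets to mark where each match starts and ends.

```shell
# Line 1 ends with a space; line 2 does not.
trailing_space_demo() {
    printf '%s\n' 'A book with a space at the end of the line ' \
                  'A book without a space at the end of the line' |
        grep -o ' book.* ' |
        sed 's/.*/[&]/'
}

trailing_space_demo
# prints:
# [ book with a space at the end of the line ]
# [ book without a space at the end of the ]
```

The match always runs to the last space on the line and includes it, which is why the final word is left out only when nothing follows it.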
.* = gobbles up everything from here until the end of the line (or multiline).
A* = gobbles up all 'A's from here until the end of line (or multiline). If none found, stop.
'test *' = match the word 'test' optionally followed by space(s). Stop when no more spaces can be gobbled up.
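A quick way to check these rules, assuming a grep with -o. The helper name show_match is hypothetical. I use 'BA*' rather than a bare 'A*' because a bare 'A*' can also match the empty string, and GNU grep -o prints only non-empty matches, which makes the output harder to read.

```shell
# Print each match wrapped in [ ] so spaces are visible.
show_match() {
    # $1 = pattern, $2 = input line
    printf '%s\n' "$2" | grep -o -- "$1" | sed 's/.*/[&]/'
}

show_match 'test *' 'test   case'   # prints: [test   ]  (all three spaces taken)
show_match 'test *' 'testing'       # prints: [test]     (zero spaces is fine too)
show_match 'BA*' 'BAAAC'            # prints: [BAAA]     (stops at the first non-A)
show_match 'BA*' 'BC'               # prints: [B]        (no A found, still a match)
```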
Just like they said, space is included in the match.
For grep, the space character has no special significance. It's just another character. Perhaps it's easier to think about the character "x" instead of space. The expression
Code:
'book.*x'
will match the string "book" and any number of subsequent characters up to the last occurrence of "x" in the line. The ".*" will match everything up to but not including that final "x", and then the "x" in the expression matches itself. (And if there is no "x", the match will fail.)
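A small sketch of the "x" version, again assuming grep -o is available. The sample lines and the function name match_x are invented for illustration.

```shell
# Show what 'book.*x' matches; grep exits non-zero when nothing matches.
match_x() {
    printf '%s\n' "$1" | grep -o 'book.*x'
}

match_x 'bookkeeper x-ray box xylophone'
# prints: bookkeeper x-ray box x

match_x 'book with no such letter' || echo 'no match'
# prints: no match
```

The first line contains three "x" characters, and the match runs to the last one. The second line has none, so the whole expression fails even though "book" is present.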
If you want to match anything except a space, you have to write the expression that way:
Code:
grep 'book[^ ]*'
That will match "book", "book.", "books", "bookkeeper", "bookend", "book_index[s]->pagenum", etc., but will not include a space character or anything that follows.
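To see the difference, here is a sketch that lists every match on one line, one per output line. It assumes grep -o is available; book_words is a hypothetical name and the sample line is made up from the words mentioned above.

```shell
# List every match of 'book[^ ]*' on a sample line, one per line.
book_words() {
    printf '%s\n' 'books book great bookkeeper book_index[s]->pagenum' |
        grep -o 'book[^ ]*'
}

book_words
# prints:
# books
# book
# bookkeeper
# book_index[s]->pagenum
```

Each match stops just before the next space, so no space is ever part of the output.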
I see you are trying to understand how regexps work in general. There are a lot of resources on the net where you can test, check, try and practice. I would suggest a few: http://www.regexpal.com/ (you can use your mouse for explanation) http://www.regexr.com/ http://www.myregexp.com/