grep ' book.* ' file
grep ' book.* ' file.txt (red is the match):
Code:
books book great books book
So my question is: why is the word that follows the 'book' string also matched? Shouldn't * match none or however many of the character that precedes it, i.e. the period, which is either nothing (if you simply have 'book') or, let's say, 'q' (as in the third case)? I'm not sure how * works in this case. Does it mean that it can be followed by anything? Then where should the match stop?
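A minimal way to see exactly what grep matched, without relying on the red highlighting, is grep -o, which prints only the matched part of each line. This assumes a grep that supports -o (GNU and BSD grep do; POSIX does not require it). The sample line is hypothetical, since the real file.txt is not shown, and the sed step only wraps each match in brackets so leading and trailing spaces are visible.

```shell
#!/bin/sh
# Hypothetical sample line; the contents of the real file.txt are not in the thread.
line='A book and another book here'

# -o prints only the matched text; sed wraps it in [ ] to make spaces visible.
printf '%s\n' "$line" | grep -o ' book.* ' | sed 's/.*/[&]/'
# -> [ book and another book ]
```

The match runs past the first space because .* is greedy: it takes as much of the line as it can and gives back only enough for the final space in the pattern to match, so it ends at the last space on the line (the one before "here").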
Why isn't the last space included, given that there is a space after book.*?
".*" could also match nothing, could it not? I.e. the match could continue with just a space (which is not part of ".*"). So in the line "A simple book without punctuation", why is " without" also included (i.e. space + without)? And I suppose the space after "without" is also included, isn't it?
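Both parts of the question can be checked directly. This sketch again assumes a grep with -o; the first input line is made up, the second is the one from the post.

```shell
#!/bin/sh
# .* can match zero characters: here the only space after "book" follows it directly.
printf '%s\n' 'a book ' | grep -o ' book.* ' | sed 's/.*/[&]/'
# -> [ book ]

# In the asker's line the last space is the one after "without",
# so .* takes " without" and the pattern's final space takes that last space.
printf '%s\n' 'A simple book without punctuation' | grep -o ' book.* ' | sed 's/.*/[&]/'
# -> [ book without ]
```

So yes: .* may match nothing, but it only does so when nothing longer still lets the trailing space match.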
Code:
.       means any one character
.*      means any number (zero or more) of any characters
book.*  means book followed by any number of any characters up to the end of the line (and yes, a space is a character)
A wonderful resource for testing things out and learning by doing is https://regexone.com/
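A quick sketch of the three definitions on one hypothetical line, assuming a grep with -o; sed only brackets the output so the trailing space after "book" shows up.

```shell
#!/bin/sh
line='a book on the shelf'

# "." is exactly one character of any kind, including a space.
printf '%s\n' "$line" | grep -o 'book.' | sed 's/.*/[&]/'    # -> [book ]

# ".*" is zero or more characters, so "book.*" runs to the end of the line.
printf '%s\n' "$line" | grep -o 'book.*' | sed 's/.*/[&]/'   # -> [book on the shelf]
```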
So you said: "the ".*" will include everything up to (but not including) the last space character in the line."
But then you say that the space is included in the match, and I asked you about that last space character in the line. That's why I feel your explanation only partially cleared things up for me. If there's a space after .*, then that space is going to be matched, isn't it? Will that be the last space character in the line?
Let's pretend that the '_' character is a space so you can see what I mean:
'sentence'
grep 'sentence.' will match the word sentence AND one additional character (the space):
'sentence_'
grep 'sentence.*' will match the word sentence AND any number of characters up to the end of the line:
'sentence_that_is_long_with_spaces_at_the_end________'
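The same '_' trick can be produced mechanically: tr turns every space in grep's output into an underscore. A sketch, assuming a grep with -o and a hypothetical line that ends in three spaces.

```shell
#!/bin/sh
# Hypothetical line ending in three spaces.
line='sentence with spaces at the end   '

# tr ' ' '_' makes every space in the match visible.
printf '%s\n' "$line" | grep -o 'sentence.' | tr ' ' '_'    # -> sentence_
printf '%s\n' "$line" | grep -o 'sentence.*' | tr ' ' '_'   # -> sentence_with_spaces_at_the_end___
```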
Well, that's exactly it. You didn't illustrate the difference between " book.* " and "book.* ". I mean, I can understand that ".*" matches up to the end of the line, that's not the problem. But I think it is trickier to understand the space after "book.*"
In your example, grep 'sentence.*' is going to be equivalent to grep 'sentence.* ', is it not? :) Or, to be more accurate, the latter is only going to match everything up to the first space after the word "end", which, of course, we don't see. So, for instance:
grep ' book.* ' file.txt
A book with a space at the end of the line_
A book without a space at the end of the_line
Whereas grep ' book.*' file.txt highlights everything in both cases (which is clear by now).
My conclusion is that " book.* " stops at the last space of the line, but also includes it.
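The two lines from the post can be run through both patterns to confirm that conclusion. A sketch, assuming a grep with -o; tr prints spaces as '_' to match the notation above.

```shell
#!/bin/sh
# The two lines from the post, with real spaces instead of "_".
with_trailing='A book with a space at the end of the line '
without_trailing='A book without a space at the end of the line'

for l in "$with_trailing" "$without_trailing"; do
  # ' book.* ' must end on a space, so it stops at (and includes) the LAST space.
  printf '%s\n' "$l" | grep -o ' book.* ' | tr ' ' '_'
  # ' book.*' has no such requirement and runs to the end of the line.
  printf '%s\n' "$l" | grep -o ' book.*' | tr ' ' '_'
done
```

For the first line both patterns print the same text, because the last space is the final character. For the second, ' book.* ' prints _book_without_a_space_at_the_end_of_the_ and stops before "line", which is exactly the "last space, included" behaviour.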
What I find frustrating is that the expression doesn't stop at the first space (including it). That's how I would have expected it to work, which is why it feels rather unintuitive.
Code:
.* = gobbles up everything from here until the end of the line (or further, in tools that match across multiple lines).
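Also, you can escape spaces:
grep 'sentence\ ' will match only lines where sentence has a space after it. (Inside single quotes the shell already leaves the space alone, so grep 'sentence ' works without the backslash.)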
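A small check of the "space after sentence" filter, written without the backslash since the single quotes already protect the space. The three input lines are hypothetical.

```shell
#!/bin/sh
# Only the first line has a space directly after "sentence".
printf '%s\n' 'one sentence here' 'sentences' 'last sentence' | grep 'sentence '
# -> one sentence here
```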
For grep, the space character has no special significance; it's just another character. Perhaps it's easier to think about the character "x" instead of a space. The expression
Code:
'book.*x'
matches "book" followed by everything up to and including the last "x" on the line. If you want to match any characters except a space, you have to write the expression that way:
Code:
grep 'book[^ ]*'
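To get the "stop at the first space" behaviour asked for earlier, the bracket expression replaces the dot. A sketch on a hypothetical line, assuming a grep with -o; sed only brackets the output.

```shell
#!/bin/sh
line='My books are here'

# Greedy: .* runs on to the LAST space on the line.
printf '%s\n' "$line" | grep -o ' book.* ' | sed 's/.*/[&]/'      # -> [ books are ]

# [^ ]* matches only non-space characters, so it cannot cross a space:
# the match ends at the FIRST space after "book".
printf '%s\n' "$line" | grep -o ' book[^ ]* ' | sed 's/.*/[&]/'   # -> [ books ]
```

The star is just as greedy in both patterns; the difference is that [^ ] can never consume a space, so the first space after "book" is the only one the pattern's final space can match.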
Yes, indeed, it's clear. Thanks for summing it up :)
I see you are trying to understand how regexps work in general. There are a lot of resources on the net to test, check, try and practice with, but I would like to suggest a few:
http://www.regexpal.com/ (you can use your mouse to get explanations)
http://www.regexr.com/
http://www.myregexp.com/