matching only the first string within quotes with grep
I'm reading Sed and Awk Second Edition (1997) where I came across this:
The file "sampleLine" contains:
Code:
.Se "Appendix" "Full Program Listings"
Here's a different regular expression that matches the shortest possible extent between two quotation marks: "[^"]*"
It matches "a quote followed by any number of characters that do not match a quote followed by a quote":
Code:
$ gres '"[^"]*"' '00' sampleLine
So I tried it myself:
Code:
grep '"[^"]*' sampleLine
Thanks |
Rub your eyes and look again. You missed a character.
|
You're right that I didn't post the correct command, but it still doesn't work with grep '"[^"]*"' sampleLine. Both quoted strings are still matched.
|
Again you have missed the point that it is matching what you have asked for. Try adding some additional characters and it becomes clearer:
Code:
$ echo '.Se "Appendix" aaaa "Full Program Listings" bbbb' | grep '"[^"]*"' |
grep prints the line if there is a partial match.
For efficiency I expect it to print the line as soon as it finds a match and not look for further ones. Showing all matches looks like overhead to me... maybe it does that only if --color is given? The man page says -m NUM stops reading the file after NUM matching lines. It does not say it stops after the first match within a line. I guess you have to try how --color works; it is not clearly documented. |
--color highlights all matches, not only the first on the line. So it repeats it on the same line until there are no matches.
So, indeed, the difference lies in the tool you're using. sed (and gres, which the book implements as a small script, basically a primitive sed) only replaces the first occurrence unless you use the global flag. I thought there was a way of matching only the first occurrence through the regular expression itself, but I see now that it depends on the tool you're using. |
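To make the sed behaviour described above concrete, here is a minimal sketch. It assumes the sample line used earlier in the thread and plain sed in place of the book's gres script; the variable names are only for illustration.

```shell
# Sample input taken from the thread.
line='.Se "Appendix" aaaa "Full Program Listings" bbbb'

# Without the g flag, sed replaces only the first match on the line.
first_only=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/')

# With the g flag, every non-overlapping match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/g')

printf '%s\n' "$first_only" "$all_matches"
```

The regular expression is the same in both commands; only the flag decides how many matches are acted on.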
My bad there, I often forget that -m is for the line not the match :(
In this particular instance you could get a single match, but you would need to know something about the matching string, i.e. if you said it starts with a quote and a capital 'A', then you would only get the first string:
Code:
$ echo '.Se "Appendix" aaaa "Full Program Listings" bbbb' | grep '"A[^"]*"' |
Thank you for your idea, yes. On the other hand, the initial issue was somehow getting the first match of the whole expression, and I guess that can be done, as I've already said, with sed. But I've no idea how you could do that if you wanted to extract specifically the second or the third match. I'm sure there's a way, right? :) Can sed do that? I bet awk can.
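For what it's worth, sed can target a specific occurrence: a number after the s command acts on only that match. The sketch below assumes the same sample line as before. The extraction pattern is just one possible way to pull out the second quoted string and is not taken from the book.

```shell
line='.Se "Appendix" aaaa "Full Program Listings" bbbb'

# A numeric flag tells s to act on the Nth match only (here the second).
second_replaced=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/2')

# To extract rather than replace: skip the first quoted string, capture the
# second one, and print only when the substitution succeeded (-n plus p).
second_only=$(printf '%s\n' "$line" |
    sed -n 's/^[^"]*"[^"]*"[^"]*\("[^"]*"\).*/\1/p')

printf '%s\n' "$second_replaced" "$second_only"
```

Lines with fewer than two quoted strings produce no output from the second command, because the substitution fails and nothing is printed.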
|
Perhaps use the -o option to grep, then use sed to select the match (below selects the second match).
Code:
echo '.Se "Appendix" aaaa "Full Program Listings" bbbb' | grep -o '"[^"]*"' | sed -n 2p |
Well, yes, but I don't think it works for a file whose lines might contain ten matches or none at all, does it?
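One way around that is to count matches per line instead of across the whole output. The following is a rough awk sketch, not a definitive solution. The function name nth_quoted and the three-line sample are made up for illustration, and it assumes the quoted strings contain no escaped quotes.

```shell
# nth_quoted N: print the Nth quoted string of each input line, if it exists.
nth_quoted() {
    awk -v n="$1" '{
        rest = $0
        count = 0
        # match() sets RSTART and RLENGTH for the leftmost match in rest.
        while (match(rest, /"[^"]*"/)) {
            if (++count == n) {
                print substr(rest, RSTART, RLENGTH)
                break
            }
            # Drop everything up to the end of this match and search again.
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

# Hypothetical sample: two matches, no matches, three matches.
sample='.Se "Appendix" aaaa "Full Program Listings" bbbb
no quotes on this line
"one" "two" "three"'

second_per_line=$(printf '%s\n' "$sample" | nth_quoted 2)
printf '%s\n' "$second_per_line"
```

Lines with fewer than N matches are silently skipped, so a file with ten matches on one line and none on the next is handled the same way.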
|
Use PCRE - much more flexible.
|
Right, I'm currently struggling with sed/awk. There's a long way to learning perl, if ever.
|
You can use grep -P (that is PCRE, as mentioned by syg00) and you can (try to) construct a regexp which is only valid for the given match. But that is, I would say, advanced usage of PCRE. |
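As a hedged illustration of that idea, the sketch below anchors the pattern at the start of the line and uses \K to drop the prefix from the reported match. It assumes a GNU grep built with PCRE support, so the block checks for -P first. The pattern is one possible construction, not the only one.

```shell
line='.Se "Appendix" aaaa "Full Program Listings" bbbb'

# grep -P is a GNU extension and may be missing; check before relying on it.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # ^[^"]* anchors at the start of the line and skips the unquoted prefix;
    # \K discards that prefix from the reported match, so -o prints only
    # the first quoted string on the line.
    first_quoted=$(printf '%s\n' "$line" | grep -oP '^[^"]*\K"[^"]*"')
    printf '%s\n' "$first_quoted"
else
    echo 'grep -P is not available here' >&2
fi
```

Because the pattern can only match at the beginning of the line, the second quoted string never qualifies, which is the "only valid for the given match" approach in practice.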
grep < sed < awk < perl |