matching only the first string within quotes with grep
I'm reading Sed and Awk Second Edition (1997) where I came across this:
the file "sampleLine" contains:
Code:
.Se "Appendix" "Full Program Listings"
Here's a different regular expression that matches the shortest possible extent between two quotation marks:
"[^"]*"
It matches “a quote followed by any number of characters that do not match a quote followed by a quote”:
Code:
$ gres '"[^"]*"' '00' sampleLine
.Se 00 "Full Program Listings"
(For those who don't know, gres is a primitive version of sed; it simply substitutes one string for another. The author uses it to highlight the match, which can also be done with grep --color=auto.)
So I tried it myself:
Code:
grep '"[^"]*' sampleLine
.Se "Appendix" "Full Program Listings"
So the question is, why does it highlight both strings within the quotes? And how can I highlight only the first string in the quotes for any line?
You're right in that I didn't post the correct sentence, but it still doesn't work using grep '"[^"]*"' sampleLine. It still matches both quoted strings.
grep prints the line if there is a partial match.
For efficiency I expect it to print the line as soon as possible and not try further matches. Showing all matches looks like overhead to me...maybe it does it only if --color is given?
The man pages say -m leaves the file after the first matched line. It does not explicitly say after the first match in that line.
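A quick way to check the -m behaviour is a two-line input. This is only a sketch; the input text and the variable names are made up for illustration, and -m is assumed to be available (it is in GNU and BSD grep).

```shell
# Two input lines, each containing at least one quoted string.
input='first "a" "b"
second "c"'

# -m 1 stops reading after the first matching line; it does not limit
# how many matches are reported within that line.
first_line=$(printf '%s\n' "$input" | grep -m 1 '"[^"]*"')
echo "$first_line"
```

The whole first line comes back, with both quoted strings still in it, so -m counts lines rather than matches.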
I guess you have to try how the --color works...it is not clearly documented.
--color highlights all matches, not only the first on the line. So it repeats it on the same line until there are no matches.
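One way to see what grep treats as separate matches is the -o option, which prints each match on its own line. A minimal sketch, assuming a grep that supports -o (GNU and BSD grep do); the variable names are only illustrative.

```shell
line='.Se "Appendix" "Full Program Listings"'

# -o prints every match separately, one output line per match.
matches=$(printf '%s\n' "$line" | grep -o '"[^"]*"')
echo "$matches"

# For a single input line, head keeps only the first of those matches.
first_match=$(printf '%s\n' "$line" | grep -o '"[^"]*"' | head -n 1)
echo "$first_match"
```

Note that head -n 1 keeps the first match of the whole input, not the first match of every line, so this only helps when you feed it one line at a time.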
So, indeed, the difference consists in the tool that you're using. Obviously, sed (in that case gres works like sed - in the book there's a small script which is a primitive sed, basically) will only match the first occurrence if you don't use the global flag.
So I thought there was a way of matching only the first occurrence through the regular expressions themselves, but I see now that it depends on the tool you're using.
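To make the difference concrete, here is a small sketch with sed standing in for gres (the variable names are made up). The same regular expression is used both times; only the g flag changes.

```shell
line='.Se "Appendix" "Full Program Listings"'

# Without the g flag, sed replaces only the first match on each line.
first_only=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/')
echo "$first_only"

# With the g flag, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/g')
echo "$all_matches"
```

The first command should give the book's output, .Se 00 "Full Program Listings", and the second should give .Se 00 00.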
My bad there, I often forget that -m applies to the line, not the match.
In this particular instance you could get a single match, but you would need to know something about the matching string. For example, if you said it starts with a quote and a capital 'A', then you would only get the first string:
Code:
$ echo '.Se "Appendix" aaaa "Full Program Listings" bbbb' | grep '"A[^"]*"'
.Se "Appendix" aaaa "Full Program Listings" bbbb
Thank you for your idea, yes. On the other hand, the initial issue was getting the first match of the whole expression, and I guess that can be done, as I've already said, with sed. But I've no idea how you could extract specifically the second or the third match. I'm sure there's a way, right? Can sed do that? I bet awk can.
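For what it's worth, sed's s command accepts a numeric flag that selects which occurrence on the line to replace. A minimal sketch (the variable names are just for illustration):

```shell
line='.Se "Appendix" "Full Program Listings"'

# The trailing 2 tells sed to replace only the second match on the line.
second_only=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/2')
echo "$second_only"
```

Here the first quoted string is left alone and only "Full Program Listings" is replaced by 00.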
Well, yes, but I don't think it works for a file whose lines might contain ten matches or none at all, does it?
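For lines that may hold many quoted strings or none, one possible approach is awk with the double quote as the field separator: the Nth quoted string is then field 2*N. This is a sketch, not a general parser; nth_quoted is a hypothetical helper name, and it assumes quotes are never escaped or nested.

```shell
nth_quoted() {
    # $1 = which quoted string to print (1-based); input is read from stdin.
    # With '"' as separator, the Nth quoted string is field 2*N, and it is
    # only complete if a closing quote follows, i.e. NF >= 2*N + 1.
    awk -F'"' -v n="$1" 'NF >= 2 * n + 1 { print $(2 * n) }'
}

# Lines with too few quoted strings are skipped silently.
second=$(printf '%s\n' '.Se "Appendix" "Full Program Listings"' 'no quotes here' | nth_quoted 2)
echo "$second"
```

Lines that do not contain an Nth quoted string produce no output, so the same command can be run over a mixed file.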
I think this is something of a limitation: without colors, grep prints the line, and that is fine; with colors, grep highlights all the occurrences, because it has no idea which one you really need. There is no easy way to select the second or fifth match.
You can use grep -P (that is, PCRE, as mentioned by syg00) and (try to) construct a regexp that is only valid for the given match. But that is, I would say, advanced usage of PCRE.
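As a rough illustration of that PCRE idea, the pattern below anchors at the start of the line, skips everything that is not a quote, and uses \K to drop that prefix from the reported match. It assumes GNU grep built with PCRE support (-P is not available everywhere), and the variable names are made up.

```shell
line='.Se "Appendix" "Full Program Listings"'

# ^[^"]* consumes the text before the first quote; \K discards it from the
# match, so only the first quoted string on the line is reported.
first_pcre=$(printf '%s\n' "$line" | grep -oP '^[^"]*\K"[^"]*"')
echo "$first_pcre"
```

Because the pattern is tied to the start of the line, it cannot match again further along, so the same expression with --color instead of -o should highlight only the first quoted string.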