LinuxQuestions.org


vincix 12-18-2016 01:26 PM

matching only the first string within quotes with grep
 
I'm reading Sed and Awk Second Edition (1997) where I came across this:
the file "sampleLine" contains:
Code:

.Se "Appendix" "Full Program Listings"

Here’s a different regular expression that matches the shortest possible extent between two quotation marks:
"[^"]*"

It matches “a quote followed by any number of characters that do not match a quote followed by a quote”:
Code:

$ gres '"[^"]*"' '00' sampleLine
.Se 00 "Full Program Listings"

(For those who don't know, gres is a primitive version of sed; it simply substitutes one string for another. The author uses it to highlight the match, which can also be done with grep --color=auto.)

So I tried it myself:
Code:

grep '"[^"]*' sampleLine
.Se "Appendix" "Full Program Listings"

So the question is, why does it highlight both strings within the quotes? And how can I highlight only the first string in the quotes for any line?

Thanks

MadeInGermany 12-18-2016 01:45 PM

Rub your eyes and look again. You missed a character.

vincix 12-18-2016 01:50 PM

You're right in that I didn't post the correct sentence, but it still doesn't work using grep '"[^"]*"' sampleLine. It still matches both quoted strings.
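A minimal sketch of what each pattern consumes, assuming a grep that supports -o (GNU grep does), which prints every match on its own line and so stands in for --color here. The variable names are only for illustration:

```shell
line='.Se "Appendix" "Full Program Listings"'

# Without the closing quote, every quote character starts a new match,
# so the closing quotes begin matches of their own.
unterminated=$(printf '%s\n' "$line" | grep -o '"[^"]*')
printf '%s\n' "$unterminated"

# With the closing quote, each match is one complete quoted string.
terminated=$(printf '%s\n' "$line" | grep -o '"[^"]*"')
printf '%s\n' "$terminated"
```

Either way grep reports more than one match on the line, because it keeps searching after the first one.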

grail 12-18-2016 02:07 PM

Again you have missed the point that it is matching what you have asked for. Try adding some additional characters and it becomes clearer:
Code:

$ echo '.Se "Appendix" aaaa "Full Program Listings" bbbb' | grep '"[^"]*"'
.Se "Appendix" aaaa "Full Program Listings" bbbb

This is because the tool you are using does not stop after finding the first match. For grep you will need to look at the -m switch.

MadeInGermany 12-18-2016 02:57 PM

grep prints the line if there is a partial match.
For efficiency I expect it to print the line as soon as possible and not try further matches. Showing all matches looks like overhead to me...maybe it does it only if --color is given?
The man pages say -m leaves the file after the first matched line. It does not explicitly say after the first match in that line.
I guess you have to try how the --color works...it is not clearly documented.
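One way to check the -m behaviour without relying on colour is to combine it with -o. This is only a sketch, assuming a grep that has both options (GNU grep does); the second input line is made-up sample data:

```shell
# Two input lines, each containing two quoted strings.
sample='.Se "Appendix" "Full Program Listings"
.Se "Index" "Colophon"'

# -m 1 stops after the first matching LINE; -o still prints every match on it.
first_line_matches=$(printf '%s\n' "$sample" | grep -m 1 -o '"[^"]*"')
printf '%s\n' "$first_line_matches"
```

Both quoted strings from the first line are printed and nothing from the second, so -m limits lines, not matches within a line.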

vincix 12-18-2016 03:04 PM

--color highlights all matches, not only the first on the line. So it repeats it on the same line until there are no matches.
So, indeed, the difference consists in the tool that you're using. Obviously, sed (in that case gres works like sed - in the book there's a small script which is a primitive sed, basically) will only match the first occurrence if you don't use the global flag.

So I thought there was a way of matching only the first occurrence through the regular expressions themselves, but I see now that it depends on the tool you're using.
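The sed behaviour described here can be sketched as follows, using the sample line from the first post (variable names are illustrative):

```shell
line='.Se "Appendix" "Full Program Listings"'

# No g flag: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/')
printf '%s\n' "$first_only"

# With the g flag: every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/g')
printf '%s\n' "$all"
```

The regular expression is identical in both commands; only the substitution flag decides how many matches are acted on.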

grail 12-19-2016 12:39 AM

My bad there, I often forget that -m is for the line not the match :(

In this particular instance you could get a single match, but you would need to know something about the string you want to match, i.e. if you said it starts with a quote and a capital 'A', then you would only get the first string:
Code:

$ echo '.Se "Appendix" aaaa "Full Program Listings" bbbb' | grep '"A[^"]*"'
.Se "Appendix" aaaa "Full Program Listings" bbbb


vincix 12-19-2016 01:06 AM

Thank you for your idea, yes. On the other hand, the initial issue was somehow getting the first match of the whole expression, and I guess that can be done, as I've already said, with sed. But I've no idea how you could do that if you wanted to extract specifically the second or the third match. I'm sure there's a way, right?:) Can sed do that? I bet awk can.
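For what it's worth, sed can target a later match: the s command accepts a number as a flag, meaning "act on that occurrence only". A small sketch, with illustrative variable names; the second command assumes the quoted strings contain no escaped quotes:

```shell
line='.Se "Appendix" aaaa "Full Program Listings" bbbb'

# A numeric flag replaces only that occurrence (here the second match).
second_replaced=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/2')
printf '%s\n' "$second_replaced"

# To extract the second match instead, skip one complete quoted string,
# capture the next one, and print only lines where the substitution succeeded.
second_extracted=$(printf '%s\n' "$line" |
    sed -n 's/^[^"]*"[^"]*"[^"]*\("[^"]*"\).*/\1/p')
printf '%s\n' "$second_extracted"
```

Lines with fewer than two quoted strings do not match the second expression, so sed -n prints nothing for them.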

allend 12-19-2016 01:16 AM

Perhaps use the -o option to grep, then use sed to select the match (below selects the second match).
Code:

echo '.Se "Appendix" aaaa "Full Program Listings" bbbb' | grep -o '"[^"]*"' | sed -n 2p

vincix 12-19-2016 01:20 AM

Well, yes, but I don't think it works for a file whose lines might contain ten matches or none at all, does it?
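A per-line alternative can be sketched in awk by splitting each line on the quote character, so that the n-th quoted string is field 2n. The function name nth_quoted and the sample lines are hypothetical, and the approach assumes quotes are balanced and never escaped:

```shell
# Print the n-th quoted string of every line that has one; other lines are skipped.
nth_quoted() {
    awk -F'"' -v n="$1" 'NF >= 2 * n + 1 { print "\"" $(2 * n) "\"" }'
}

sample='.Se "Appendix" aaaa "Full Program Listings" bbbb
no quotes here
.Se "Only One"'

second=$(printf '%s\n' "$sample" | nth_quoted 2)
printf '%s\n' "$second"

first=$(printf '%s\n' "$sample" | nth_quoted 1)
printf '%s\n' "$first"
```

The NF test requires a closing quote for the requested string, so lines with too few matches, or none at all, are skipped.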

syg00 12-19-2016 01:37 AM

Use PCRE - much more flexible.

vincix 12-19-2016 03:07 AM

Right, I'm currently struggling with sed/awk. There's a long way to learning perl, if ever.

pan64 12-19-2016 03:22 AM

Quote:

Originally Posted by vincix (Post 5643779)
Well, yes, but I don't think it works for a file whose lines might contain ten matches or none at all, does it?

I think this is something like a "limitation": using grep without colors will print the line, and that is fine; using colors, grep will print all the occurrences, because it has no idea which one you really need. There is no way to easily select the second or fifth match.
You can use grep -P (that is PCRE, as mentioned by syg00) and you can (try to) construct a regexp which will only be valid for the given match. But that is, I would say, advanced usage of PCRE.
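As a rough sketch of what such a regexp might look like, assuming GNU grep built with PCRE support (the -P option is not available in every grep), the \K escape discards everything matched so far, so -o prints only the part after it:

```shell
line='.Se "Appendix" aaaa "Full Program Listings" bbbb'

# ^[^"]* anchors at the start of the line and skips up to the first quote;
# \K drops that prefix, so only the first quoted string is printed.
first=$(printf '%s\n' "$line" | grep -oP '^[^"]*\K"[^"]*"')
printf '%s\n' "$first"

# Skip one complete quoted string before \K to keep the second one instead.
second=$(printf '%s\n' "$line" | grep -oP '^[^"]*"[^"]*"[^"]*\K"[^"]*"')
printf '%s\n' "$second"
```

Because the pattern is anchored with ^, it can match at most once per line, which is what restricts the output to a single quoted string.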

allend 12-19-2016 06:35 AM

Quote:

Well, yes, but I don't think it works for a file whose lines might contain ten matches or none at all, does it?
True. Are you now convinced you need a new tool?
Quote:

Right, I'm currently struggling with sed/awk. There's a long way to learning perl, if ever.
These are your new tools.

Turbocapitalist 12-19-2016 07:12 AM

Quote:

Originally Posted by allend (Post 5643836)
True. Are you now convinced you need a new tool?
Quote:

Right, I'm currently struggling with sed/awk. There's a long way to learning perl, if ever.
These are your new tools.

Indeed.

grep < sed < awk < perl

