[SOLVED] grep to select lines with M in last word

danielbmartin · 03-06-2011, 08:47 PM

I have a large file in which each line has three or more blank-delimited words. I'd like to code a grep to keep only those lines which have the letter M in the last word. If it's any help, the M (if present) will be the first character in the last word.

quanta · 03-06-2011, 09:53 PM

Something like this:

Code:

awk '{ if ($NF ~ /M/) print $0 }' input

NF stands for "number of fields", so $NF is the last field on the line.
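To make NF and $NF concrete, here is a minimal sketch on invented sample lines (the sample text is hypothetical, not taken from the original file). It prints the field count and last field of each line, then keeps only the lines whose last field contains M.

```shell
# Sample input invented for illustration only.
sample='alpha beta Mike
one two three four
x y Zulu Mango'

# NF is the number of fields on the line; $NF is the last field itself.
fields=$(printf '%s\n' "$sample" | awk '{ print NF, $NF }')
printf '%s\n' "$fields"

# A bare pattern with no action prints the whole line when the test is true.
kept=$(printf '%s\n' "$sample" | awk '$NF ~ /M/')
printf '%s\n' "$kept"
```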

grail · 03-06-2011, 10:50 PM

Or grep could be:

Code:

egrep 'M[^ ]*$' file

quanta · 03-06-2011, 11:24 PM

Quote:

Originally Posted by grail

Or grep could be:

Code:

egrep 'M[^ ]*$' file

I like your solution. It is better than mine, which reflects more conventional thinking.

grail · 03-07-2011, 12:15 AM

Not better .. just different .. I am normally the awk proponent but as you beat me to it I was happy to give an alternative

Telengard · 03-07-2011, 02:16 AM

Quote:

Originally Posted by grail

Code:

egrep 'M[^ ]*$' file

Code:

~$ echo -e "Fable Mabel\nHairy Mary \nMary Martian"
Fable Mabel
Hairy Mary
Mary Martian
~$ echo -e "Fable Mabel\nHairy Mary \nMary Martian" | egrep 'M[^ ]*$'
Fable Mabel
Mary Martian
~$

Note the space at the end of the second line. Is Mary not to be considered a word just because it is followed by an errant space character?

Many human-generated text files contain stray, unnecessary space characters. They tend to accumulate in places where they go unnoticed, such as next to other whitespace or at the end of a line.

It matters even more when processing the output of other programs. For one example, ifconfig likes to add space characters before newline characters.

Code:

~$ ifconfig | hd | grep '20 20 0a'
00000210  6c 20 4c 6f 6f 70 62 61  63 6b 20 20 0a 20 20 20  |l Loopback  .   |
~$

In cases where this matters the awk command is almost certainly to be preferred.

Code:

~$ echo -e "Fable Mabel\nHairy Mary \nMary Martian" | awk '$NF~"M"'
Fable Mabel
Hairy Mary
Mary Martian
~$
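The original question mentions that the M, if present, is the first character of the last word. Assuming that holds, the awk test can be anchored with ^ so that an M elsewhere in the last word does not count. This is only a sketch on invented sample lines; "aMp" is a made-up word to show the difference.

```shell
# Invented sample: the second line ends in a space, the last has M mid-word.
sample=$(printf 'Fable Mabel\nHairy Mary \nTiny Tim\nOne Two aMp\n')

# M anywhere in the last field.
anywhere=$(printf '%s\n' "$sample" | awk '$NF ~ /M/')
printf '%s\n' "$anywhere"

# M only as the first character of the last field.
leading=$(printf '%s\n' "$sample" | awk '$NF ~ /^M/')
printf '%s\n' "$leading"
```

Because awk strips leading and trailing whitespace when splitting fields by default, the trailing space on the second line does not affect either test.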

danielbmartin · 03-07-2011, 06:51 AM

Quote:

Originally Posted by grail

Or grep could be:

Code:

egrep 'M[^ ]*$' file

This works, and I'm dazzled!

Please give a bit of explanation, and then I will mark this puppy as solved.

My newbie reading is this ...
The M is the character which governs selection or rejection.
The [^ ] says "apply this logic to strings starting with blank."
The * says "apply this logic to all such strings in each line."
The $ says "the last string in each line is the only important one."

Please revise this narrative to make it more correct and instructive.

Thank you!

David the H. · 03-07-2011, 07:35 AM

Quote:

Originally Posted by danielbmartin

Code:

egrep 'M[^ ]*$' file

My newbie reading is this ...
The M is the character which governs selection or rejection.
The [^ ] says "apply this logic to strings starting with blank."
The * says "apply this logic to all such strings in each line."
The $ says "the last string in each line is the only important one."

Not quite. * in regex means "zero or more of the previous item". In this case the previous item is the bracket expression [^ ], "any one character that is not a space". So in layman's English, it could be read as "M, followed by any number of non-space characters, followed by the end of the line".

As pointed out, it would not match if there happen to be any spaces between the last word and the end of the line.

To catch that, you need to make a small modification.

Code:

egrep 'M[^[:space:]]*[[:space:]]*$'

So this would read as "M, followed by zero or more non-whitespace characters, followed by zero or more whitespace characters, followed by the end of the line".

I also replaced the simple space with the [:space:] character class here, meaning any kind of whitespace, so tabs would be matched in addition to regular spaces, although it's likely not necessary for your situation.
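A small sketch comparing the two patterns on invented sample lines may help. It uses grep -E, which is equivalent to egrep. The second sample line ends in a space, and the third separates its words with a tab.

```shell
# Invented sample lines:
#   1. M in last word, no trailing whitespace
#   2. M in last word, trailing space
#   3. tab-separated words, no M in the last word
#   4. space-separated words, no M in the last word
sample=$(printf 'Fable Mabel\nHairy Mary \nMary\tJones\nMary Jones\n')

# Original pattern: only a literal space counts as a word separator.
plain=$(printf '%s\n' "$sample" | grep -E 'M[^ ]*$')
printf '%s\n' "$plain"

# Modified pattern: any whitespace separates words, trailing whitespace allowed.
tolerant=$(printf '%s\n' "$sample" | grep -E 'M[^[:space:]]*[[:space:]]*$')
printf '%s\n' "$tolerant"
```

The original pattern misses line 2 because of the trailing space, and it accepts line 3 because a tab is "not a space", so [^ ]* runs straight through it to the end of the line. The modified pattern keeps lines 1 and 2 only.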

danielbmartin · 03-07-2011, 10:37 AM

Quote:

Originally Posted by David the H.

Not quite. * in regex means "zero or more of the previous item". In this case the previous item is the bracket expression [^ ], "any one character that is not a space". So in layman's English, it could be read as "M, followed by any number of non-space characters, followed by the end of the line".

As pointed out, it would not match if there happen to be any spaces between the last word and the end of the line.

To catch that, you need to make a small modification.

Code:

egrep 'M[^[:space:]]*[[:space:]]*$'

So this would read as "M, followed by zero or more non-whitespace characters, followed by zero or more whitespace characters, followed by the end of the line".

I also replaced the simple space with the [:space:] character class here, meaning any kind of whitespace, so tabs would be matched in addition to regular spaces, although it's likely not necessary for your situation.

Thank you for this clear explanation. This question is SOLVED!

Telengard · 03-07-2011, 11:57 AM

Quote:

Originally Posted by David the H.

Code:

egrep 'M[^[:space:]]*[[:space:]]*$'

awk already considers tabs and normal space characters to be whitespace. My awk command is only 13 characters, while your egrep weighs in at a whopping 35 characters. How is your egrep different or better than this?

Code:

awk '$NF~"M"'

Dark_Helmet · 03-07-2011, 02:20 PM

My mother always told me not to stick my nose where it doesn't belong, but I don't listen to my mother very often.

Quote:

Originally Posted by Telengard

awk already considers tabs and normal space characters to be whitespace. My awk command is only 13 characters, while your egrep weighs in at a whopping 35 characters.

First, congratulations. Though, I must admit I missed the memo that Jeremy sent out that turned the forums into a competition.

Second, it wasn't David the H's command originally, but a follow-on to grail's.

Quote:

Originally Posted by Telengard

How is your egrep different or better than this?

How is it different? Well, grep is not awk. Therefore the commands are different.

How is it better? To quote the original post:

Quote:

Originally Posted by danielbmartin

I'd like to code a grep to keep only those lines which have the letter M in the last word.

So, a grep solution is better than an awk solution because the OP wanted grep. The OP did not ask for awk. The OP did not ask for an open-ended solution. The OP asked for grep.

So I assume the next time you ask someone to pass the salt you'll be happy when they give you pepper.

Relax... seriously. Don't get so defensive.

szboardstretcher · 03-07-2011, 02:26 PM

Code:

grep 'M[^ ]*$' filename

Quote:

$ (Dollar sign) = match the preceding expression at the end of a line, as in A$.
[^ ] = match any one character except those enclosed in [ ], as in [^0-9].
* (Asterisk) = match zero or more of the preceding character or expression.
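To see the three pieces working together, the sketch below prints only the text the pattern actually matches. It assumes grep supports -o, which is an extension found in GNU and BSD grep rather than something POSIX requires. The sample lines are invented.

```shell
# grep -o (assumed available) prints only the matched part of each line.
# The M in "Mary" is skipped because a space follows the word, not the end of line.
last=$(printf '%s\n' 'Hairy Mary Martian' | grep -o 'M[^ ]*$')
printf '%s\n' "$last"

# "*" allows zero repetitions, so a last word that is just "M" still matches.
bare=$(printf '%s\n' 'Plan M' | grep -o 'M[^ ]*$')
printf '%s\n' "$bare"

# No M in the last word: grep prints nothing and exits with status 1,
# so "|| true" keeps the script going if it runs under "set -e".
none=$(printf '%s\n' 'Mary Jones' | grep -o 'M[^ ]*$' || true)
printf '[%s]\n' "$none"
```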

Telengard · 03-07-2011, 03:43 PM

Quote:

Originally Posted by Dark_Helmet

So, a grep solution is better than an awk solution because the OP wanted grep.

That's the only thing you said that makes sense to me. Thanks for pointing it out though. It is a valid reason to choose grep.

As for the rest of your message, it seems to be an unwarranted attempt to inject personal animosity into an otherwise friendly discussion. I see that you are ranked senior member, so I'm guessing you didn't get to be one <moderated>. I'll just say your message reads like a personal attack, although I don't claim it is.

Quote:

Relax... seriously. Don't get so defensive.

What are you even referring to? Seriously, I don't get it. I'm rereading my message right now, and I don't see how it comes off as defensive at all.

Anyway, this thread isn't a good place for you and me to make friends. Feel free to PM me if you don't want to change the topic of the OP.

colucix · 03-07-2011, 04:19 PM

This thread is going a bit off-topic! Please keep the discussion fair and reasonable. No need to be pedantic, disrespectful or, even worse, offensive towards other members. The OP has already received proper answers and hopefully learned something useful about regular expressions. 'Nuff said!

Dark_Helmet · 03-07-2011, 04:38 PM

I'll unsubscribe from this thread immediately after this response. It's my hope that this won't be taken as offensive in any way--merely an explanation for my original response.

The original response from Telengard that I quoted (#10) characterized the egrep command from David the H. (a modified version of grail's original command that addressed Telengard's comments about end-of-line spaces) as worse than Telengard's own awk command, and did so by saying the egrep used a "whopping" number of characters. It's clear that "whopping" was not used in a complimentary way.

In my term here at LQ, I've participated in a number of threads. I have offered a number of solutions to problems. I cannot recall any instance where I accused another member's proposed solution as being worse or inferior to my own. I have pointed out technical problems with solutions posted, but I have never complained when a subsequent version of the command is posted to address those concerns. I simply, quietly let the OP decide which solution they want to use.

To me, such a complaint is an attack on the proposed solution in an effort to defend some other solution as "better." To me, that is unwarranted defensive behavior. We're all here to contribute and it's not about the "best" solution or the most "efficient" unless that's what the OP asks for.