LinuxQuestions.org
Old 03-06-2011, 08:47 PM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,035

Rep: Reputation: 276
grep to select lines with M in last word


I have a large file in which each line has three or more blank-delimited words. I'd like to code a grep to keep only those lines which have the letter M in the last word. If it's any help, the M (if present) will be the first character in the last word.
 
Old 03-06-2011, 09:53 PM   #2
quanta
Member
 
Registered: Aug 2007
Location: Vietnam
Distribution: RedHat based, Debian based, Slackware, Gentoo
Posts: 724

Rep: Reputation: 100
Something like this:
Code:
awk '{ if ($NF ~ /M/) print $0 }' input
NF stands for Number of Fields, so $NF refers to the last field (word) on each line.
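
For illustration, here is a minimal sketch of the same idea wrapped in a shell function. The function name last_word_has_M and the sample lines are assumptions made up for this example, not something from the original post. Since the M is said to be the first character of the last word, the pattern here is anchored with ^; drop the anchor to match an M anywhere in the last word.

```shell
# Hypothetical helper: print lines whose last whitespace-separated
# field starts with M. Reads the named files, or stdin if none are given.
last_word_has_M() {
    # NF is the number of fields on the current line, so $NF is the last field.
    awk '$NF ~ /^M/' "$@"
}

# Made-up sample input: three words per line.
printf '%s\n' 'one two Mars' 'one Mars two' 'aM bM cM' | last_word_has_M
```

Only the first sample line should be printed, because it is the only one whose last word begins with M.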
 
2 members found this post helpful.
Old 03-06-2011, 10:50 PM   #3
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,202

Rep: Reputation: 1796
Or grep could be:
Code:
egrep 'M[^ ]*$' file
 
3 members found this post helpful.
Old 03-06-2011, 11:24 PM   #4
quanta
Member
 
Registered: Aug 2007
Location: Vietnam
Distribution: RedHat based, Debian based, Slackware, Gentoo
Posts: 724

Rep: Reputation: 100
Quote:
Originally Posted by grail View Post
Or grep could be:
Code:
egrep 'M[^ ]*$' file
I like your solution. It is better than mine which is conventional thinking.
 
Old 03-07-2011, 12:15 AM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,202

Rep: Reputation: 1796
Not better .. just different .. I am normally the awk proponent but as you beat me to it I was happy to give an alternative
 
Old 03-07-2011, 02:16 AM   #6
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147
There is at least one case where egrep fails.

Quote:
Originally Posted by grail View Post
Code:
egrep 'M[^ ]*$' file
Code:
~$ echo -e "Fable Mabel\nHairy Mary \nMary Martian"
Fable Mabel
Hairy Mary
Mary Martian
~$ echo -e "Fable Mabel\nHairy Mary \nMary Martian" | egrep 'M[^ ]*$'
Fable Mabel
Mary Martian
~$
Note the space at the end of the second line. Is Mary not to be considered a word just because it is followed by an errant space character?

Many human generated text files contain random, unnecessary space characters. They tend to accumulate in places where they go unnoticed, such as adjacent to other whitespace.

It matters even more when processing the output of other programs. For one example, ifconfig likes to add space characters before newline characters.

Code:
~$ ifconfig | hd | grep '20 20 0a'
00000210  6c 20 4c 6f 6f 70 62 61  63 6b 20 20 0a 20 20 20  |l Loopback  .   |
~$
In cases where this matters the awk command is almost certainly to be preferred.

Code:
~$ echo -e "Fable Mabel\nHairy Mary \nMary Martian" | awk '$NF~"M"'
Fable Mabel
Hairy Mary
Mary Martian
~$
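
To check whether a file actually contains this kind of trailing whitespace before choosing between the two commands, something like the following sketch may help. The function name count_trailing_ws is an assumption for this example. printf is used instead of echo -e because echo -e is not portable across shells.

```shell
# Hypothetical helper: count lines that end in a space or a tab.
# Reads the named files, or stdin if none are given.
count_trailing_ws() {
    # [[:blank:]] is the POSIX class for space and tab; $ anchors at end of line.
    grep -c '[[:blank:]]$' "$@"
}

# Same three lines as above; only "Hairy Mary " ends in a space.
printf 'Fable Mabel\nHairy Mary \nMary Martian\n' | count_trailing_ws
```

A count of 0 suggests the simple grep pattern is safe for that file; anything higher means the trailing whitespace has to be handled.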
 
1 members found this post helpful.
Old 03-07-2011, 06:51 AM   #7
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,035

Original Poster
Rep: Reputation: 276
Quote:
Originally Posted by grail View Post
Or grep could be:
Code:
egrep 'M[^ ]*$' file
This works, and I'm dazzled!

Please give a bit of explanation, and then I will mark this puppy as solved.

My newbie reading is this ...
The M is the character which governs selection or rejection.
The [^ ] says "apply this logic to strings starting with blank."
The * says "apply this logic to all such strings in each line."
The $ says "the last string in each line is the only important one."

Please revise this narrative to make it more correct and instructive.

Thank you!
 
Old 03-07-2011, 07:35 AM   #8
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
Quote:
Originally Posted by danielbmartin View Post
Code:
egrep 'M[^ ]*$' file
My newbie reading is this ...
The M is the character which governs selection or rejection.
The [^ ] says "apply this logic to strings starting with blank."
The * says "apply this logic to all such strings in each line."
The $ says "the last string in each line is the only important one."
Not quite. * in regex means "zero or more of the preceding item". In this case, the preceding item is [^ ], which means "any one character that is not a space". So in layman's English, it could be read as "M, followed by any number of non-space characters, followed by the end of the line".

As pointed out, it would not match if there happen to be any spaces between the last word and the end of the line.

To catch that, you need to make a small modification.
Code:
egrep 'M[^[:space:]]*[[:space:]]*$'
So this would read as "M, followed by zero or more non-whitespace characters, followed by zero or more whitespace characters, followed by the end of the line".

I also replaced the simple space with the [:space:] character class here, meaning any kind of whitespace, so tabs would be matched in addition to regular spaces, although it's likely not necessary for your situation.
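
As a rough sketch of how the modified pattern behaves, the pieces can be annotated and tried against made-up input. The function name m_in_last_word and the sample lines are assumptions for this example. It uses grep -E, which is equivalent to egrep; recent GNU grep releases warn that egrep is obsolescent.

```shell
# Hypothetical helper: keep lines whose last word contains an M,
# tolerating trailing spaces or tabs.
m_in_last_word() {
    # M                a literal M
    # [^[:space:]]*    zero or more non-whitespace characters (rest of the word)
    # [[:space:]]*     zero or more trailing whitespace characters
    # $                end of the line
    grep -E 'M[^[:space:]]*[[:space:]]*$' "$@"
}

# Made-up sample: line 2 ends in a space, line 3 ends in a tab,
# line 4 has its only M in the first word.
printf 'Fable Mabel\nHairy Mary \nMary Martian\t\nMary Lamb\n' | m_in_last_word
```

The first three lines should be kept and "Mary Lamb" dropped. The pattern uses no extended-regex features, so plain grep without -E should give the same result.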
 
2 members found this post helpful.
Old 03-07-2011, 10:37 AM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,035

Original Poster
Rep: Reputation: 276
Quote:
Originally Posted by David the H. View Post
Not quite. * in regex means "zero or more of the preceding item". In this case, the preceding item is [^ ], which means "any one character that is not a space". So in layman's English, it could be read as "M, followed by any number of non-space characters, followed by the end of the line".

As pointed out, it would not match if there happen to be any spaces between the last word and the end of the line.

To catch that, you need to make a small modification.
Code:
egrep 'M[^[:space:]]*[[:space:]]*$'
So this would read as "M, followed by zero or more non-whitespace characters, followed by zero or more whitespace characters, followed by the end of the line".

I also replaced the simple space with the [:space:] character class here, meaning any kind of whitespace, so tabs would be matched in addition to regular spaces, although it's likely not necessary for your situation.
Thank you for this clear explanation. This question is SOLVED!
 
Old 03-07-2011, 11:57 AM   #10
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147

Quote:
Originally Posted by David the H. View Post
Code:
egrep 'M[^[:space:]]*[[:space:]]*$'
awk already considers tabs and normal space characters to be whitespace. My awk command is only 13 characters, while your egrep weighs in at a whopping 35 characters. How is your egrep different or better than this?

Code:
awk '$NF~"M"'
 
Old 03-07-2011, 02:20 PM   #11
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 367
My mother always told me not to stick my nose where it doesn't belong, but I don't listen to my mother very often.

Quote:
Originally Posted by Telengard
awk already considers tabs and normal space characters to be whitespace. My awk command is only 13 characters, while your egrep weighs in at a whopping 35 characters.
First, congratulations. Though, I must admit I missed the memo that Jeremy sent out that turned the forums into a competition.

Second, it wasn't David the H's command originally, but a follow-on to grail's.

Quote:
Originally Posted by Telengard
How is your egrep different or better than this?
How is it different? Well, grep is not awk. Therefore the commands are different.

How is it better? To quote the original post:
Quote:
Originally Posted by danielbmartin
I'd like to code a grep to keep only those lines which have the letter M in the last word.
So, a grep solution is better than an awk solution because the OP wanted grep. The OP did not ask for awk. The OP did not ask for an open-ended solution. The OP asked for grep.

So I assume the next time you ask someone to pass the salt you'll be happy when they give you pepper.

Relax... seriously. Don't get so defensive.
 
1 members found this post helpful.
Old 03-07-2011, 02:26 PM   #12
szboardstretcher
Senior Member
 
Registered: Aug 2006
Distribution: Arch 2014.02.01
Posts: 2,334
Blog Entries: 1

Rep: Reputation: 744
Code:
grep 'M[^ ]*$' filename
Quote:
$ (Dollar sign) = match the preceding expression at the end of a line, as in A$.
[^ ] = match any one character except those enclosed in [ ], as in [^0-9].
* (Asterisk) = match zero or more of the preceding character or expression.
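
One way to see these three pieces working together is to print only the part of each line that the pattern matched. The following is a sketch under assumptions: the function name show_match and the sample lines are made up, and the -o option is not part of POSIX grep, although GNU and BSD grep both provide it.

```shell
# Hypothetical helper: print only the text matched by the pattern,
# one match per output line.
show_match() {
    # M      a literal M
    # [^ ]*  zero or more characters that are not a space
    # $      end of the line
    grep -o 'M[^ ]*$' "$@"
}

# Made-up sample: the middle line has no M in its last word.
printf 'Fable Mabel\nMary Lamb\nMary Martian\n' | show_match
```

This should print Mabel and Martian. The Mary at the start of the last two lines is never reported because a space and more text follow it, so $ cannot match there.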
 
1 members found this post helpful.
Old 03-07-2011, 03:43 PM   #13
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147
Quote:
Originally Posted by Dark_Helmet View Post
So, a grep solution is better than an awk solution because the OP wanted grep.
That's the only thing you said that makes sense to me. Thanks for pointing it out though. It is a valid reason to choose grep.

As for the rest of your message, it seems to be an unwarranted attempt to inject personal animosity into an otherwise friendly discussion. I see that you are ranked senior member, so I'm guessing you didn't get to be one <moderated>. I'll just say your message reads like a personal attack, although I don't claim it is.

Quote:
Relax... seriously. Don't get so defensive.
What are you even referring to? Seriously, I don't get it. I'm rereading my message right now, and I don't see how it comes off as defensive at all.


Anyway, this thread isn't a good place for you and me to make friends. Feel free to PM me if you don't want to change the topic of the OP.

Last edited by colucix; 03-07-2011 at 04:21 PM. Reason: Removed colorful expression.
 
Old 03-07-2011, 04:19 PM   #14
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,370

Rep: Reputation: 1911
This thread is going a bit off-topic! Please keep the discussion fair and reasonable. No need to be pedantic, disrespectful or, even worse, offensive towards other members. The OP already got proper answers and hopefully learned something useful about regular expressions. 'Nuff said!
 
1 members found this post helpful.
Old 03-07-2011, 04:38 PM   #15
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 367
I'll unsubscribe from this thread immediately after this response. It's my hope that this won't be taken as offensive in any way--merely an explanation for my original response.

The original response from Telengard that I quoted (#10) characterized the egrep command from David the H. (a modified version of grail's original command that addressed Telengard's comments about end-of-line spaces) as being worse than Telengard's own awk command, and did so by saying the egrep used a "whopping" number of characters. It's clear that "whopping" was not used in a complimentary way.

In my time here at LQ, I've participated in a number of threads. I have offered a number of solutions to problems. I cannot recall any instance where I described another member's proposed solution as being worse or inferior to my own. I have pointed out technical problems with solutions posted, but I have never complained when a subsequent version of the command is posted to address those concerns. I simply, quietly let the OP decide which solution they want to use.

To me, such a complaint is an attack on the proposed solution in an effort to defend some other solution as "better." To me, that is unwarranted defensive behavior. We're all here to contribute and it's not about the "best" solution or the most "efficient" unless that's what the OP asks for.
 
1 members found this post helpful.
  


Tags
awk, grep, regex, regular expressions

