Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
I have a large file in which each line has three or more blank-delimited words. I'd like to code a grep to keep only those lines which have the letter M in the last word. If it's any help, the M (if present) will be the first character in the last word.
~$ echo -e "Fable Mabel\nHairy Mary \nMary Martian"
Fable Mabel
Hairy Mary
Mary Martian
~$ echo -e "Fable Mabel\nHairy Mary \nMary Martian" | egrep 'M[^ ]*$'
Fable Mabel
Mary Martian
~$
Note the space at the end of the second line. Is Mary not to be considered a word just because it is followed by an errant space character?
Many human-generated text files contain random, unnecessary space characters. They tend to accumulate in places where they go unnoticed, such as adjacent to other whitespace.
It matters even more when processing the output of other programs. For one example, ifconfig likes to add space characters before newline characters.
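A minimal sketch of one way to spot such trailing whitespace before filtering. The sample text reuses the three lines from the example above, and the variable names are only for illustration.

```shell
# Sample input: the second line ends with a stray space.
sample=$(printf 'Fable Mabel\nHairy Mary \nMary Martian\n')

# List (with line numbers) every line that ends in a whitespace character.
trailing=$(printf '%s\n' "$sample" | grep -n '[[:space:]]$')
printf '%s\n' "$trailing"
```

Only the second line is reported, which is the one the plain 'M[^ ]*$' pattern skipped.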
Please give a bit of explanation, and then I will mark this puppy as solved.
My newbie reading is this ...
The M is the character which governs selection or rejection.
The [^ ] says "apply this logic to strings starting with blank."
The * says "apply this logic to all such strings in each line."
The $ says "the last string in each line is the only important one."
Please revise this narrative to make it more correct and instructive.
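Not quite. * in regex means "zero or more of the previous item". In this case, the previous item is [^ ], a bracket expression that matches any one character that is not a space. The $ does not match a character at all; it anchors the match to the end of the line. So in layman's English, the pattern could be read as "M, followed by any number of non-space characters, followed by the end of the line".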
As pointed out, it would not match if there happen to be any spaces between the last word and the end of the line.
To catch that, you need to make a small modification.
Code:
egrep 'M[^[:space:]]*[[:space:]]*$'
So this would read as "M, followed by zero or more non-whitespace characters, followed by zero or more whitespace characters, followed by the end of the line".
I also replaced the simple space with the [:space:] character class here, meaning any kind of whitespace, so tabs would be matched in addition to regular spaces, although it's likely not necessary for your situation.
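As a rough check of the revised pattern, the sketch below runs it over four made-up lines: a clean ending, a trailing space, a trailing tab, and a line whose last word has no M. The sample lines and variable names are assumptions for illustration; grep -E is equivalent to egrep.

```shell
# Four sample lines: clean ending, trailing space, trailing tab,
# and a last word without an M.
input=$(printf 'Fable Mabel\nHairy Mary \nMary Martian\t\nMary Smith\n')

# Pattern that tolerates whitespace after the last word.
pattern='M[^[:space:]]*[[:space:]]*$'

# Count the matching lines, then list them with their line numbers.
count=$(printf '%s\n' "$input" | grep -E -c "$pattern")
printf '%s\n' "$input" | grep -E -n "$pattern"
printf 'matched %s of 4 lines\n' "$count"
```

The first three lines are kept; "Mary Smith" is rejected because the M word is not the last one.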
Thank you for this clear explanation. This question is SOLVED!
awk already considers tabs and normal space characters to be whitespace. My awk command is only 13 characters, while your egrep weighs in at a whopping 35 characters. How is your egrep different or better than this?
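The awk command itself is not reproduced in this thread. Purely as an assumption, a last-field test along the following lines would behave as described; it is a sketch, not necessarily the command that was posted.

```shell
# Sample input; the second line ends with a stray space.
sample=$(printf 'Fable Mabel\nHairy Mary \nMary Smith\n')

# By default awk splits fields on runs of blanks and tabs, so $NF is the
# last word even when whitespace trails it. /^M/ requires a leading M.
kept=$(printf '%s\n' "$sample" | awk '$NF ~ /^M/')
printf '%s\n' "$kept"
```

awk prints the matching lines unchanged, so the stray space on the second line is still present in the output.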
My mother always told me not to stick my nose where it doesn't belong, but I don't listen to my mother very often.
Quote:
Originally Posted by Telengard
awk already considers tabs and normal space characters to be whitespace. My awk command is only 13 characters, while your egrep weighs in at a whopping 35 characters.
First, congratulations. Though, I must admit I missed the memo that Jeremy sent out that turned the forums into a competition.
Second, it wasn't David the H's command originally, but a follow-on to grail's.
Quote:
Originally Posted by Telengard
How is your egrep different or better than this?
How is it different? Well, grep is not awk. Therefore the commands are different.
How is it better? To quote the original post:
Quote:
Originally Posted by danielbmartin
I'd like to code a grep to keep only those lines which have the letter M in the last word.
So, a grep solution is better than an awk solution because the OP wanted grep. The OP did not ask for awk. The OP did not ask for an open-ended solution. The OP asked for grep.
So I assume the next time you ask someone to pass the salt you'll be happy when they give you pepper.
$ (Dollar sign) = match expression at the end of a line, as in A$.
[^ ] = match any one character except those enclosed in [ ], as in [^0-9].
* (Asterisk) = match zero or more of the preceding character or expression.
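Each of the three rules above can be tried in isolation. The following is a small sketch with made-up input strings; the variable names are only for illustration.

```shell
# $ anchors the match at the end of the line: only "BA" ends in A.
dollar=$(printf 'AB\nBA\n' | grep 'A$')

# [^0-9] matches one character that is not a digit: "123" contains none.
negated=$(printf '123\n12a\n' | grep '[^0-9]')

# * allows zero or more of the preceding item: b may be absent or repeated,
# but "adc" fails because d is not b.
star=$(printf 'ac\nabc\nabbc\nadc\n' | grep 'ab*c')

printf '%s\n' "$dollar" "$negated" "$star"
```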
So, a grep solution is better than an awk solution because the OP wanted grep.
That's the only thing you said that makes sense to me. Thanks for pointing it out though. It is a valid reason to choose grep.
As for the rest of your message, it seems to be an unwarranted attempt to inject personal animosity into an otherwise friendly discussion. I see that you are ranked senior member, so I'm guessing you didn't get to be one <moderated>. I'll just say your message reads like a personal attack, although I don't claim it is.
Quote:
Relax... seriously. Don't get so defensive.
What are you even referring to? Seriously, I don't get it. I'm rereading my message right now, and I don't see how it comes off as defensive at all.
Anyway, this thread isn't a good place for you and me to make friends. Feel free to PM me if you don't want to change the topic of the OP.
Last edited by colucix; 03-07-2011 at 04:21 PM.
Reason: Removed colorful expression.
This thread is going a bit off-topic! Please keep the discussion fair and reasonable. No need to be pedantic, disrespectful or, even worse, offensive towards other members. The OP already gained proper answers and hopefully learned something useful about regular expressions. 'Nuff said!
I'll unsubscribe from this thread immediately after this response. It's my hope that this won't be taken as offensive in any way--merely an explanation for my original response.
The original response from Telengard that I quoted (#10) described the egrep command from David the H. (which was a modified version of grail's original command that addressed Telengard's comments about end-of-line spaces) as being worse than Telengard's own awk command, and did so by saying the egrep used a "whopping" number of characters. It's clear that "whopping" was not used in a complimentary way.
In my time here at LQ, I've participated in a number of threads. I have offered a number of solutions to problems. I cannot recall any instance where I described another member's proposed solution as worse than or inferior to my own. I have pointed out technical problems with solutions posted, but I have never complained when a subsequent version of the command is posted to address those concerns. I simply, quietly let the OP decide which solution they want to use.
To me, such a complaint is an attack on the proposed solution in an effort to defend some other solution as "better." To me, that is unwarranted defensive behavior. We're all here to contribute and it's not about the "best" solution or the most "efficient" unless that's what the OP asks for.