01-04-2014, 11:38 PM
|
#1
|
LQ Newbie
Registered: Jul 2012
Posts: 5
|
grep / egrep negation in an OR expression
Hi there, I couldn't find a solution so far, although I've been googling quite a bit.
My goal is to grep through a file and match
° either lines that contain no alphabetic letters
° OR lines that contain word
The closest I came was
Code:
cat file | egrep "(-v [[:alpha:]]|word)"
but that will only find word and not the numbers-only lines.
So is there a solution that goes like
Code:
cat file | egrep (![[:alpha:]]|word)
EITHER ↑no alpha OR word
(pseudocode, ! like the bash negation)
I guess I'm blind or too tired, but I can't figure it out...
Yes, egrep (this|that) file would be nicer than cat file |, but whenever I have read the file first, I just leave the cat and apply the grep ;-)
Last edited by wandering dog; 01-04-2014 at 11:40 PM.
|
|
|
01-05-2014, 12:59 AM
|
#2
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,315
|
You can get close with grep, but you have to know too much about the structure of the data. Use something with regex smarts, say awk or perl.
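A minimal sketch of the awk route, not necessarily what the poster had in mind. awk can negate a pattern with `!` and combine conditions with `||`, which is the "no alpha OR word" logic asked for. The sample file and the variable names `sample` and `awk_matches` are made up for illustration.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf '%s\n' '12345' 'foo bar' 'a word here' '67 89' > "$sample"

# !/re/ is true for lines that do NOT match the regex; || is a logical OR.
awk_matches=$(awk '!/[[:alpha:]]/ || /word/' "$sample")
printf '%s\n' "$awk_matches"
rm -f "$sample"
```

This assumes an awk that understands POSIX character classes such as `[[:alpha:]]`; some older awk builds do not.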
|
|
|
01-05-2014, 11:28 AM
|
#3
|
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,804
|
This appears to do what you want:
Code:
egrep '^[^[:alpha:]]*$|word'
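A quick way to try the expression on data, as a sketch only. The sample lines and the names `sample` and `grep_matches` are assumptions, and `grep -E` is used as the POSIX spelling of `egrep`.

```shell
# Hypothetical sample data: two numeric addresses and two host names.
sample=$(mktemp)
printf '%s\n' '192.168.0.1' 'example.org' 'mail.word.net' '10.0.0.7' > "$sample"

# Keep lines with no alphabetic characters, or lines containing "word".
grep_matches=$(grep -E '^[^[:alpha:]]*$|word' "$sample")
printf '%s\n' "$grep_matches"
rm -f "$sample"
```

Here `example.org` is dropped because it contains letters and does not contain `word`.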
|
|
|
01-05-2014, 11:31 PM
|
#4
|
LQ Newbie
Registered: Jul 2012
Posts: 5
Original Poster
|
Quote:
Originally Posted by rknichols
This appears to do what you want:
Code:
egrep '^[^[:alpha:]]*$|word'
|
Wow! Thanks! It does. I admit I can't see the negation here, i.e. where exactly this expression does it. I read it like this:
Code:
^[^[:alpha:]]*$
^^^    ^     ^^
|||    |     ||
|||    |     |+-- the line ends
|||    |     +--- until
|||    +--------- then, any alphabetic character
||+-------------- term begins again (??)
|+--------------- a square bracket (?)
term begins with
OK, I googled it once again, and I noticed I had already skimmed a site where negation is explained.
If anyone ever faces the same question and comes here, here's another link:
Be careful, because ^ at the start of a character class, as in [^a-z], indicates negation
Thanks, rk and syg.
|
|
|
01-06-2014, 12:12 AM
|
#5
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,315
|
What if there is whitespace in a numeric line, or in an alphanumeric line? This is what I meant about having to know the data intimately.
grep (and sed) can do this well, but it can be difficult to cover all corner cases.
Glad you picked up some pointers.
|
|
|
01-06-2014, 11:50 AM
|
#6
|
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,804
|
Quote:
Originally Posted by syg00
What if there is whitespace in a numeric line, or in an alphanumeric line?
|
Whitespace, punctuation, or control characters are not alphabetic characters and would not block the match. The presence of any alphabetic characters would block the first branch of the RE, but might still match word in the second branch.
The given RE doesn't really address the "negation in an OR expression" posed in the thread title (a RE can't do that), but restates the problem as either of two positive matches. The first branch of the RE parses as:
Code:
^[^[:alpha:]]*$
^^^\       /^^^
||| \     / |||
|||  \   /  |||
|||   \ /   |||
|||    ^    ||+-- end of line
|||    |    |+--- zero or more matches of the preceding atom
|||    |    +---- end of bracket expression
|||    +--------- a list of all alphabetic characters
||+-------------- match any character not in the list
|+--------------- start of bracket expression
beginning of line
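The point about whitespace and punctuation can be checked with a small sketch. The sample lines and the names `sample`, `first_branch` and `both_branches` are hypothetical.

```shell
# Made-up lines: only the last two contain alphabetic characters.
sample=$(mktemp)
printf '%s\n' '10.0.0.7' '12 34 ; 56' '' '12ab34' 'a word here' > "$sample"

# First branch alone: lines consisting entirely of non-alphabetic characters.
# The empty line counts too, because * allows zero repetitions.
first_branch=$(grep -cE '^[^[:alpha:]]*$' "$sample")

# Full expression: the second branch also accepts the line containing "word".
both_branches=$(grep -cE '^[^[:alpha:]]*$|word' "$sample")

echo "first branch: $first_branch, both branches: $both_branches"
rm -f "$sample"
```

Spaces and the semicolon do not block the first branch, while `12ab34` is rejected by both branches.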
Last edited by rknichols; 01-06-2014 at 11:53 AM.
|
|
|
01-06-2014, 12:36 PM
|
#7
|
LQ Veteran
Registered: Sep 2003
Posts: 10,532
|
@rknichols: Nice solution! However, you might have overlooked something:
Code:
egrep '^[^[:alpha:]]*$|word'
The second part looks for word, which would include xword, wordy and xwordy, which might not be wanted.
I just noticed that word boundaries aren't picked up when you do this:
Code:
# invalid example
egrep '^[^[:alpha:]]*$|\bword\b' infile
This does work on my side:
Code:
grep -P '^[^[:alpha:]]*$|\bword\b' infile
|
|
|
01-06-2014, 01:33 PM
|
#8
|
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,804
|
Quote:
Originally Posted by druuna
The second part looks for word, which would include xword, wordy and xwordy, which might not be wanted.
|
Actually I thought of that at the time, but also realized that always delimiting the word with "\b" would prevent strings like "123word" from matching, which might or might not be desired behavior. And really, that part of the expression is just like any common use of grep.
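The trade-off can be seen on a made-up sample. This sketch assumes GNU grep, which documents `\b` as an extension that is also accepted with `-E`; other grep implementations may behave differently, which might explain a failing `egrep` test. The names `sample`, `plain_count` and `bounded` are hypothetical.

```shell
# Hypothetical sample lines containing "word" in different positions.
sample=$(mktemp)
printf '%s\n' 'word' 'xword' 'wordy' '123word' 'a word here' > "$sample"

# Unanchored: every line containing the substring matches.
plain_count=$(grep -cE 'word' "$sample")

# With \b only whole-word occurrences match. 123word is dropped as well,
# because digits count as word characters.
bounded=$(grep -E '\bword\b' "$sample")

echo "plain: $plain_count"
printf '%s\n' "$bounded"
rm -f "$sample"
```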
|
|
|
01-06-2014, 01:42 PM
|
#9
|
LQ Newbie
Registered: Jul 2012
Posts: 5
Original Poster
|
Actually, the common grep behavior was the one I wanted. I just wanted to filter either numerical IP addresses OR a specific domain from a large list.
|
|
|
All times are GMT -5.