LinuxQuestions.org
Old 01-04-2014, 10:38 PM   #1
wandering dog
LQ Newbie
 
Registered: Jul 2012
Posts: 5

Rep: Reputation: Disabled
grep / egrep negation in an OR expression


Hi there, I couldn't find a solution so far, although I've been googling quite a bit.


My goal is to grep through a file and match
either lines that contain no alphabetic letters
OR lines that contain word


The closest I came was

Code:
cat file | egrep "(-v [[:alpha:]]|word)"
But that only finds word, not the numbers-only lines.

So is there a solution that goes like

Code:
cat file | egrep (![[:alpha:]]|word)
        EITHER    ↑no alpha  OR word
(pseudocode, ! like the bash negation)

I guess I'm blind or too tired, but I can't figure it out...


Yes, egrep (this|that) file would be nicer than cat file |, but whenever I have read the file first, I just leave the cat and apply the grep ;-)

Last edited by wandering dog; 01-04-2014 at 10:40 PM.
 
Old 01-04-2014, 11:59 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,976

Rep: Reputation: 3281
You can get close with grep, but you have to know too much about the structure of the data. Use something with regex smarts, say awk or perl.
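For reference, a minimal sketch of the awk route suggested here. awk can combine a negated match with a positive one directly, which a single grep -E pattern cannot express as a negation. The sample lines and the function name `no_alpha_or_word` are made up for illustration.

```shell
# Print a line when it contains no alphabetic character,
# or when it contains the string "word".
no_alpha_or_word() {
    awk '!/[[:alpha:]]/ || /word/'
}

# Hypothetical sample input, one line per argument.
printf '%s\n' '192.168.0.1' 'mail.example.org' 'word of the day' '10.0.0.7' |
    no_alpha_or_word
# prints:
#   192.168.0.1
#   word of the day
#   10.0.0.7
```

The `!` in front of the first pattern is the negation the original pseudocode was asking for.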
 
Old 01-05-2014, 10:28 AM   #3
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 4,429

Rep: Reputation: 2029
This appears to do what you want:
Code:
egrep '^[^[:alpha:]]*$|word'
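A quick way to try this on a throwaway file, as a sketch. The sample lines are hypothetical, and `grep -E` is used in place of `egrep` because recent GNU grep releases warn that `egrep` is obsolescent. The file is passed as an argument, so no `cat` is needed.

```shell
# Build a small hypothetical test file.
tmp=$(mktemp)
printf '%s\n' '12345' 'foo bar' 'a word here' > "$tmp"

# First branch: the whole line consists of non-alphabetic characters.
# Second branch: the line contains "word" anywhere.
result=$(grep -E '^[^[:alpha:]]*$|word' "$tmp")
printf '%s\n' "$result"
# prints:
#   12345
#   a word here

rm -f "$tmp"
```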
 
2 members found this post helpful.
Old 01-05-2014, 10:31 PM   #4
wandering dog
LQ Newbie
 
Registered: Jul 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by rknichols View Post
This appears to do what you want:
Code:
egrep '^[^[:alpha:]]*$|word'
Wow! Thanks! It does. I admit I can't see the negation here, i.e. where exactly this expression does it. I read it like this:

Code:
^[^[:alpha:]]*$
^^^    ^     ^^
|||    |     ||
|||    |     |+--- the line ends
|||    |     +---- until
|||    +---------- then, any alphabetic character
||+--------------- term begins again (??)
|+---------------- a square bracket (?)
term begins with
OK, I tried googling it once again, and I noticed I had already skimmed a site where negation is explained.

If ever anyone faces the same question and comes here, here's another link:

Be careful, because a ^ at the start of a character class, as in [^a-z], indicates negation

Thanks, rk and syg.
 
Old 01-05-2014, 11:12 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,976

Rep: Reputation: 3281
What if there is whitespace in a numeric line, or in an alphanumeric line? This is what I meant about having to know the data intimately.
grep (and sed) can do this well, but it can be difficult to cover all corner cases.

Glad you picked up some pointers.
 
Old 01-06-2014, 10:50 AM   #6
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 4,429

Rep: Reputation: 2029
Quote:
Originally Posted by syg00 View Post
What if there is whitespace in a numeric line, or in an alphanumeric line?
Whitespace, punctuation, or control characters are not alphabetic characters and would not block the match. The presence of any alphabetic characters would block the first branch of the RE, but might still match word in the second branch.

The given RE doesn't really address the "negation in an OR expression" posed in the thread title (an RE can't do that), but restates the problem as either of two positive matches. The first branch of the RE parses as:
Code:
^[^[:alpha:]]*$
^^^\       /^^^
||| \     / |||
|||  \   /  |||
|||   \ /   |||
|||    ^    ||+---------------------------------------------------end of line
|||    |    |+---------------------------------------zero or more matches of the preceding atom
|||    |    +----------------------------end of bracket expression
|||    +-------------------a list of all alphabetic characters
||+--------------match any character not in the list
|+-------start of bracket expression
beginning of line
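The whitespace and punctuation point can be checked with a short sketch like the one below. The sample lines and the function names `first_branch` and `first_branch_nonempty` are hypothetical. Because `*` also allows zero characters, an empty line matches the first branch as well.

```shell
# First branch only: every character on the line is non-alphabetic.
first_branch() {
    grep -E '^[^[:alpha:]]*$'
}

# Variant with + instead of *, which requires at least one character.
first_branch_nonempty() {
    grep -E '^[^[:alpha:]]+$'
}

# Four sample lines; the third one is empty.
printf '%s\n' '10 20 30' '3.14, 2.71' '' 'x 10' | first_branch
# prints '10 20 30', '3.14, 2.71' and the empty line; 'x 10' is rejected
```

If empty lines should not be reported, the `+` variant is one way to exclude them.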

Last edited by rknichols; 01-06-2014 at 10:53 AM.
 
2 members found this post helpful.
Old 01-06-2014, 11:36 AM   #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2397
@rknichols: Nice solution! However, you might have overlooked something:
Code:
egrep '^[^[:alpha:]]*$|word'
The second part looks for word, which would include xword, wordy and xwordy, which might not be wanted.

I just noticed that word boundaries aren't picked up when you do this:
Code:
# invalid example
egrep '^[^[:alpha:]]*$|\bword\b' infile
This does work on my side:
Code:
grep -P '^[^[:alpha:]]*$|\bword\b' infile
 
Old 01-06-2014, 12:33 PM   #8
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 4,429

Rep: Reputation: 2029
Quote:
Originally Posted by druuna View Post
The second part looks for word, which would include xword, wordy and xwordy, which might not be wanted.
Actually I thought of that at the time, but also realized that always delimiting the word with "\b" would prevent strings like "123word" from matching, which might or might not be desired behavior. And really, that part of the expression is just like any common use of grep.
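To make the trade-off concrete, here is a small sketch comparing the two variants. It assumes GNU grep, which documents `\b` as a word-boundary operator; other grep builds may behave differently. The sample lines and the function names `plain` and `bounded` are hypothetical.

```shell
# Plain substring match on "word".
plain() {
    grep -E '^[^[:alpha:]]*$|word'
}

# Same pattern with word boundaries (assumes GNU grep's \b extension).
bounded() {
    grep -E '^[^[:alpha:]]*$|\bword\b'
}

printf '%s\n' 'word' 'wordy' 'xword' '123word' 'a word.' | plain
# prints all five lines

printf '%s\n' 'word' 'wordy' 'xword' '123word' 'a word.' | bounded
# prints:
#   word
#   a word.
```

Digits count as word characters, so there is no boundary between `3` and `w` in `123word`, which is why the bounded variant drops that line.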
 
Old 01-06-2014, 12:42 PM   #9
wandering dog
LQ Newbie
 
Registered: Jul 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Actually, the common grep behavior was the one I wanted. I just wanted to filter either numerical IP addresses OR a specific domain from a large list.
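For that use case, a sketch might look like the following. The domain `example.com`, the host list, and the function name `filter_hosts` are placeholders, not taken from the thread. The dot in the domain is escaped so that it matches a literal dot and not any character.

```shell
# Keep lines with no alphabetic characters (e.g. dotted IPv4 addresses)
# or lines containing the literal domain example.com (placeholder).
filter_hosts() {
    grep -E '^[^[:alpha:]]*$|example\.com'
}

printf '%s\n' \
    '192.168.1.10' \
    'mail.example.com' \
    '2001:db8::1' \
    'www.example.org' \
    '10.0.0.1' \
    'examplexcom.net' | filter_hosts
# prints:
#   192.168.1.10
#   mail.example.com
#   10.0.0.1
```

Note that an IPv6 address containing hex letters, such as `2001:db8::1`, has alphabetic characters and is not caught by the first branch. This is the kind of corner case where you need to know the data.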
 
  

