grep / egrep negation in an OR expression
Hi there, I haven't found a solution so far, although I've been googling quite a bit.
My goal is to grep through a file and match
- either lines that contain no alphabetic letters
- OR lines that contain "word".
The closest I came was:
Code:
cat file | egrep "(-v [[:alpha:]]|word)"
So is there a solution that goes like:
Code:
cat file | egrep (![[:alpha:]]|word)
I guess I'm blind or too tired, but I can't figure it out...
Yes, egrep '(this|that)' file would be nicer than cat file |, but whenever I have read the file first, I just leave the cat and apply the grep ;-)
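Here is why the first attempt only ever returns the "word" lines. Inside the quoted pattern, -v is not a grep option but literal text, so the first branch can only match a line that literally contains "-v " followed by a letter. The second attempt is not valid shell at all, because unquoted parentheses are shell syntax. A quick check follows. It uses grep -E in place of egrep with made-up sample lines, and the function names are illustrative only:

```shell
#!/bin/sh
# Hypothetical sample input: a numeric line, a line containing "word",
# and a purely alphabetic line.
sample() {
    printf '%s\n' '192.168.0.1' 'a word here' 'example'
}

# The attempted pattern: "-v [[:alpha:]]" is just literal text inside the
# regex, so for this input only the "word" branch can ever match.
attempt() {
    sample | grep -E '(-v [[:alpha:]]|word)'
}

attempt    # prints only: a word here
```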
You can get close with grep, but you have to know too much about the structure of the data. Use something with more regex smarts, such as awk or perl.
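Following that suggestion, awk can put the negation inside the OR directly. Each /regex/ in an awk pattern is a true/false test on the current line, so it can be combined with ! and ||. Below is a minimal sketch. It assumes an awk whose regexes support POSIX character classes such as [[:alpha:]], as POSIX requires. The filter name and the sample lines are illustrative:

```shell
#!/bin/sh
# Print lines that contain no alphabetic character OR contain "word".
# Reads the files given as arguments, or stdin if there are none.
filter() {
    awk '!/[[:alpha:]]/ || /word/' "$@"
}

# Hypothetical sample data (not from the thread):
printf '%s\n' '192.168.0.1' 'a word here' 'example' | filter
# prints:
# 192.168.0.1
# a word here
```

The Perl equivalent would be perl -ne 'print if !/[[:alpha:]]/ || /word/' file.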
This appears to do what you want:
Code:
egrep '^[^[:alpha:]]*$|word'
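Here is a self-contained run of that pattern. It is written with grep -E, the standard spelling of egrep (recent GNU grep releases warn that egrep is obsolescent). The sample lines and the match_lines name are made up for illustration:

```shell
#!/bin/sh
# The pattern above, written with grep -E (equivalent to egrep).
# Branch 1: ^[^[:alpha:]]*$  -> the whole line contains no letters
# Branch 2: word             -> the line contains "word" anywhere
match_lines() {
    grep -E '^[^[:alpha:]]*$|word' "$@"
}

# Hypothetical sample input, one item per line.
printf '%s\n' '10.0.0.1' 'host.example.org' 'my word list' '42' | match_lines
# prints:
# 10.0.0.1
# my word list
# 42
```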
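Quote:
Code:
^[^[:alpha:]]*$
If anyone ever faces the same question and comes here, here's another link:
Be careful: a ^ as the first character inside a bracket expression, as in [^a-z], negates the set, whereas a ^ outside brackets anchors the match to the start of the line.
Thanks, rk and syg!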
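The two meanings of ^ are easy to check on a few made-up lines. Outside a bracket expression, ^ anchors the match to the start of the line. As the first character inside brackets, it negates the set. The function names here are illustrative:

```shell
#!/bin/sh
# Hypothetical sample lines.
lines() { printf '%s\n' 'abc' '123' '1a'; }

starts_with_lower()     { lines | grep '^[a-z]'; }    # anchor + set
starts_with_non_lower() { lines | grep '^[^a-z]'; }   # anchor + negated set
no_lower_at_all()       { lines | grep '^[^a-z]*$'; } # every char is non-lowercase

starts_with_lower      # prints abc
starts_with_non_lower  # prints 123 and 1a
no_lower_at_all        # prints 123
```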
What if there is whitespace in a numeric line, or in an alphanumeric line? This is what I meant about having to know the data intimately.
grep (and sed) can do this well, but it can be difficult to cover all the corner cases. Glad you picked up some pointers.
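Here are two concrete corner cases of the first branch, using made-up lines. An empty line matches, because * also allows zero characters. A whitespace-only line also matches, because spaces are not letters. The stricter variant below is only one possible reading of "numeric line": digits and dots with optional surrounding blanks and at least one digit. Adjust it to the actual data:

```shell
#!/bin/sh
# Hypothetical data: trailing space, empty line, blanks only, mixed.
sample() { printf '%s\n' '10.0.0.1 ' '' '   ' 'abc1'; }

# Count lines matched by the "no letters" branch as written.
loose()  { sample | grep -cE '^[^[:alpha:]]*$'; }

# Assumed tightening: optional blanks, then digits/dots containing
# at least one digit, then optional blanks.
strict() { sample | grep -cE '^[[:blank:]]*[0-9.]*[0-9][0-9.]*[[:blank:]]*$'; }

loose   # prints 3
strict  # prints 1
```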
Quote:
The given RE doesn't really address the "negation in an OR expression" posed in the thread title (a POSIX RE can't do that). Instead, it restates the problem as either of two positive matches. The first branch of the RE parses as:
Code:
^              start of line
[^[:alpha:]]*  zero or more characters that are not letters
$              end of line
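grep's only negation, -v, inverts the selection for the whole set of patterns, so it can't be attached to a single branch. For the "two positive matches" form, grep does offer a list of -e patterns, and a line is selected if it matches any one of them. Here is a sketch with made-up input. The name either is illustrative:

```shell
#!/bin/sh
# Several -e patterns are alternatives: a line is printed if it matches
# ANY of them, which is equivalent to '^[^[:alpha:]]*$|word'.
either() {
    grep -E -e '^[^[:alpha:]]*$' -e 'word' "$@"
}

printf '%s\n' '127.0.0.1' 'password' 'hostname' | either
# prints:
# 127.0.0.1
# password
```

Note that "password" is selected because the plain word branch also matches inside longer words.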
@rknichols: Nice solution! However ;)
You might have overlooked something:
Code:
egrep '^[^[:alpha:]]*$|word'
I just noticed that word boundaries aren't picked up when you do this:
Code:
# invalid example
Code:
grep -P '^[^[:alpha:]]*$|\bword\b' infile
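In that last command, \b is PCRE syntax, and -P is a GNU grep option that is not available in every grep build. A portable sketch in plain ERE spells the boundary out instead. It assumes word characters are letters, digits and underscore. The name whole_word and the sample lines are illustrative:

```shell
#!/bin/sh
# "word" must be preceded by start of line or a non-word character, and
# followed by a non-word character or end of line (word chars assumed
# to be [[:alnum:]_]). The "no letters" branch is unchanged.
whole_word() {
    grep -E '^[^[:alpha:]]*$|(^|[^[:alnum:]_])word([^[:alnum:]_]|$)' "$@"
}

printf '%s\n' 'password' 'a word.' 'word' 'wordy' '8.8.8.8' | whole_word
# prints:
# a word.
# word
# 8.8.8.8
```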
Actually, the default grep behavior was what I wanted: I just needed to filter either numeric IP addresses OR a specific domain from a large list ;).
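For that use case, keep in mind that an unescaped . in the domain matches any character. The sketch below uses a placeholder domain (example.com stands in for the real one) and made-up input. ^[0-9.]+$ is only a loose test for dotted numbers, not a validating IPv4 pattern:

```shell
#!/bin/sh
# Hypothetical domain: example.com is a placeholder.
# The dot is escaped so it matches a literal ".", not any character.
ips_or_domain() {
    grep -E '^[0-9.]+$|example\.com' "$@"
}

printf '%s\n' '192.0.2.10' 'www.example.com' 'exampleXcom.net' 'other.org' | ips_or_domain
# prints:
# 192.0.2.10
# www.example.com
```

Without the backslash, exampleXcom.net would match as well.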