Yet another regex problem
Without lookarounds, how would you write a regex that eliminates as much noise as possible? I have to use grep -E.
I am looking for "(X)anything" or "(X) anything", but want to exclude the noise (or as much of it as possible) that matches "(X)word" or "(X) word". |
You might need to explain more ... what is the difference between 'word' and 'anything'?
|
"anything" is a random string.
"word" is a specific string.
grep'ing for pattern
Code:
"(X)anything"
Code:
"(X)word"
Code:
\(X\)[^w]
A regex that eliminates more than desired is OK; I just wish to minimize the exclusion set. It's a pita w/o lookarounds, just seeing what you guys might suggest. |
This is what I came up with:
Code:
grep -E "\(X\)[[:blank:]]?[^ w]" file
To eliminate the exact word "word":
Code:
grep -E "\(X\)[[:blank:]]?[^ ]" file | grep -v "word"
|
sycamorex,
Excluding all words that begin with "w" produces an exclusion set that is far too big. I was trying to make that exclusion set as small as possible. Also, I can only use a single regex with grep -E (no piping, no POSIX character classes available, etc.). I came up with this:
Code:
\(X\)([ ][a-z]{2}[^r]|[a-z]{2}[^r])
|
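To get a feel for how large that exclusion set is, here is a small sketch that runs the expression above over a few made-up lines. The sample words are hypothetical and not taken from the thread's data, and `LC_ALL=C` is an assumption added so that `[a-z]` means ASCII lowercase only.

```shell
# Hypothetical sample lines; only "word" is the noise we actually want to drop.
sample=$(printf '%s\n' '(X)word' '(X) word' '(X)wind' '(X) wolf' '(X)cart' '(X)Apple')

# The single-ERE approach from the post: two lowercase letters, then anything but "r".
regex='\(X\)([ ][a-z]{2}[^r]|[a-z]{2}[^r])'

# "|| true" keeps the script going if grep finds nothing.
hits=$(printf '%s\n' "$sample" | LC_ALL=C grep -E "$regex" || true)
printf '%s\n' "$hits"
```

Only "(X)wind" and "(X) wolf" come through. Both "word" lines are dropped as intended, but so are "(X)cart" (third letter is "r") and "(X)Apple" (capital first letter), which is the price of this exclusion set.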
Could you post some sample data? It might help.
|
It's hard to give the negative, but let's try.
"(X) Happy" or "(X)Happy" is noise in my files, but I want to find a pattern that is the boolean equivalent of this: "(X)" followed by NOT "Happy", OR "(X)" followed by a single space followed by NOT "Happy". It's a pita w/o lookarounds, so the only way I see is to build a regex that gives the smallest exclusion set. Sample file:
Code:
(X)Happy (X) Happy |
Why not just do:
Code:
grep -vE '\(X\).?Happy' <file>
|
Quote:
It also matches:
- (X) word (2 or more spaces after (X))
- and lines NOT starting with (X)
...if such lines exist. Is the actual data the OP provided accurate and representative of the whole file? |
I am with Cedric, except that the dot should simply be a space. Until we are given reasons why it is not acceptable, it does answer the present question:
Code:
grep -vE '\(X\) ?Happy' file
|
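One detail worth keeping in mind with `grep -E`: parentheses are grouping operators unless they are escaped, so `(X)` matches a bare "X" while `\(X\)` matches the literal text "(X)". A quick sketch of the difference, using a single made-up test line:

```shell
line='(X) Happy'

# Unescaped: (X) is a group matching just "X", so the ")" in the data blocks the match.
# "|| true" is there because grep exits non-zero when the count is 0.
unescaped=$(printf '%s\n' "$line" | grep -cE '(X) ?Happy' || true)

# Escaped: \( and \) match the literal parentheses.
escaped=$(printf '%s\n' "$line" | grep -cE '\(X\) ?Happy' || true)

echo "unescaped: $unescaped, escaped: $escaped"
```

This prints `unescaped: 0, escaped: 1`, so only the escaped form actually sees the "(X) Happy" line.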
I don't want the lines that have neither the noise nor what I'm looking for.
Let me try and clarify. The search tool is a "grep -E" equivalent, so I do not have the -v option, or the ability to pipe, etc.
Code:
grep searches (X)Happy the named input FILEs
A hit is "(X)[space][word]" or "(X)[word]". The sample file above has 14 lines. If I had lookarounds I would get:
1 no match
2 match for "(X) Frown"
3 match for "(X)puppy"
4 no match
5 no match
6 no match
7 no match
8 no match
9 no match
10 match for "(X)Chuck"
11 match for "(X) Pencil"
12 no match
13 no match
14 match for "(X) Denny"
So w/o lookarounds, using "grep -E '/regex/' file", I only see a way to build an exclusion set, which will vary in size depending on the actual word to be excluded and the analytics of words.
Code:
So in this example I use something like this: |
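For what it's worth, one lookaround-free way to approximate "(X) followed by NOT Happy" in a single ERE is to spell out the negation one character at a time. This is only a sketch and not the OP's own expression. It assumes the excluded word is "Happy", that anything starting with "Happy" counts as noise, and it uses made-up sample lines, since the thread's full 14-line file is not shown.

```shell
# Hypothetical sample lines built from words mentioned in the thread.
sample=$(printf '%s\n' '(X)Happy' '(X) Frown' '(X)puppy' '(X) Happy' '(X) Hat' '(X)Chuck')

# After "(X)" and an optional single space, accept any text that diverges
# from "Happy" at the first, second, third, fourth or fifth character.
regex='\(X\) ?([^ H]|H([^a]|$)|Ha([^p]|$)|Hap([^p]|$)|Happ([^y]|$))'

# "|| true" keeps the script going if grep finds nothing.
hits=$(printf '%s\n' "$sample" | grep -E "$regex" || true)
printf '%s\n' "$hits"
```

With these lines, both "Happy" variants are dropped while "(X) Hat" still gets through, so the exclusion set shrinks to just the unwanted word. The trade-off is that the expression grows with the length of the word and has to be rebuilt for each word you want to exclude.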