LinuxQuestions.org


taylorkh 02-25-2014 08:40 AM

[What should be] A simple grep regex question
 
I am searching for a text file containing the letter v repeated 3 or more times. I have created such a file in the pwd. I execute the following grep command (CentOS 6) and here is what happens
Quote:

[ken@taylor12 Desktop]$ grep -l v{3,} *.txt
grep: v: No such file or directory
I have tried -E for extended grep - same issue. Also placing v in [] - same message. This should not be so hard. HELP!

TIA,

Ken

p.s.
Quote:

[ken@taylor12 Desktop]$ grep -l vvv *.txt
will find the file.

szboardstretcher 02-25-2014 08:48 AM

Code:

grep '\<vvv*\>' *.txt
3 or more times?

rknichols 02-25-2014 09:22 AM

Quote:

Originally Posted by taylorkh (Post 5124334)
I am searching for a text file containing the letter v repeated 3 or more times. I have created such a file in the pwd. I execute the following grep command (CentOS 6) and here is what happens
Code:

[ken@taylor12 Desktop]$ grep -l v{3,} *.txt
grep: v: No such file or directory

I have tried -E for extended grep - same issue.

Please do not use [QUOTE] tags for code. It makes it hard to quote you in a response.

Your problem is that the pattern argument is being interpreted by the shell as a brace expansion resulting in two arguments, "v3" and "v". You need to quote that pattern. Also, in basic regular expressions the "{" and "}" characters are not special, so you either need to escape them with a literal backslash or use an extended regular expression. Any of these should work:
Code:

grep -l 'v\{3,\}' *.txt
grep -l "v\\{3,\\}" *.txt
grep -l -E "v{3,}" *.txt
egrep -l "v{3,}" *.txt

Note that use of "{" and "}" in extended regular expressions can be non-portable. See "Basic vs Extended Regular Expressions" in the grep manpage for details.
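A minimal sketch of the quoting advice above, for anyone who wants to try it in a scratch directory. The file names and contents are hypothetical; the only claims taken from this thread are that the pattern must be quoted and that the braces need backslashes in a basic regular expression but not in an extended one.

```shell
# Scratch directory with two hypothetical sample files.
tmpdir=$(mktemp -d)
printf 'a line with vvvv in it\n' > "$tmpdir/match.txt"
printf 'only vv here\n' > "$tmpdir/nomatch.txt"

# Unquoted, bash would brace-expand v{3,} into the two words "v3" and "v",
# so grep would search for "v3" in a file named "v". Quoting prevents that.

# Basic regular expression: the braces must be backslash-escaped.
bre_hits=$(cd "$tmpdir" && grep -l 'v\{3,\}' *.txt)

# Extended regular expression: the braces are special without backslashes.
ere_hits=$(cd "$tmpdir" && grep -l -E 'v{3,}' *.txt)

echo "BRE: $bre_hits"
echo "ERE: $ere_hits"

rm -rf "$tmpdir"
```

Both commands should list only the file that contains a run of three or more v characters.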

taylorkh 02-25-2014 12:05 PM

Thanks folks! Sorry about the QUOTE. I will try to remember to use the code tags instead. What I am actually working towards is a filter in the Agent news reader program which will catch any post with a subject containing an unbroken string of 19 or more letters or numbers. According to several regex tutorials and references this would seem to be something like
Code:

[a-z1-0]{19,}
However, Agent's regex handling is somewhat non-standard. I have taken a few tries at the problem and finally sent in a question to their tech support folks to see if they can explain the syntax for their irregular regular expressions.

Ken

szboardstretcher 02-25-2014 12:09 PM

Yeahhhhh. It seems a LOT different than a standard implementation.

I found some of the differences listed here:

http://www.cotse.net/users/bluejay/a...egular2.htm#II

taylorkh 02-25-2014 12:49 PM

Wow szboardstretcher,

That page goes back a while. Agent 1.8 was about 15 - 20 years ago. The current version is 7.2. I have been using Agent and before that Free Agent for about 20 years. I have been running it under Wine for almost 10 years. I guess I should learn Pan but...

This is the sort of crap I am trying to filter out
Quote:

Subject: 529182d05e118577bf6be479585b57d257c7751d 14/75 - 529182d05e118577bf6be479585b57d257c7751d.part13.rar yEnc /21
Ken

szboardstretcher 02-25-2014 01:00 PM

Yes it is. It's still valid apparently.

So counted quantifiers like {19,} don't work in Agent. You would have to do this to catch anything that is "19 alphanumeric characters plus" in a single word.

Code:

{[a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0][a-z1-0]}
Here is the Forte Agent conversation where they explain this:

http://www.freag.net/en/t/3c7x4/question_on_reg

taylorkh 02-25-2014 02:31 PM

Close but no cigar. I had tried replicating the range term 19 times but it did not work (although I do not recall exactly how I had the thing constructed.) I tried your formulation without success. However, if I change 1-0 to 0-9 it works like a champ :) Thanks!!!
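For comparison, here is a rough sketch of the same idea in grep rather than Agent, so it says nothing about Agent's own syntax. It builds the class [a-z0-9] repeated 19 times and checks it against the counted form [a-z0-9]{19,}. The helper name `matches` and the variable names are made up for illustration; the spam subject is the one quoted earlier in this thread and the other subject is the thread title.

```shell
LC_ALL=C; export LC_ALL   # make [a-z] mean ASCII lowercase only

# Build the "unrolled" pattern: the class repeated 19 times.
class='[a-z0-9]'
unrolled=''
i=0
while [ "$i" -lt 19 ]; do
  unrolled="$unrolled$class"
  i=$((i + 1))
done

# Hypothetical helper: $1 = extended regex, $2 = subject line.
matches() {
  if printf '%s\n' "$2" | grep -q -E "$1"; then echo yes; else echo no; fi
}

spam='529182d05e118577bf6be479585b57d257c7751d 14/75 - 529182d05e118577bf6be479585b57d257c7751d.part13.rar yEnc /21'
normal='[What should be] A simple grep regex question'

spam_unrolled=$(matches "$unrolled" "$spam")
spam_counted=$(matches '[a-z0-9]{19,}' "$spam")
normal_counted=$(matches '[a-z0-9]{19,}' "$normal")

echo "spam, unrolled pattern:   $spam_unrolled"
echo "spam, counted pattern:    $spam_counted"
echo "normal, counted pattern:  $normal_counted"
```

Because neither pattern is anchored, 19 copies of the class match any run of 19 or more such characters, which is why the unrolled form can stand in for {19,}.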

Ken

p.s. Now if you would tell me how to make Firefox and Thunderbird remember that I want them to use English - US for spelling check. It has degenerated to the point that I have to reset that preference (from nothing selected) EVERY time I invoke either of the programs. But that is the subject for another post.

chrism01 02-26-2014 03:49 AM

Re FF: I don't know about using Preferences. I just install the correct dictionary from the Mozilla add-ons website, e.g. https://addons.mozilla.org/en-US/fir...nary-/?src=api

szboardstretcher 02-26-2014 07:36 AM

Quote:

Originally Posted by taylorkh (Post 5124550)
Close but no cigar. I had tried replicating the range term 19 times but it did not work (although I do not recall exactly how I had the thing constructed.) I tried your formulation without success. However, if I change 1-0 to 0-9 it works like a champ :) Thanks!!!

Ken

p.s. Now if you would tell me how to make Firefox and Thunderbird remember that I want them to use English - US for spelling check. It has degenerated to the point that I have to reset that preference (from nothing selected) EVERY time I invoke either of the programs. But that is the subject for another post.

Did you install the French version of Firefox or something?

taylorkh 02-26-2014 09:03 AM

As to Firefox... When I right-click on the text box where I am typing this message I see an [X] Check Spelling option. In fact that is always checked. However, Firefox will NOT check the spelling unless I also select the language to use. I have TWENTY-TWO species of English listed!

Now this is interesting... When I looked at the list to make the count I found that "English (United States)" was NOT checked. This is the normal mode of failure. I counted the available options. I did NOT choose the radio button for English (United States). I then started to flail at the keyboard to produce this message. I found that spell checking was taking place and upon inspection I found that English (United States) was selected.

It seems that Firefox forgets that a spelling check language was selected but somewhere it remembers that it is supposed to use English (United States). I guess I need to poke around in about:config. Or perhaps I should install Firefox in French :D

As to where the English variations from Australia to Zimbabwe came from... I am running CentOS 6.5. I guess it installed these languages(?) or perhaps not. Firefox on Ubuntu only shows 4 English variations even though it downloads a ton of language packs during the OS install.

I suspect the cause of the issue is something in my profile. I have several addons and some tweaks. A "clean" FF profile seems to remember its language.

Ken

p.s. Here is the recommendation for the Agent regex from Agent Tech Support (the original thread).


Hi Ken,

An Agent user proposed the following filter expressions which seem
reasonably good at removing spam.

Start with this filter expression which will remove messages where the
subject is 20 characters long or more and does not include space, period,
left or right bracket, 'at' symbol, or dash anywhere in its first 20
characters

subject: {^[^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^
\.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^
\.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^
\.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-]}

If you also want to remove posts where the subject is less than 20
characters long and does not include space, period, left or right bracket,
'at' symbol, or dash anywhere in its entire length, then use the following.

subject: ( {^[^ \.\[\]@-]*$} or {^[^ \.\[\]@-][^ \.\[\]@-][^
\.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^
\.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^
\.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-][^ \.\[\]@-]} )

Let me know if this helps.

Regards,

Tom
Agent Team
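
The two filters above can be approximated with grep -E for testing outside Agent. This is only a sketch under assumptions: it uses POSIX bracket-expression rules (the "]" goes first and the "-" last, with no backslashes) and a counted {20}, neither of which is Agent syntax. The helper name `classify` and the short sample subject are hypothetical.

```shell
LC_ALL=C; export LC_ALL

# Any character except: ] space . [ @ -
class='[^] .[@-]'

# First filter: the first 20 characters contain none of the excluded characters.
first_filter="^${class}{20}"
# Second filter: additionally catches shorter subjects made up entirely of
# allowed characters.
second_filter="^(${class}*\$|${class}{20})"

# Hypothetical helper: $1 = extended regex, $2 = subject line.
classify() {
  if printf '%s\n' "$2" | grep -q -E "$1"; then echo filtered; else echo kept; fi
}

spam='529182d05e118577bf6be479585b57d257c7751d 14/75 - 529182d05e118577bf6be479585b57d257c7751d.part13.rar yEnc /21'
short='a1b2c3d4'                                  # hypothetical short subject
normal='[What should be] A simple grep regex question'

r_spam=$(classify "$first_filter" "$spam")
r_short_first=$(classify "$first_filter" "$short")
r_short_second=$(classify "$second_filter" "$short")
r_normal=$(classify "$second_filter" "$normal")

echo "spam / first filter:    $r_spam"
echo "short / first filter:   $r_short_first"
echo "short / second filter:  $r_short_second"
echo "normal / second filter: $r_normal"
```

The short subject slips past the first filter because it has fewer than 20 characters, which is the gap the second filter is meant to close.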


All times are GMT -5.