LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

09-25-2010, 07:10 PM   #1
bigbot
LQ Newbie
 
Registered: Sep 2010
Posts: 6

Rep: Reputation: 0
Using Grep with Pattern File and PCRE


I would like to write a newline-delimited rules file of PCREs for use with the grep command. Grep has the option -f to read the search patterns from a file, and the option -P to search using PCREs. However, these two options do not seem to work together. The -f option only seems to work with fixed-string rules.

A friend previously helped me get around this limitation somehow, but I can't remember how he did it. I also would like the ability to add comments at the end of each rule in the file.

Example Rules File-
Code:
'^John.*Sally$'                              # 09/22/10 - Per Steve Johnson
'Jack\.Ripper[0-9]{1,3}$'                    # 06/15/09 - Remove on 07/01/09
Here is what I've tried so far-
Code:
cat data.file | grep -P -f rules.file        # Doesn't work
cat data.file | grep -P 'rule1|rule2|rule3'  # Works but I want to pull rules from a file and be able to add comments at the end of the lines
Thanks for any help!
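A minimal sketch of one reason the rules file trips up `-f`, assuming GNU grep and bash: grep reads every line of a pattern file verbatim as a regular expression, so shell-style quotes and trailing `# ...` comments become part of each pattern. The file names and data below are illustrative, and `-E` (extended regex) stands in for `-P` because both example rules are valid ERE.

```shell
# Hypothetical sample data to search.
printf '%s\n' 'John met Sally' 'Jack.Ripper42' 'nobody here' > data.file

# The rules exactly as written in the question: quotes and comments included.
printf '%s\n' \
  "'^John.*Sally\$'      # 09/22/10 - Per Steve Johnson" \
  "'Jack\\.Ripper[0-9]{1,3}\$'    # 06/15/09 - Remove on 07/01/09" > rules.file

# The quotes and comments are matched literally, so nothing is found.
grep -E -f rules.file data.file || echo "no match"

# The same rules as bare patterns, one per line.
printf '%s\n' '^John.*Sally$' 'Jack\.Ripper[0-9]{1,3}$' > bare.rules
grep -E -f bare.rules data.file
```

With the bare file, grep prints the two matching lines; with the original file it prints nothing.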

Last edited by bigbot; 09-25-2010 at 07:43 PM.
 
09-25-2010, 08:25 PM   #2
fuubar2003
Member
 
Registered: May 2004
Location: Orlando, Florida
Distribution: SLES10/11, RH4/5 svrs, Fedora, Debian/Ubuntu/Mint; FreeBSD/OpenBSD
Posts: 63

Rep: Reputation: 26
I'm not getting what yer tryin to do.

One thing (and I know this is not addressing your question), you can lose the 'cat' part before the pipe and just run:

grep -P -f rules.file data.file


Per the grep man page, -P is experimental....
 
09-26-2010, 02:02 AM   #3
kingzog
LQ Newbie
 
Registered: Jun 2006
Distribution: Gentoo
Posts: 20

Rep: Reputation: 0
Quote:
Originally Posted by fuubar2003
I'm not getting what yer tryin to do.

One thing (and I know this is not addressing your question), you can lose the 'cat' part before the pipe and just run:

grep -P -f rules.file data.file


Per the grep man page, -P is experimental....
I believe the original poster wants a single call to grep to work with multiple patterns. So if I had a file with two lines in it, "Hello" and "World", I could run something like "grep -f HelloWorld.txt *" and have every line containing EITHER Hello or World show up in the results.

I don't know how to do this with grep. This is possible with awk and perl scripting, but that's probably more time consuming than the poster wants.


One alternative might be, since you're using regular expressions, to wrap the whole thing in a series of OR groups. So for "Hello" and "World" you could use "(Hello|World)"
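For what it's worth, grep's `-f` already gives the OR behaviour described above: each line of the pattern file is a separate pattern, and a line is printed if it matches any of them. A small sketch, assuming GNU grep, with hypothetical file names:

```shell
# Two patterns, one per line, in a pattern file.
printf '%s\n' 'Hello' 'World' > HelloWorld.txt
# Sample input: only the first two lines contain either word.
printf '%s\n' 'Hello there' 'brave new World' 'goodbye' > input.txt

# Lines matching ANY pattern from the file.
grep -f HelloWorld.txt input.txt

# The equivalent single alternation suggested above.
grep -E '(Hello|World)' input.txt
```

Both commands print `Hello there` and `brave new World`.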
 
09-26-2010, 06:07 AM   #4
fuubar2003
Member
 
Registered: May 2004
Location: Orlando, Florida
Distribution: SLES10/11, RH4/5 svrs, Fedora, Debian/Ubuntu/Mint; FreeBSD/OpenBSD
Posts: 63

Rep: Reputation: 26
The '-e' switch allows for multiple expressions, so you can do 'grep -e <item> -e <item>' over and over again. You can also chain multiple greps separated by pipes, but that is so lame (and it keeps only lines that match all of the patterns, not any of them).
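Rather than typing each `-e` by hand, a bash array can collect one `-e pattern` pair per rule. A sketch, assuming bash, GNU grep, and a rules file that already holds bare patterns (no quotes or comments); `rules.txt`, `data.txt` and the sample lines are illustrative. `-E` is used because both example rules are valid ERE.

```shell
# Illustrative rules (bare patterns, one per line) and data.
printf '%s\n' '^John.*Sally$' 'Jack\.Ripper[0-9]{1,3}$' > rules.txt
printf '%s\n' 'John met Sally' 'Jack.Ripper7' 'Sally met John' > data.txt

# Build "-e pat1 -e pat2 ..." from the rules file, skipping blank lines.
args=()
while IFS= read -r rule; do
    [ -n "$rule" ] && args+=(-e "$rule")
done < rules.txt

# One grep call; a line is printed if it matches any -e pattern.
grep -E "${args[@]}" data.txt
```

Quoting `"${args[@]}"` keeps each pattern as one argument, even if it contains spaces or `*`.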

I used awk to find any lines with IMAGE and print column 6 and find any lines with FRAG and print column 9:
awk '{ if($1=="IMAGE") print $6; if($1=="FRAG") print $9}'
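To see what that awk one-liner does, here it is run on a few made-up lines (the IMAGE/FRAG records below are hypothetical): for IMAGE lines it prints field 6, for FRAG lines field 9, and every other line is ignored.

```shell
# Hypothetical records; awk splits fields on runs of whitespace.
printf '%s\n' \
  'IMAGE a b c d img6 g' \
  'FRAG a b c d e f g frag9' \
  'OTHER a b c d e f g h' > records.txt

# $1 selects the record type; $6 / $9 are the fields to print.
awk '{ if($1=="IMAGE") print $6; if($1=="FRAG") print $9}' records.txt
```

This prints `img6` and then `frag9`.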


Not sure if any of this is helpful.
 
09-26-2010, 02:04 PM   #5
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 908
You could iterate over all patterns in your file, and run grep with individual patterns.
Code:
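while IFS= read -r pattern; do
    # remove trailing comment (everything from the first '#')
    regex=${pattern%%#*}
    # remove the whitespace that preceded the comment
    regex=${regex%"${regex##*[![:space:]]}"}
    [ -n "$regex" ] && grep -P -- "$regex" data.file
done < pattern.file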
--- rod.
 
2 members found this post helpful.
09-26-2010, 06:46 PM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,294

Rep: Reputation: 2358
You could look at egrep (equivalent to grep -E, which uses extended regular expressions) for more advanced options.
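`egrep` is the traditional name for `grep -E` (POSIX extended regular expressions: `|`, `+`, `?`, `{m,n}` and grouping without backslashes). Recent GNU grep releases print an obsolescence warning for `egrep`, so `grep -E` is the safer spelling. ERE lacks PCRE features such as `\d` or lookarounds, but both rules from the question are valid ERE. A sketch with illustrative data:

```shell
printf '%s\n' 'Jack.Ripper7' 'Jack.Ripper1234' 'JackXRipper7' > data.txt

# Same as: egrep 'Jack\.Ripper[0-9]{1,3}$' data.txt
# {1,3} allows one to three digits before end of line; \. is a literal dot.
grep -E 'Jack\.Ripper[0-9]{1,3}$' data.txt
```

Only `Jack.Ripper7` matches: the second line has four digits and the third has no literal dot.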
 
09-27-2010, 12:35 AM   #7
bigbot
LQ Newbie
 
Registered: Sep 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Thank you for the responses, and that's an interesting solution, theNbomr. Unfortunately my Linux computer is not working so well right now, so I am forced to use the Windows machine to post. However, I did get in touch with my friend and he reminded me how we did this before. This isn't going to be 100% correct, but I will edit it later when I am able to test it.

Code:
grep -P '`cat rules.file | sed -r 's/ *\#.*//' | tr '\n' '\|' | sed -r 's/\|$//'`' data.file
So basically we cat the rules file out and remove the spaces before the #, the # itself, and everything after the #. Then the newlines in rules.file are replaced with pipes. Finally, the trailing pipe (produced from the file's last newline) is removed, because it would leave an empty alternative that matches every line.

Grep *should* interpret this command as:

Code:
grep -P 'rule1|rule2|rule3' data.file
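To check what each stage of that pipeline produces, here is a sketch run on the two example rules with their quotes dropped (grep would otherwise treat the quotes as literal characters). It assumes GNU sed for `-r`, and uses `[[:space:]]*` instead of ` *` so that tabs before a comment are removed as well:

```shell
printf '%s\n' \
  '^John.*Sally$                  # 09/22/10 - Per Steve Johnson' \
  'Jack\.Ripper[0-9]{1,3}$        # 06/15/09 - Remove on 07/01/09' > rules.file

# Stage 1: strip each comment and the whitespace before it.
sed -r 's/[[:space:]]*#.*//' rules.file
# Stage 2: newlines become pipes; the last newline leaves a trailing '|'.
sed -r 's/[[:space:]]*#.*//' rules.file | tr '\n' '|'; echo
# Stage 3: drop that trailing '|' to get the final alternation.
pattern=$(sed -r 's/[[:space:]]*#.*//' rules.file | tr '\n' '|' | sed -r 's/\|$//')
echo "$pattern"
```

The last line prints `^John.*Sally$|Jack\.Ripper[0-9]{1,3}$`, i.e. the `rule1|rule2` form shown above.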
Phew! When running this yesterday it seemed to work pretty well. The only thing I couldn't get to work was a "grep -vP '^$'" statement to remove all blank lines. I put that in after the cat statement, but kept getting a weird variable error when the whole thing was run. I know that works on the command line, so I'm not sure what the problem is.
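Why blank lines matter: a blank line in the rules file turns into an empty alternative (for example `rule1||rule2`), and an empty alternative matches the empty string, i.e. every line. Comment-only lines have the same effect once their comment is stripped. A sketch with made-up data, assuming GNU grep and sed; deleting empty lines after the comments are removed avoids the problem:

```shell
printf '%s\n' 'alpha' 'beta' 'gamma' > data.txt

# An empty alternative matches every line: this counts all 3 lines.
grep -cE 'alpha||beta' data.txt

# Illustrative rules with a blank line and a comment-only line.
printf '%s\n' 'alpha   # keep' '' '# note only' 'beta' > rules.txt
# Strip comments first, THEN delete the lines left empty, then join with '|'.
pattern=$(sed -e 's/[[:space:]]*#.*//' -e '/^$/d' rules.txt | tr '\n' '|' | sed 's/|$//')
grep -cE "$pattern" data.txt
```

The first count is 3; with the cleaned pattern `alpha|beta` the count is 2.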

Last edited by bigbot; 09-27-2010 at 12:43 AM.
 
09-27-2010, 01:17 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,437

Rep: Reputation: 2842
So ultimately it appears you could replace all of:
Code:
grep -P '`cat rules.file | sed -r 's/ *\#.*//' | tr '\n' '\|' | sed -r 's/\|$//'`' data.file
with something like (untested):
Code:
grep -P "$(awk -F'[ \t]*#' '$1 != "" { out = out sep $1; sep = "|" } END { print out }' rules.file)" data.file
 
1 member found this post helpful.
09-27-2010, 01:18 AM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by theNbomr
You could iterate over all patterns in your file, and run grep with individual patterns.
Code:
while read pattern; do
    # remove trailing comment
    regex=${pattern%#*}
    grep $regex data.file
done < pattern.file
--- rod.

A better way is to concatenate the regex patterns first and pass the result to grep once, instead of calling grep for every pattern.
Code:
grep -E "$(sed -e 's/[[:space:]]*#.*//' -e '/^$/d' rules | tr '\n' '|' | sed 's/|$//')" file
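A variation on the same idea, assuming POSIX `paste` is available: `paste -s -d '|'` joins all input lines with `|` and adds no trailing separator, so the final `sed 's/|$//'` is no longer needed. Quoting the command substitution keeps the whole alternation as a single argument to grep. File names and contents are illustrative:

```shell
printf '%s\n' '^John.*Sally$   # per Steve' 'Jack\.Ripper[0-9]{1,3}$  # temp' > rules
printf '%s\n' 'John met Sally' 'Jack.Ripper12' 'Sally met John' > file

# Strip comments, drop empty lines, and join the rest with '|'.
pattern=$(sed -e 's/[[:space:]]*#.*//' -e '/^$/d' rules | paste -s -d '|' -)
grep -E "$pattern" file
```

This prints `John met Sally` and `Jack.Ripper12`.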
 
09-30-2010, 03:09 AM   #10
bigbot
LQ Newbie
 
Registered: Sep 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Thank you for the responses and the awesome awk solution! Here is what ended up working (with the awk command as well):
Code:
grep -P "`cat rules.file | grep -vP '^$' | sed -r 's/\s*\#.*//' | tr '\n' '\|' | sed -r 's/\|$//'`" data.file
The double quotes were needed around the backtick part of the command. Single quotes would not work, because bash performs no command substitution inside single quotes; double quotes let the backticks run while keeping their output as a single argument to grep.

Also added the grep command to remove blank lines.
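The quoting behaviour comes from bash itself, and can be seen without grep. A tiny illustration, using `$(...)` (equivalent to backticks and easier to nest) and a hypothetical helper `count_args` that reports how many arguments it received:

```shell
# Single quotes: the command substitution is NOT performed.
echo 'result: $(printf x)'
# Double quotes: it runs, and the result stays one argument.
echo "result: $(printf x)"

# Hypothetical helper: prints its argument count.
count_args() { echo "$#"; }
pat='John.*Sally Jack'          # a pattern containing a space
count_args $(echo "$pat")       # unquoted: split into 2 arguments
count_args "$(echo "$pat")"     # quoted: stays 1 argument
```

The unquoted form is also subject to filename globbing, which matters for patterns containing `*`.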
 
09-30-2010, 03:52 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,437

Rep: Reputation: 2842
Well, I am glad you have a solution, although I am not sure why you need so many calls to all the different apps. Calling cat is definitely not required.
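One cat-free option, sketched under the assumption of bash (for `<(...)` process substitution) and GNU grep: let sed read and clean the rules file itself, and hand its output straight to `-f`, so no joining with `|` is needed at all. It uses `-E`; whether `-P` accepts a multi-line pattern file depends on the grep version. File names and contents are illustrative:

```shell
printf '%s\n' '^John.*Sally$   # per Steve' '' 'Jack\.Ripper[0-9]{1,3}$  # temp' > rules.file
printf '%s\n' 'John met Sally' 'Jack.Ripper12' 'Sally met John' > data.file

# sed reads rules.file directly (no cat); -f reads the cleaned
# patterns, one per line, from the process substitution.
grep -E -f <(sed -e 's/[[:space:]]*#.*//' -e '/^$/d' rules.file) data.file
```

This prints `John met Sally` and `Jack.Ripper12`.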
 
10-01-2010, 09:33 AM   #12
bigbot
LQ Newbie
 
Registered: Sep 2010
Posts: 6

Original Poster
Rep: Reputation: 0
I agree; I'm just not familiar with awk yet. I will use your example to see if I can use that instead.
 
All times are GMT -5.