LinuxQuestions.org
02-25-2009, 04:04 PM   #1
angel115
Regex: Multi-line pattern matching


Hi There,

I'm trying to match a pattern spread out over multiple lines (without success so far).

Ex: "test.txt"
Code:
The cat
in the street
Look nice



I want to be able to match
"The cat
in the street"
for example

So here is what I've tried so far:


Code:
grep -e "m/^The.*street$/m" test.txt
grep -e "m/^The.*\n.*street$/m" test.txt
grep -e "m/The.*\n.*street/m" test.txt
grep -e "The.*\n.*street" test.txt
None of these are working for me

Any help on this would be highly appreciated.

Thanks,
Angel.

Last edited by angel115; 02-25-2009 at 04:07 PM.
 
02-25-2009, 05:28 PM   #2
theNbomr
Unless I'm mistaken, grep cannot match across newlines. However, it is very easy to use perl as a grep-on-steroids.
Code:
perl -e '$/ = undef; while(<>){ if( $_ =~ m/The cat\nin the street/ ){ print $&; }}'
Hard to say what you want it to print if it finds a match, in this situation. My example simply prints the found text. Change '$&' to '$_' if you want to print the entire input.
--- rod.
 
02-25-2009, 05:51 PM   #3
Telemachos
As theNbomr says, grep is line-oriented. Also, the syntax you're using (m/regex/m) is already Perl syntax, not grep syntax.

However, the authors of Unix Power Tools (a great, great book) provide a script they call cgrep which can do what you want. (It uses Perl to do its magic.)

Their script is a bit fancy. Here's a simple stab at it from me:

Code:
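#!/usr/bin/env perl
use strict;
use warnings;

# Slurp the whole input (files named on the command line, or STDIN)
# into one string by undefining $/ for the duration of the block.
my $file = do { local $/; <> };

# /m lets ^ and $ match at line boundaries; /s lets . match "\n".
print "I found it!\n"
  if $file =~ m/^The.*street$/ms;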

Last edited by Telemachos; 02-25-2009 at 06:21 PM.
 
02-25-2009, 07:39 PM   #4
ghostdog74
Code:
tr '\n' ' ' < file|grep -i "the cat in the street"
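What `grep` sees here is a single line: `tr` has turned every newline into a space. A minimal sketch on the sample file from the question (saved as `test.txt`, an assumption); `-o`, which prints only the matched text, is available in GNU, BSD and BusyBox grep but is not POSIX:

```shell
printf 'The cat\nin the street\nLook nice\n' > test.txt

# tr rewrites every newline as a space, so the file becomes one line.
flat=$(tr '\n' ' ' < test.txt)
printf '%s\n' "$flat"

# -i ignores case; -o prints only the matched part, not the whole line.
hit=$(printf '%s\n' "$flat" | grep -io 'the cat in the street')
printf '%s\n' "$hit"
```

Note that the match comes back with a space where the line break used to be.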
 
02-26-2009, 06:44 AM   #5
Telemachos
Quote:
Originally Posted by ghostdog74
Code:
tr '\n' ' ' < file|grep -i "the cat in the street"
Fair enough, but you have to remove all the newlines in order to do the search that way. That's not always what you want from a multi-line search.
 
02-26-2009, 07:27 AM   #6
ghostdog74
Quote:
Originally Posted by Telemachos
Fair enough, but you have to remove all the newlines in order to do the search that way. That's not always what you want from a multi-line search.
then this should be enough
Code:
awk '/The cat/,/in the street/' file
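`/start/,/end/` is an awk range pattern: it selects every line from one matching the first regex through the next line matching the second. It still works a line at a time, so it returns the block of lines rather than one regex match that spans a newline. A sketch on the question's `test.txt`, including the case where the closing pattern never turns up:

```shell
printf 'The cat\nin the street\nLook nice\n' > test.txt

# Print from the first line matching /The cat/ through the next
# line matching /in the street/, inclusive.
range=$(awk '/The cat/,/in the street/' test.txt)
printf '%s\n' "$range"

# If the end pattern never matches, the range runs to end of file.
open_range=$(awk '/The cat/,/no such text/' test.txt)
printf '%s\n' "$open_range"
```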
 
02-26-2009, 10:12 AM   #7
Telemachos
Quote:
Originally Posted by ghostdog74
then this should be enough
Code:
awk '/The cat/,/in the street/' file
Yup, that works nicely. And you can even do this (if you care about anchors, which the OP seemed to):
Code:
 awk '/^The cat/, /street$/' file
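In awk, `^` and `$` anchor to the start and end of each input line, so the anchored range will not open on a line where "The cat" only appears mid-line. A sketch using a made-up input file, `anchors.txt` (hypothetical, for illustration only):

```shell
# Hypothetical input: "The cat" occurs mid-line before the real start.
printf 'Look: The cat\nThe cat\nin the street\n' > anchors.txt

# Unanchored: the range opens on line 1, since /The cat/ matches anywhere.
loose=$(awk '/The cat/,/in the street/' anchors.txt)

# Anchored: the range only opens on a line that begins with "The cat".
strict=$(awk '/^The cat/, /street$/' anchors.txt)

printf '%s\n---\n%s\n' "$loose" "$strict"
```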
However, the OP asked about grep. Once you go to awk, then as far as I'm concerned, you may as well go all the way to Perl. But I respect that maybe that's just me.
 
02-26-2009, 11:09 AM   #8
angel115

Hi theNbomr,

Yep, that's exactly what I need.

Thanks a lot.

Quote:
Originally Posted by theNbomr
Unless I'm mistaken, grep cannot match across newlines. However, it is very easy to use perl as a grep-on-steroids.
Code:
perl -e '$/ = undef; while(<>){ if( $_ =~ m/The cat\nin the street/ ){ print $&; }}'
Hard to say what you want it to print if it finds a match, in this situation. My example simply prints the found text. Change '$&' to '$_' if you want to print the entire input.
--- rod.
Angel
 
02-26-2009, 11:36 AM   #9
theNbomr
Sorry to tell you that my suggestion doesn't actually do what you need. I did not read your question as closely as I should have, and I now see that while my example works, it does not work if modified to include wildcard-style regex metacharacters such as '*' and '.'.
After reading the camel book, I now see that assigning undef to '$/' is also not mentioned, and I wonder if my use of this idiom exploited some undocumented behavior in one or more other versions of perl I have used. I know I have used a similar technique in the past, and always thought it was defined behavior. My bad. Back to the drawing board.
--- rod.
 
02-26-2009, 12:13 PM   #10
theNbomr
Okay, I don't want to stick my neck out and start yelling 'Bug!', but this feels like one to me. If I use the original poster's input file as a test case, I can successfully match the expected text with this:
Code:
perl -e '{ local $/; while(<>){ if( $_ =~ m/The.*\s.*street/ ){ print "\n$&\n"; }else{ print "Not found in:\n$_\n";}}}'
but if I simply remove the '\s' (which ostensibly matches the newline char) from the regex, these fail:
Code:
perl -e '{ local $/; while(<>){ if( $_ =~ m/The.*.*street/ ){ print "\n$&\n"; }else{ print "Not found in:\n$_\n";}}}'
perl -e '{ local $/; while(<>){ if( $_ =~ m/The.*street/ ){ print "\n$&\n"; }else{ print "Not found in:\n$_\n";}}}'
An odd looking workaround seems to be:
Code:
perl -e '{ local $/; while(<>){ if( $_ =~ m/The[\s\S]+street/ ){ print "\n$&\n"; }else{ print "Not found in:\n$_\n";}}}'
Grateful to anyone who can explain this.

Code:
perl -v
This is perl, v5.8.5 built for i386-linux-thread-multi
--- rod.

Last edited by theNbomr; 02-26-2009 at 01:08 PM.
 
02-26-2009, 12:46 PM   #11
Telemachos
You're tripping up on the s modifier, I think (not to be confused with the m modifier, which changes how ^ and $ behave). It doesn't match a newline per se. What it does is allow . to match a newline.

So, these two fail without s since they can't use the . to match across newlines:
Code:
perl -e '{ local $/; while(<>){ if( $_ =~ m/The.*.*street/ ){ print "\n$&\n"; }else{ print "Not found in:\n$_\n";}}}'
perl -e '{ local $/; while(<>){ if( $_ =~ m/The.*street/ ){ print "\n$&\n"; }else{ print "Not found in:\n$_\n";}}}'
The next one works because you are using [\s\S]. Since \s matches any whitespace character (space, tab, newline, carriage return, form feed and so on) and \S matches anything that is not whitespace, the combination [\s\S] matches ANY character, newline included. Example file:
Code:
Hello, this is
my favorite
letter of
the 
alphabet
Then,
Code:
hektor ~/test $ perl -ne 'print "Match: $_" if m/[\s\S]/' file
Match: Hello, this is
Match: my favorite
Match: letter of
Match: the 
Match: alphabet
The bottom line is that to match using .* across newlines, you need the s modifier.

Code:
 perl -e '$file = do { local $/; <> }; print "Match: $&\n" if $file =~ m/^Hello.*favorite$/ms' file
See this article for more: http://www.perl.com/pub/a/2003/06/06/regexps.html

Last edited by Telemachos; 10-24-2009 at 09:37 PM. Reason: Hideous grammar goof
 
02-26-2009, 01:07 PM   #12
theNbomr
Telemachos, thanks for clearing that up. I now see that when I read 'A "." matches any character except \n', I mentally translated that to mean 'A "." matches any character except the value of $/'. That was incorrect: the non-match against '\n' is fixed (except when overridden by the s modifier, as you point out).
It always feels better when there is some rationale behind these kinds of seemingly inconsistent behaviors. Do you know why such an exception exists, what problem it solves, or how it adds value? There is nothing I can think of that explains this behavior in a useful way.
--- rod.
 
02-26-2009, 02:45 PM   #13
Telemachos
Quote:
Originally Posted by theNbomr
Do you know why such an exception exists, what problem it solves, or how it adds value? There is nothing I can think of that explains this behavior in a useful way.
This is from perldoc perlretut:
Quote:
You might wonder why '.' matches everything but "\n" - why not every character? The reason is that often one is matching against lines and would like to ignore the newline characters. For instance, while the string "\n" represents one line, we would like to think of it as empty.
The doc goes on to give some examples of the contrasting behavior.

Last edited by Telemachos; 02-26-2009 at 02:46 PM.
 
02-26-2009, 03:26 PM   #14
theNbomr
Thanks. Good reference. Makes some sense, so I guess I'm okay with it, now.

Sorry to the original poster for somewhat hijacking his/her thread.

--- rod.
 
02-26-2009, 07:27 PM   #15
ghostdog74
Quote:
Originally Posted by Telemachos
Yup, that works nicely. And you can even do this (if you care about anchors, which the OP seemed to):
Code:
 awk '/^The cat/, /street$/' file
However, the OP asked about grep. Once you go to awk,
lol, the fact that grep is unable to do multi-line pattern matching (AFAIK) warrants a better suggestion, right?

Quote:
then as far as I'm concerned, you may as well go all the way to Perl. But I respect that maybe that's just me.
as far as text processing is concerned, awk is equally capable.
 
  

