LinuxQuestions.org
10-10-2008, 02:01 AM   #1
Felipe
regex on multilines


Hello:

I'm trying to do a regular expression search across multiple lines.

Example:

...
{
....
name = name1;
....
value =value1;
....
code = code1;
....
}

{
....
name = name2;
....
value =value2;
....
code = code1;
....
}
...
I have many structures like this.
If I search for, say, "name = name2", structure 2 should be returned (everything between { and }). If I search for "code = code1", both structures should be returned.

I found this command:
sed -n -e "/{/,/}/p" file
but it returns all the structures.

With:
sed -e '/./{H;$!d;}' -e 'x;/name = name2/!d;' file
I get what I'm looking for, but it treats blank lines as the record separator instead of { }.


Any idea?

Thanks
 
10-10-2008, 03:21 AM   #2
jschiwal
Having a blank line between records makes things easier, because you can use a range ending in /^$/, which is common practice.
Using "{" and "}" as the record delimiters makes the script much harder to read, because sed itself uses brackets to group the commands that apply to an address or range.

Code:
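jschiwal@qosmio:~> sed -n '/{/,/^$/{
                                     # first subrange: opening bracket up to the matching line
                                     /{/,/name = name1;/H
                                     # second subrange: matching line down to the closing bracket
                                     /name = name1/,/}/{
                                                         /name = name1/n;H;
                                                         /}/{g;p}
                                                       }
                                   }' testfile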

{
....
name = name1;
....
value =value1;
....
code = code1;
....
}
/{/,/^$/ is the range covering a single record.

There are two subranges, which meet at the line you are looking for:
/{/,/name = name1;/ is a subrange from the first line of the record to the matching line.
/name = name1/,/}/ is a subrange from the matching line to the end of the record.

Because the matching line belongs to both subranges, it has already been held by the first one. The "n" command therefore skips it, and the H that follows holds the next line:
/name = name1/n;H;
/}/ matches the end of the record that contains the match, so the held lines are fetched with "g" and printed with "p". Note that the two commands are bracketed so that both are executed on the same line.

Another approach could be to design a state machine using :labels and "b" or "t" branches.
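A minimal sketch of that label-and-branch idea, not a tested replacement for the script above: once an opening bracket is seen, a loop keeps appending lines to the pattern space with N until the closing bracket arrives, and only then is the whole record tested. The function name find_records is made up for illustration. The sketch assumes every record opens with "{" at the start of a line and closes with "}" alone on a line, and that the search text (used as a sed regex) contains no "/".

```shell
# Hypothetical helper: print every {...} record in file $2 that contains $1.
#   /^{/{      start of a record
#   :a         label for the loop
#   N          append the next input line to the pattern space
#   /\n}$/!ba  closing bracket not reached yet: branch back to :a
#   /$1/p      whole record is now in the pattern space: print it on a match
find_records() {
    sed -n \
        -e '/^{/{' \
        -e ':a' \
        -e 'N' \
        -e '/\n}$/!ba' \
        -e "/$1/p" \
        -e '}' \
        "$2"
}
```

It would be called as: find_records 'name = name2;' testfile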

Last edited by jschiwal; 10-10-2008 at 03:35 AM.
 
10-10-2008, 04:01 AM   #3
jschiwal
I noticed that I forgot to clear out the Hold buffer after the "}" character.
Code:
sed -n '/{/,/^$/{
                  /{/,/code = code1;/H
                  /code = code1/,/}/{
                                      /code = code1/n;H;
                                      /}/{ g;p }
                                      /}/{ s/.*//;x }
                                    }
                }' testfile
 
10-10-2008, 06:51 AM   #4
Felipe

Yes, it works fine!

I will have to study how....

Thanks

Felipe
 
10-10-2008, 08:56 AM   #5
sundialsvcs
(Shrug...) "It's just me, but" I'd use awk for that.

An awk "program" is really drop-dead simple: it consists of one or more blocks that look like this:
Code:
  /pattern/ {
      block of code to be executed if this pattern was matched,
      written in a "vaguely 'C'-like" language.
  }
There are also "pseudo-patterns," like BEGIN (which executes before the first line is read) and END (which executes after the last one). Note that the opening brace has to sit on the same line as its pattern.

"And that's about it."
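For the records in this thread, such a program might look like the sketch below. It is only an illustration under a few assumptions: the function name find_records_awk and the variables want, rec and inrec are made up, records are taken to open with "{" and close with "}" at the start of a line, and the search text contains no backslashes. index() does a plain substring search, so the text is not treated as a regular expression.

```shell
# Hypothetical helper: print every {...} record in file $2 that contains $1.
find_records_awk() {
    awk -v want="$1" '
        /^[{]/ { rec = ""; inrec = 1 }      # opening brace: start a fresh record
        inrec  { rec = rec $0 "\n" }        # inside a record: keep the line
        /^[}]/ && inrec {                   # closing brace: print if it matched
            if (index(rec, want)) printf "%s", rec
            inrec = 0
        }
    ' "$2"
}
```

It would be called as: find_records_awk 'code = code1;' testfile. Changing what counts as a match only means changing the one "if" line.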

What I really like about it is that it's obviously designed for tasks just like the one you are facing. Furthermore, the solution remains easy to read and easy to change.

I mean... yes, I could "figure out" that sed-script that you've shown, and a few weeks from now I could "figure it out all over again," but (a) I would indeed have to do that, and (b) it would require about the same amount of mental-effort each time. And the same would be true if I had written it myself. Also, (c) if the input format subsequently changed, however slightly, I fear that I would be having to figure-it-out all over again!

I therefore find that awk, and its "honorary big-brother" perl (which is a full-fledged programming language), are much more suited to these common tasks.

As the Perl community likes to say, TMTOWTDI = "there's more than one way to do it." That's especially true in Unix/Linux. It's definitely worth your time to poke around your system.
 
10-10-2008, 11:52 PM   #6
jschiwal
Feel free to submit your awk solution. The poster had tried sed. I was trying to show some things with sed that might be useful.
  • Use address pattern ranges to enclose the range to work with.
  • Use brackets to enclose subranges.
  • Use brackets to enclose a group of commands on a pattern. (This can prevent losing lines saved with the N command, etc.)
  • The end of the first subrange was the desired pattern.
  • The beginning of the second subrange was the desired pattern. Doing this, I prevented any actions on a record that didn't have a match.

The sed command looks worse than it might have. Brackets were used to mark the boundaries of each record, and I couldn't swap them for different characters the way one can replace the forward slashes that delimit a pattern. I did use indentation and newlines to show the boundaries better. Perhaps I should have annotated the program itself with comments. I worked it out in an interactive shell and cut and pasted it into the post. I did try to explain what I did, however: I used subranges joined by the desired pattern. Having to escape characters does make things messy. If one learns to look past them, it doesn't look as bad. But I agree that regular expressions are easier to write than to read.

All I needed to do to match the second pattern was substitute "code = code1". That's when I noticed my mistake.

Gawk is almost 5 times as large as sed. It works wonders for text databases in regular fields. I wouldn't have considered sed in that case.

Perl is 25 times the size of sed, not counting modules you might load. And the OP probably doesn't know it. Recommending learning an entire language to solve a particular problem sounds a bit like an RTFM response. Don't get me wrong. I'm not blasting Perl, but to someone who isn't a proficient Perl programmer (and I'm not), such as a Python or Ruby programmer, Perl can look a lot like the way you describe sed.

===

I confess that I could have taken more time to describe my strategy. The address range matches a record. Both of the subranges only match a part of a record with the desired line in them. As each line is read in, it is saved in the Hold register until the end of record when it is printed out.

An alternative method would have been to save every line of each record in the Hold space until the closing bracket, then retrieve the multiline record from the Hold space and test it for the pattern: '{.*name = name1.*}'.
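A rough sketch of that alternative, offered as an assumption-laden illustration rather than a tested replacement for the scripts above. The function name find_records_hold is made up. Records are assumed to open with "{" and close with "}" at the start of a line, and the search text (used as a sed regex) must not contain "/". Because "h" overwrites the Hold space at every opening bracket, nothing is carried over from an earlier record.

```shell
# Hypothetical helper: print every {...} record in file $2 that contains $1.
#   /^{/,/^}/{     work only on lines that belong to a record
#   /^{/{h;d;}     opening bracket: overwrite the Hold space, start a new cycle
#   H              any other line of the record: append it to the Hold space
#   /^}/{g;/$1/p;} closing bracket: fetch the whole record, print it on a match
find_records_hold() {
    sed -n \
        -e '/^{/,/^}/{' \
        -e '/^{/{h;d;}' \
        -e 'H' \
        -e "/^}/{g;/$1/p;}" \
        -e '}' \
        "$2"
}
```

It would be called as: find_records_hold 'name = name2;' testfile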

Last edited by jschiwal; 10-11-2008 at 12:08 AM.
 
  

