LinuxQuestions.org
11-06-2012, 08:58 AM   #16
David the H.

You appear to be running into exactly the problem I warned you about. Regular expressions have a very hard time dealing with nested content. The kind used in sed, at least*, cannot count nesting levels and has no way to "look ahead", so it can't determine precisely which closing tag matches which opening one.

(*The Perl-based regex flavor might be able to, since it has look-ahead and look-behind features built in, but sed doesn't support it.)
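
To make the problem concrete, here is a small sketch. The one-line HTML sample is made up for illustration, and the two sed commands are just the obvious "longest" and "shortest" attempts, not anything from your script. Neither one stops at the closing tag that actually belongs to the entry div.

```shell
#!/bin/sh
# Hypothetical sample: an "entry" div with a nested div inside it,
# followed by an unrelated sibling div on the same line.
sample='<div class="entry">a<div>b</div>c</div><div>d</div>'

# Greedy attempt: \(.*\) runs to the LAST </div> on the line,
# so it swallows the sibling div as well.
greedy=$(printf '%s\n' "$sample" |
    sed 's/.*<div class="entry">\(.*\)<\/div>.*/\1/')

# "Shortest" attempt: drop the opening tag, then cut at the FIRST </div>.
# That closing tag belongs to the nested div, so the content is cut short.
shortest=$(printf '%s\n' "$sample" |
    sed 's/<div class="entry">//; s/<\/div>.*//')

printf 'greedy:   %s\n' "$greedy"
printf 'shortest: %s\n' "$shortest"
# wanted:   a<div>b</div>c
```

The greedy version returns too much and the shortest version too little; the content you actually want sits in between, and a sed regex has no means of counting tags to find it.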


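xmlstarlet, however, has one more trick up its sleeve for you. It can convert XML into a line-based format called PYX, which is specifically designed to make parsing data easier with tools like grep and sed. Each PYX line starts with a type character: "(" for an opening tag, "A" for an attribute, "-" for text, and ")" for a closing tag.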

Code:
xmlstarlet fo -H -Q -R file.html | xmlstarlet pyx | sed -n '/^Aclass entry/,/^)div/ { /^-/ s///p }'
As I hope you can see, it usually makes your sed expressions much cleaner and easier, although for maximum benefit you do need to know how to do multi-line editing.
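
If you don't have a page handy to test with, the sed half of the command above can be tried on its own. The PYX below is hand-written for a hypothetical page (it is what I would expect the converter to emit, not captured output), so treat it as a sketch.

```shell
#!/bin/sh
# Hand-written PYX for a hypothetical page (not real xmlstarlet output).
# Line prefixes: "(" opening tag, "A" attribute, "-" text, ")" closing tag.
pyx_sample='(div
Aclass entry
-First post
(b
-bold
)b
)div
(div
Aclass footer
-ignore me
)div'

# Same address range as the command above. s/^-//p is the spelled-out
# form of "/^-/ s///p" (an empty pattern reuses the last regex used).
entry_text=$(printf '%s\n' "$pyx_sample" | sed -n '/^Aclass entry/,/^)div/ {
    s/^-//p
}')

printf '%s\n' "$entry_text"
```

The range starts at the class attribute line and ends at the first closing div, and only the text lines inside it are printed, with their leading "-" removed.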

There can still be a few issues with matching nested tags, but since they are now cleanly spread out over multiple lines rather than potentially squashed up, it becomes more a question of setting up proper address ranges than of building complex regexes.
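
Here is a rough sketch of that remaining issue, again on hand-written PYX for a hypothetical page. The plain range stops at the first closing div, which belongs to the nested one. As one possible workaround I've swapped in awk with a depth counter; that is my own substitution for illustration, not something xmlstarlet or sed provides.

```shell
#!/bin/sh
# Hand-written PYX (hypothetical): the entry div contains a nested div.
pyx_nested='(div
Aclass entry
-before
(div
Aclass inner
-nested
)div
-after
)div
(div
-outside
)div'

# Plain address range: ends at the FIRST ")div", i.e. the nested one,
# so the text after the nested div is lost.
naive=$(printf '%s\n' "$pyx_nested" |
    sed -n '/^Aclass entry/,/^)div/ {
    s/^-//p
}')

# Depth counter in awk (an assumption on my part, not the method above):
# count div opens and closes so the range ends at the matching ")div".
tracked=$(printf '%s\n' "$pyx_nested" | awk '
    /^Aclass entry/      { depth = 1; next }       # inside the entry div now
    depth && /^\(div$/   { depth++ }               # a nested div opens
    depth && /^\)div$/   { depth--; next }         # a div closes
    depth && /^-/        { print substr($0, 2) }   # text line, minus the "-"
')

printf 'naive:\n%s\n\ntracked:\n%s\n' "$naive" "$tracked"
```

Because every tag sits on its own line, keeping count is a matter of three simple line patterns rather than one complicated regex.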

The pyx approach also appears to be a bit more robust than using the XML parser directly. It doesn't choke completely if there are minor syntax errors.


Here are a few useful sed references. See the first one in particular for more on how to use its multi-line features:
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt
 
  

