Extract tags with multiline values from an XML file using sed/awk
Hi,
I have an XML file that holds key-value pairs (basically a Java properties file in XML), as shown below. The file contains both single-line and multiline tags.

<entry key="KEY1"> tag1 value </entry>
<entry key="KEY2" >
hello world.
This is multiline tag example.
blahh blah blah...
</entry>

I want to extract a tag's value by passing the tag name from a bash script. Could somebody give me some pointers on extracting the multiline value of a tag?

Thanks,
gbms
This might get you going:
Code:
awk '{print "|"$0"|"}' RS="[<>\n]+" file
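The one-liner above only splits the file at every run of <, > and newline characters and prints each piece between | characters, so a multiline value still comes out as several records. Below is a minimal sketch of one way to get from there to a lookup by key. It assumes the entries look exactly like the ones in the question: a double-quoted key directly after "<entry ", no nested tags, no CDATA, and no entity decoding. The function name get_entry is made up for illustration.

```shell
#!/bin/sh
# get_entry KEY FILE
# Print the text between <entry key="KEY" ...> and the next </entry>.
# Hypothetical helper: assumes the uniform layout from the question.
# Keys containing backslashes would need extra care because of awk -v.
get_entry() {
    awk -v key="$1" '
        # Collect the whole file in one string, keeping the newlines.
        { buf = buf $0 "\n" }
        END {
            # Literal search (no regex), so the key needs no escaping.
            start = index(buf, "<entry key=\"" key "\"")
            if (start == 0) exit 1

            # Skip past the end of the opening tag.
            rest = substr(buf, start)
            rest = substr(rest, index(rest, ">") + 1)

            # Cut at the closing tag.
            stop = index(rest, "</entry>")
            if (stop == 0) exit 1
            val = substr(rest, 1, stop - 1)

            # Trim leading and trailing whitespace; inner newlines stay.
            gsub(/^[ \t\n]+|[ \t\n]+$/, "", val)
            print val
        }
    ' "$2"
}
```

From a bash script this could be called as value=$(get_entry KEY2 file.xml). The function returns a non-zero status when the key is not found. Reading the whole file into one string sidesteps the line-by-line problem, but it is still plain text matching and not XML parsing.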
XMLStarlet has been recommended on LQ. I haven't needed to use it yet so cannot say how good it is etc.
XML and HTML are (generally) free-form in terms of whitespace and can contain nested elements, both of which are difficult or impossible for regex-based, line-oriented tools like sed or awk to parse reliably.
So unless your extraction requirements are trivial and the input is guaranteed to be well-formed and uniform, you're much better off working with tools specifically designed for those languages, as suggested above. xmlstarlet is probably a good place to start. Like catkin, I don't know much about it personally, but it has a good set of documentation here: http://xmlstar.sourceforge.net/docs.php

Also, please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.
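For reference, the xmlstarlet route suggested above might look roughly like the sketch below. It assumes xmlstarlet is installed under that name (some installations call the binary xml), that the file is well-formed XML with a single root element around the entry tags, and that the key contains no single quote. The wrapper name xml_entry is made up for illustration.

```shell
#!/bin/sh
# xml_entry KEY FILE
# Print the text content of the <entry> element whose key attribute is KEY.
# Hypothetical wrapper: assumes xmlstarlet is on PATH and FILE is well-formed.
xml_entry() {
    # sel: query mode; -t: start an output template;
    # -v: print the value of the XPath expression; -n: print a newline.
    # KEY is pasted into the XPath string, so it must not contain a single quote.
    xmlstarlet sel -t -v "//entry[@key='$1']" -n "$2"
}
```

Called as value=$(xml_entry KEY2 file.xml), this keeps whatever whitespace and line breaks sit inside the tag, apart from the trailing newlines that the shell's command substitution strips. Because a real XML parser does the work, attribute spacing and line layout in the file no longer matter.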