03-27-2012, 03:39 AM   #1
LQ Newbie
Registered: Mar 2012
Posts: 1

Extract tags with multiline values from XML file using sed/awk


I have an XML file that holds key-value pairs (basically a Java properties file in XML), as shown below.
The file contains both single-line and multiline tags.

<entry key="KEY1"> tag1 value </entry>
<entry key="KEY2" > hello
world. This is multiline tag example.
blahh blah blah...

I want to extract a tag's value by passing the tag name from a bash script.
Could somebody give me some pointers on extracting the multiline value of a tag?

03-27-2012, 04:57 AM   #2
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,529

This might get you going:
awk '{print "|"$0"|"}' RS="[<>\n]+" file
Generally, though, you're probably better off with Perl or Ruby, as they have XML parsers available.
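
As a rough illustration of the parser-based approach, here is a minimal sketch in Python rather than Perl or Ruby, using only the standard library's xml.etree.ElementTree. The sample data is modelled on the question, but the <properties> root element and the closing </entry> tag are assumptions, since the posted sample does not show them. The function name get_entry is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the question; the <properties> root
# element and the closing tags are assumptions, not taken from the thread.
SAMPLE = """<properties>
<entry key="KEY1"> tag1 value </entry>
<entry key="KEY2" > hello
world. This is multiline tag example.
blahh blah blah...
</entry>
</properties>"""


def get_entry(xml_text, key):
    """Return the text of the <entry> whose key attribute matches, or None."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        if entry.get("key") == key:
            # Inner line breaks are preserved; only the ends are stripped.
            return (entry.text or "").strip()
    return None


if __name__ == "__main__":
    print(get_entry(SAMPLE, "KEY2"))
```

The parser treats a single-line and a multiline entry the same way, so no special handling of line breaks is needed. To call it from a bash script you would still have to read the file and take the key as an argument, which is left out here.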
03-27-2012, 05:37 AM   #3
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

XMLStarlet has been recommended on LQ. I haven't needed to use it yet, so I can't say how good it is.
03-27-2012, 10:18 AM   #4
David the H.
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

XML and HTML data structures are (generally) free-form in terms of whitespace and can contain nested values, both of which make them difficult to impossible for regular-expression and line-based tools like sed or awk to parse reliably.

So unless your extraction requirements are trivial and the input is guaranteed to be well-formed and uniform, you're much better off working with tools specifically designed for those languages, as suggested above.
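
To make the whitespace point concrete, here is a small sketch in Python. It compares a naive per-line regex (standing in for a sed or awk one-liner) with a real parser on two layouts of the same data. Both layouts, the <properties> root element, and the function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Two hypothetical layouts of the same data; only the whitespace differs.
ONE_LINE = '<properties><entry key="KEY2">hello world</entry></properties>'
SPREAD_OUT = """<properties>
  <entry
      key="KEY2"
  >hello
world</entry>
</properties>"""

# Naive pattern applied one line at a time, as a line-based tool would.
LINE_PATTERN = re.compile(r'<entry key="KEY2">(.*)</entry>')


def by_line_regex(xml_text):
    """Return the first per-line match, or None if no single line matches."""
    for line in xml_text.splitlines():
        match = LINE_PATTERN.search(line)
        if match:
            return match.group(1)
    return None


def by_parser(xml_text):
    """Return the KEY2 entry text using an XML parser, or None."""
    for entry in ET.fromstring(xml_text).iter("entry"):
        if entry.get("key") == "KEY2":
            # Collapse runs of whitespace so both layouts compare equal.
            return " ".join((entry.text or "").split())
    return None


if __name__ == "__main__":
    for name, text in (("one line", ONE_LINE), ("spread out", SPREAD_OUT)):
        print(name, "| regex:", by_line_regex(text), "| parser:", by_parser(text))
```

The per-line regex only finds the value when the tag, attribute, and text happen to sit on one line in exactly the expected form, while the parser returns the same value for both layouts.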

xmlstarlet is probably a good place to start. Like catkin, I don't know much about it personally, but it has a good set of documentation here:

Also, please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.


All times are GMT -5.
