LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

08-18-2012, 01:50 PM   #1
vicky007aggrwal
Member
 
Registered: Aug 2012
Posts: 95

Rep: Reputation: Disabled
extract words from an xml file


Can somebody please suggest how to extract the "name" attribute values from the
following XML file? The main problem is that the four lines below are actually a SINGLE
line in the XML file:

<XML>
<ild id =1 name=dd status=success ip=12.4.5.6>
<ild id =1 name=we status=success ip=12.4.5.6>
<ild id =1 name=fred status=success ip=12.4.5.6>
<ild id =1 name=gerd status=success ip=12.4.5.6>
</XML>

I was thinking of running a for loop and then extracting the values, but then I realised that the content above comes as a single line. That is why I am not able to grep for the "name" attribute and extract its corresponding value (e.g. the value dd in the XML above).
 
08-18-2012, 02:27 PM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
You cannot reliably use line/regex-oriented tools like sed/awk/shell-scripting on arbitrary xml, due to its flexible format and nested nature. You really need to use something that has a built-in xml parser, like xmlstarlet.

http://xmlstar.sourceforge.net/

If you would provide an actual example of valid xml, then I may be able to give you a solution. xmlstarlet pukes on what you provided above and refuses to work.

And please use [code][/code] tags around your code and data when you do, to preserve formatting and to improve readability. Please do not use quote tags, bolding, colors, or other fancy formatting.

Edit:
After changing the above to valid xml (quoted attribute values and self-closing tags), like this:
Code:
<XML>
	<ild id="1" name="dd" status="success" ip="12.4.5.6"/>
	<ild id="1" name="we" status="success" ip="12.4.5.6"/>
	<ild id="1" name="fred" status="success" ip="12.4.5.6"/>
	<ild id="1" name="gerd" status="success" ip="12.4.5.6"/>
</XML>
I can now use xmlstarlet like this:
Code:
$ xmlstarlet sel -t -m '//ild' -v '@name' -n file.xml
dd
we
fred
gerd
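A sketch of why the parser route also settles the original "single line" problem: assuming the corrected markup above, the same xmlstarlet query should work unchanged when every tag sits on one physical line, because the parser ignores line layout. The wrapper name names_via_xmlstarlet is hypothetical; it reads stdin or any files given as arguments.

```sh
#!/bin/sh
# Hypothetical wrapper around the xmlstarlet query shown above.
# The XML parser does not care how tags are split across lines, so
# single-line input needs no pre-processing. Requires xmlstarlet.
names_via_xmlstarlet() {
    # -m '//ild' : visit every <ild> element, at any depth
    # -v '@name' : print the value of its "name" attribute
    # -n         : follow each value with a newline
    # With no file arguments, xmlstarlet reads the document from stdin.
    xmlstarlet sel -t -m '//ild' -v '@name' -n "$@"
}

# Usage, with the corrected sample squeezed onto one line:
#   printf '%s\n' '<XML><ild id="1" name="dd"/><ild id="1" name="we"/></XML>' |
#       names_via_xmlstarlet
```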

Last edited by David the H.; 08-18-2012 at 02:46 PM. Reason: as stated
 
08-18-2012, 02:29 PM   #3
gregAstley
LQ Newbie
 
Registered: Aug 2012
Distribution: ubuntu 11.10
Posts: 27

Rep: Reputation: 4
If your first obstacle to coding this up is that the tags are not on separate lines, then fix that first in an editor (I don't know which one you prefer). For example, in vim I would select everything and then run the regular expression replacement s/></>\r</g (i.e. replacing "><" with ">", a line break, and "<"). Then save the result and code up the rest.
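The same splitting step can be scripted without opening an editor. This is a minimal sketch that uses awk's gsub() in place of vim's :s; the function name split_tags and the file names are hypothetical, and it assumes tags are separated by nothing but optional spaces and that no ">" or "<" appears inside attribute values.

```sh
#!/bin/sh
# split_tags: a non-interactive take on the vim substitution above.
# Every ">" followed by "<" (optionally with spaces in between) becomes
# ">" + newline + "<", so each tag ends up on its own line.
split_tags() {
    awk '{ gsub(/> *</, ">\n<"); print }' "$@"
}

# Usage (hypothetical file names):
#   split_tags vicky.xml > vicky-split.xml
```

After splitting, a for loop or a line-oriented grep can work through the tags one per line, as originally intended.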

Last edited by gregAstley; 08-18-2012 at 02:36 PM.
 
08-18-2012, 02:43 PM   #4
byannoni
Member
 
Registered: Aug 2012
Location: /home/byannoni
Distribution: Arch
Posts: 128

Rep: Reputation: 36
This works assuming the input is a single line like OP said:
Code:
awk -v RS='>\\s*<' -F'="?' '$2 ~ / name/ { sub(/"? .*/, "", $3); print $3 }'
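The one-liner above leans on GNU awk behaviour (a multi-character RS treated as a regular expression, and the \s shorthand), and it only finds name when it is the second attribute, since it tests the text between the first and second "=". As a sketch of a position-independent alternative, assuming attributes are separated by spaces and values contain no spaces, double quotes or ">", a plain awk loop with match() can print every name value on a line, quoted or not. The function name extract_names is hypothetical.

```sh
#!/bin/sh
# extract_names: print every name attribute value found on each input line.
# Works whether or not values are double-quoted, and wherever name sits
# among the attributes. Assumptions: attributes are separated by spaces,
# and values contain no spaces, double quotes or ">".
extract_names() {
    awk '{
        s = $0
        # Find the next " name=" (optionally followed by a double quote)
        # plus the value characters that follow it.
        while (match(s, / name="?[^" >]*/)) {
            v = substr(s, RSTART, RLENGTH)
            sub(/^ name="?/, "", v)          # keep only the value itself
            print v
            s = substr(s, RSTART + RLENGTH)  # continue after this match
        }
    }' "$@"
}

# Usage: extract_names vicky.xml
```

Requiring a space before name keeps it from matching inside longer attribute names such as hostname=.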

Last edited by byannoni; 08-18-2012 at 04:08 PM. Reason: Added quote tolerance. Thanks for pointing that out, David the H.
 
08-18-2012, 03:30 PM   #5
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,326

Rep: Reputation: 919
Code:
grep -o name=[a-z]* vicky.xml
thx david... for whatever reason my previous edit accidentally removed the -o flag.

Last edited by schneidz; 08-18-2012 at 03:59 PM.
 
08-18-2012, 03:53 PM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Small correction:

Code:
grep -o "name=[a-z]*" vicky.xml
Use -o to output only the matches, and be sure to quote the expression, else the shell may attempt to expand the globbing characters within the pattern first.

But even this still gives you the entire "name=value" expression, and must be further filtered to strip it down to the value only. It also assumes that you want to grab all "name" attributes in the file.
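For data shaped exactly like the original sample (unquoted, lowercase values), that extra filtering can be a single cut. A minimal sketch; the function name names_via_grep is hypothetical. Because grep -o prints each match on its own line, it does not matter that all the tags share one physical line.

```sh
#!/bin/sh
# names_via_grep: grep -o emits every "name=value" match on a separate
# line, even when all tags share one physical line; cut then keeps
# only the part after the "=".
# Assumes unquoted, lowercase values as in the original sample.
names_via_grep() {
    grep -o 'name=[a-z]*' "$@" | cut -d= -f2
}

# Usage: names_via_grep vicky.xml
```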

Also, as I mentioned, the sample data given above does not conform to xml standards, and so shouldn't be taken as a true input example. Attribute values must always be quoted in real xml, for example (double or single quotes are both allowed).

And again, any solution that involves tools like grep/sed/awk must depend on the xml file being cleanly and predictably formatted. Only a true xml parser can be trusted to always give clean results.

Last edited by David the H.; 08-18-2012 at 03:58 PM. Reason: expansions
 
All times are GMT -5.
