LinuxQuestions.org

vicky007aggrwal 08-18-2012 12:50 PM

extract words from a xml file
 
Can somebody please suggest how to extract the "name" attribute value from the
following XML file pattern? The main problem is that the four lines below are actually a SINGLE
line in the XML file.

<XML>
<ild id =1 name=dd status=success ip=12.4.5.6>
<ild id =1 name=we status=success ip=12.4.5.6>
<ild id =1 name=fred status=success ip=12.4.5.6>
<ild id =1 name=gerd status=success ip=12.4.5.6>
</XML>

I was thinking of running a for loop and then extracting the values, but then I realised that the content above arrives on a single line. That is why I am not able to grep for the "name" attribute and extract its corresponding value (for example, the value dd in the XML above).

David the H. 08-18-2012 01:27 PM

You cannot reliably use line/regex-oriented tools like sed/awk/shell-scripting on arbitrary xml, due to its flexible format and nested nature. You really need to use something that has a built-in xml parser, like xmlstarlet.

http://xmlstar.sourceforge.net/

If you would provide an actual example of valid xml, then I may be able to give you a solution. xmlstarlet pukes on what you provided above and refuses to work.

And please use ***[code][/code] tags*** around your code and data when you do, to preserve formatting and to improve readability. Please do not use quote tags, bolding, colors, or other fancy formatting.

Edit:
After changing the above to this:
Code:

<XML>
        <ild id="1" name="dd" status="success" ip="12.4.5.6"/>
        <ild id="1" name="we" status="success" ip="12.4.5.6"/>
        <ild id="1" name="fred" status="success" ip="12.4.5.6"/>
        <ild id="1" name="gerd" status="success" ip="12.4.5.6"/>
</XML>

I can now use xmlstarlet like this:
Code:

$ xmlstarlet sel -t -m '//ild' -v '@name' -n file.xml
dd
we
fred
gerd


gregAstley 08-18-2012 01:29 PM

If your first obstacle to coding this up is that the elements are not on separate lines, then (I don't know which editor you prefer) in vim, for example, I would run the following substitution over the whole file: :%s/></>\r</g (i.e. replacing each "><" with ">", a line break, and "<"). Then save the result and code up the rest.
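
For anyone without vim handy, the same line-splitting idea can be sketched in the shell. This is only a sketch: the sample string and the function name split_tags are made up for illustration, and it assumes the tags sit directly against each other as "><" with nothing in between.

```shell
# Hypothetical sample: the data from the first post collapsed onto one line.
input='<XML><ild id =1 name=dd status=success ip=12.4.5.6><ild id =1 name=we status=success ip=12.4.5.6></XML>'

# Insert a line break between every adjacent "><" pair,
# mirroring the vim substitution described above.
split_tags() {
    awk '{ gsub(/></, ">\n<"); print }'
}

# Each tag now sits on its own line and can be processed line by line.
printf '%s\n' "$input" | split_tags
```

To run it on a real file, something like split_tags < file.xml would do (the file name is a placeholder).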

byannoni 08-18-2012 01:43 PM

This works assuming the input is a single line like OP said:
Code:

awk -v RS='>\\s*<' -F'="?' '$2 ~ / name/ { sub(/"? .*/, "", $3); print $3 }'

schneidz 08-18-2012 02:30 PM

Code:

grep -o name=[a-z]* vicky.xml

Thanks, David. For whatever reason, my previous edit accidentally removed the -o flag.

David the H. 08-18-2012 02:53 PM

Small correction:

Code:

grep -o "name=[a-z]*" vicky.xml

Use -o to output only the matches, and be sure to quote the expression, else the shell may attempt to expand the globbing characters within the pattern first.

But even this still gives you the entire "name=value" expression, and must be further filtered to strip it down to the value only. It also assumes that you want to grab all "name" attributes in the file.
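
One possible way to do that further filtering is to split each match on the "=" sign and keep the second field. This is a rough sketch that assumes unquoted, lowercase values like those in the sample; the function name name_values is just a label chosen here.

```shell
# Print only the values of name=... attributes.
# Assumes unquoted, lowercase values, as in the sample from the first post.
name_values() {
    grep -o 'name=[a-z]*' | cut -d= -f2
}

# Hypothetical single-line input modelled on the original post.
printf '%s\n' '<ild id =1 name=dd status=success ip=12.4.5.6><ild id =1 name=fred status=success ip=12.4.5.6>' | name_values
```

With a file, it would be called as name_values < vicky.xml.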

Also, as I mentioned, the sample code given above does not conform to xml standards, and so shouldn't be taken as a true input example. I believe that attribute values must always be quoted (in single or double quotes) in real xml, for example.

And again, any solution that involves tools like grep/sed/awk must depend on the xml file being cleanly and predictably formatted. Only a true xml parser can be trusted to always give clean results.
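
As a small illustration of how easily the pattern approach goes wrong, consider what happens if an element also carries an attribute whose name merely ends in "name". The input below is invented for the demonstration (the hostname attribute is not in the original data), and the function names are placeholders.

```shell
# Hypothetical input: the first element also has a "hostname" attribute.
sample='<ild id=1 hostname=web name=dd><ild id=2 name=we>'

# The pattern used earlier in the thread also matches the tail of "hostname=".
loose_matches() {
    grep -o 'name=[a-z]*'
}

# Requiring a space before "name=" avoids that here, but it still depends on
# this exact layout (single spaces, unquoted lowercase values).
stricter_matches() {
    grep -o ' name=[a-z]*' | cut -d= -f2
}

printf '%s\n' "$sample" | loose_matches     # three matches, one of them unwanted
printf '%s\n' "$sample" | stricter_matches  # only the two real name values
```

Each such fix only covers one more case, which is the argument for a real parser.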


All times are GMT -5.