LinuxQuestions.org - Using awk or sed to Parse XML specific attributes and nodes


Hello all,

I am relatively familiar with Linux, but I only recently started learning how to use awk and sed. I regularly receive a standard XML file that I need to parse, writing a specific attribute to a file depending on whether a node reports a pass or a fail.

-----------------------Example XML-----------------------

<cdf:rule-result version="GEN005390" time="2012-10-21T16:26:56" idref="SV-37709r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000225</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" time="2012-10-21T16:26:56" idref="SV-37811r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://cce.mitre.org">CCE-4260-6</cdf:ident>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000136</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005501" time="2012-10-21T16:26:56" idref="SV-37820r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-001436</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005505" time="2012-10-21T16:26:56" idref="SV-37824r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://cce.mitre.org">CCE-14491-5</cdf:ident>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000068</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005507" time="2012-10-21T16:26:56" idref="SV-37826r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-001453</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005510" time="2012-10-21T16:26:56" idref="SV-37828r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000068</cdf:ident>
</cdf:rule-result>

-----------------------
Out of the above example.xml, I need the text from the version= attribute (GEN******) placed into a text file if its corresponding <cdf:result> node is not equal to pass.

With the following command, I can get every GEN****** value and the result from the second node into a file:

awk '/version="GEN/ {print substr($0,RSTART+39,RLENGTH+9)} /cdf:result/ {print substr($0,RSTART+31)}' XCCDF-Results.xml >> temp.txt

Thanks,
Dickie

Perhaps something like this...

Code:

awk -F\" '/<cdf:rule-result version=/ { x=$2 }
    /<cdf:result>pass<\/cdf:result>/ { x="" }
    /<cdf:result>fail<\/cdf:result>/ && x!="" { print x ; x="" }
' testfile.xml

GEN005450
GEN005505
GEN005507
GEN005510
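The request was for every version whose result is not equal to pass, while the command above prints only on an explicit fail. A minimal variation, offered as a sketch, prints the version for any result other than pass. The file names sample-results.xml and not-passed.txt are placeholders, and the GEN000000 record with an "error" result is made up purely for illustration; the other two records come from the question. Like the command above, it assumes version is the first attribute and that every element sits on its own line.

```shell
#!/bin/sh
# Two records from the question plus one HYPOTHETICAL record (GEN000000,
# result "error") to show a result that is neither pass nor fail.
cat > sample-results.xml <<'EOF'
<cdf:rule-result version="GEN005390" time="2012-10-21T16:26:56" idref="SV-37709r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000225</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" time="2012-10-21T16:26:56" idref="SV-37811r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://cce.mitre.org">CCE-4260-6</cdf:ident>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000136</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN000000" time="2012-10-21T16:26:56" idref="hypothetical_rule" weight="10.0" severity="medium">
<cdf:result>error</cdf:result>
</cdf:rule-result>
EOF

# Split on double quotes so that $2 is the value of the first attribute (version).
awk -F'"' '
    /<cdf:rule-result version=/ { version = $2 }
    /<cdf:result>/ && version != "" {
        # Print unless the result is exactly "pass".
        if ($0 !~ /<cdf:result>pass<\/cdf:result>/) print version
        version = ""
    }
' sample-results.xml > not-passed.txt

cat not-passed.txt
```

With this sample, not-passed.txt ends up holding GEN005450 and GEN000000, one per line.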


dru8274, thanks for your quick reply! The information you provided was exactly what I was looking for.

D1ck1e

Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.

Also, when giving us data to work with, please make sure it's complete. I couldn't do any testing on what you gave me until I figured out how to get it into proper xml, with a defined namespace.
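For anyone wanting to reproduce the tests: the posted excerpt has no single root element and never declares the cdf: prefix, so an XML parser will reject it as it stands. One way to make it parseable, as a rough sketch and not necessarily how it was done for the tests in this post, is to wrap it in one root element that declares the prefix. The namespace URI below is the XCCDF 1.1 one and is an assumption about the real file; the wrapper element and the file names are placeholders.

```shell
#!/bin/sh
# One record from the question, saved as a bare fragment (placeholder file name).
cat > fragment.txt <<'EOF'
<cdf:rule-result version="GEN005390" time="2012-10-21T16:26:56" idref="SV-37709r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000225</cdf:ident>
</cdf:rule-result>
EOF

# Wrap the fragment in one root element that declares the cdf: prefix.
# ASSUMPTION: the URI is the XCCDF 1.1 namespace; check the root element of
# the real results file and copy the URI from there.
{
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
    printf '%s\n' '<cdf:TestResult xmlns:cdf="http://checklists.nist.gov/xccdf/1.1">'
    cat fragment.txt
    printf '%s\n' '</cdf:TestResult>'
} > file.xml
```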

Anyway, line and regex-based tools like sed and awk are not well designed for nested, tag-structured languages like xml/html. You should only use them when you can guarantee that the file format is unvarying.
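To make that concrete, here is a small sketch: the same record with its attributes in a different order, which is equally valid XML, slips past an awk pattern that expects version to come first. The file names are placeholders, and the reordered record is a hand-edited copy of one from the question.

```shell
#!/bin/sh
cat > original.xml <<'EOF'
<cdf:rule-result version="GEN005450" time="2012-10-21T16:26:56" idref="SV-37811r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
</cdf:rule-result>
EOF

# Same record, hand-edited so that time= precedes version=.
cat > reordered.xml <<'EOF'
<cdf:rule-result time="2012-10-21T16:26:56" version="GEN005450" idref="SV-37811r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
</cdf:rule-result>
EOF

# Line-based extraction that relies on version= being the first attribute.
extract_failed() {
    awk -F'"' '
        /<cdf:rule-result version=/ { x = $2 }
        /<cdf:result>pass<\/cdf:result>/ { x = "" }
        /<cdf:result>fail<\/cdf:result>/ && x != "" { print x; x = "" }
    ' "$1"
}

echo "original:  $(extract_failed original.xml)"
echo "reordered: $(extract_failed reordered.xml)"
```

The first call yields GEN005450; the second yields nothing, because the pattern never matches the reordered opening tag.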

It's much better in the long run to use a tool with a dedicated xml parser, like xmlstarlet.

http://xmlstar.sourceforge.net/

I'm still kind of a beginner at this, but I was able to extract the kind of data you wanted with these commands:

Code:

$ xmlstarlet sel -T -t -m '//cdf:rule-result' -v 'concat(@version," ",cdf:result)' -n file.xml
GEN005390 pass
GEN005450 fail
GEN005501 pass
GEN005505 fail
GEN005507 fail
GEN005510 fail

$ xmlstarlet sel -T -t -m '//cdf:rule-result[cdf:result="pass"]' -v 'concat(@version," ",cdf:result)' -n file.xml
GEN005390 pass
GEN005501 pass

$ xmlstarlet sel -T -t -m '//cdf:rule-result[not(cdf:result="pass")]' -v 'concat(@version," ",cdf:result)' -n file.xml
GEN005450 fail
GEN005505 fail
GEN005507 fail
GEN005510 fail

Someone more experienced with XPath could doubtless do much more with it.
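Since the goal was a text file holding only the version values of rules that did not pass, the last query can be trimmed to print @version alone and redirected to a file. A sketch, with assumptions: the -N option binds the cdf prefix explicitly, and the URI shown (XCCDF 1.1) has to match whatever the real file declares; file.xml here is a two-record stand-in built from the question's data inside a placeholder wrapper element, and not-passed.txt is a placeholder name.

```shell
#!/bin/sh
# Stand-in input: two records from the question inside a placeholder root element.
# ASSUMPTION: the namespace URI is the XCCDF 1.1 one; use the URI from the real file.
cat > file.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<cdf:TestResult xmlns:cdf="http://checklists.nist.gov/xccdf/1.1">
<cdf:rule-result version="GEN005390" time="2012-10-21T16:26:56" idref="SV-37709r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000225</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" time="2012-10-21T16:26:56" idref="SV-37811r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://cce.mitre.org">CCE-4260-6</cdf:ident>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000136</cdf:ident>
</cdf:rule-result>
</cdf:TestResult>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # Match every rule-result whose result is not "pass" and print only its version.
    xmlstarlet sel -N cdf='http://checklists.nist.gov/xccdf/1.1' -T -t \
        -m '//cdf:rule-result[not(cdf:result="pass")]' \
        -v '@version' -n file.xml > not-passed.txt
    cat not-passed.txt
else
    echo "xmlstarlet is not installed" >&2
fi
```

Unlike the awk approach, this does not depend on attribute order or on how the elements are split across lines.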