LinuxQuestions.org


D1ck1e 10-23-2012 08:25 PM

Using awk or sed to Parse XML specific attributes and nodes
 
Hola All,

I am fairly familiar with Linux, but have only recently started learning awk and sed. I regularly receive an XML file in a standard format. I need to parse it and write one specific attribute to a file, depending on whether a child node reports a pass or a fail.

-----------------------Example XML-----------------------

<cdf:rule-result version="GEN005390" time="2012-10-21T16:26:56" idref="SV-37709r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000225</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" time="2012-10-21T16:26:56" idref="SV-37811r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://cce.mitre.org">CCE-4260-6</cdf:ident>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000136</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005501" time="2012-10-21T16:26:56" idref="SV-37820r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-001436</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005505" time="2012-10-21T16:26:56" idref="SV-37824r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://cce.mitre.org">CCE-14491-5</cdf:ident>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000068</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005507" time="2012-10-21T16:26:56" idref="SV-37826r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-001453</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005510" time="2012-10-21T16:26:56" idref="SV-37828r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000068</cdf:ident>
</cdf:rule-result>

-----------------------
From the example XML above, I need the value of the version= attribute (GEN******) written to a text file whenever its corresponding <cdf:result> node is not equal to pass.

With the following command, I can get every GEN****** value and the contents of every <cdf:result> node into a file:

awk '/version="GEN/ {print substr($0,RSTART+39,RLENGTH+9)} /cdf:result/ {print substr($0,RSTART+31)}' XCCDF-Results.xml >> temp.txt

Thanks,
Dickie

dru8274 10-24-2012 05:20 AM

Perhaps something like this...
Code:

awk -F\" '
    /<cdf:rule-result version=/ { x = $2 }
    /<cdf:result>pass<\/cdf:result>/ { x = "" }
    /<cdf:result>fail<\/cdf:result>/ && x != "" { print x; x = "" }
' testfile.xml

GEN005450
GEN005505
GEN005507
GEN005510
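
The command above prints a version only when the result is exactly fail. The original request was for anything other than pass, so here is a minimal sketch of that variant. The function name non_pass_rules, the temporary file, and the GEN009999 record with a notchecked result are all hypothetical additions for illustration, and the sketch still assumes one tag per line with version= as the first attribute.

```shell
#!/bin/sh
# Trimmed sample data based on the thread's XML. The third record
# (GEN009999, "notchecked") is a hypothetical addition for illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.xml" <<'EOF'
<cdf:rule-result version="GEN005390" idref="SV-37709r1_rule" severity="medium">
<cdf:result>pass</cdf:result>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" idref="SV-37811r1_rule" severity="medium">
<cdf:result>fail</cdf:result>
</cdf:rule-result>
<cdf:rule-result version="GEN009999" idref="SV-00000r1_rule" severity="medium">
<cdf:result>notchecked</cdf:result>
</cdf:rule-result>
EOF

# Remember the version from each opening tag. On the result line, print the
# remembered version unless the result is exactly "pass", then reset it.
non_pass_rules() {
    awk -F'"' '
        /<cdf:rule-result version=/ { version = $2 }
        /<cdf:result>/ {
            if ($0 !~ /<cdf:result>pass<\/cdf:result>/ && version != "")
                print version
            version = ""
        }
    ' "$1"
}

non_pass_rules "$tmpdir/sample.xml"
```

Redirecting the call, for example non_pass_rules XCCDF-Results.xml > failed.txt, would put the list into a text file.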

Happy with your solution? Then tick "yes" and mark the thread as Solved!

D1ck1e 10-24-2012 06:07 AM

dru8274, thanks for your quick reply! The information you provided was exactly what I was looking for.

D1ck1e

David the H. 10-25-2012 03:55 PM

Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.

Also, when giving us data to work with, please make sure it's complete. I couldn't do any testing on what you gave me until I figured out how to get it into proper xml, with a defined namespace.

Anyway, line and regex-based tools like sed and awk are not well designed for nested, tag-structured languages like xml/html. You should only use them when you can guarantee that the file format is unvarying.
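
To illustrate why the format has to be unvarying, here is a rough sketch, not the exact command posted earlier. It feeds a hypothetical record in which version is no longer the first attribute to two extraction styles. The function names by_position and by_name are made up for this example.

```shell
#!/bin/sh
# Hypothetical record: same data as in the thread, but with the
# attributes in a different order.
reordered='<cdf:rule-result idref="SV-37811r1_rule" version="GEN005450" severity="medium">
<cdf:result>fail</cdf:result>
</cdf:rule-result>'

# Position-based extraction: with a double quote as the field separator,
# field 2 is whatever sits between the first pair of quotes.
by_position() {
    awk -F'"' '/<cdf:rule-result / { print $2 }'
}

# Name-based extraction: find the version="..." attribute wherever it
# appears on the line. match() sets RSTART and RLENGTH; the 9 skips
# version=" and the 10 drops that prefix plus the closing quote.
by_name() {
    awk '/<cdf:rule-result / {
        if (match($0, /version="[^"]*"/))
            print substr($0, RSTART + 9, RLENGTH - 10)
    }'
}

printf '%s\n' "$reordered" | by_position   # prints SV-37811r1_rule, the wrong attribute
printf '%s\n' "$reordered" | by_name       # prints GEN005450
```

The name-based version copes with reordering, but it is still line-based: a tag split across several lines or single-quoted attributes would defeat it as well.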

It's much better in the long run to use a tool with a dedicated xml parser, like xmlstarlet.

http://xmlstar.sourceforge.net/

I'm still kind of a beginner at this, but I was able to extract the kind of data you wanted with these commands:

Code:

$ xmlstarlet sel -T -t -m '//cdf:rule-result' -v 'concat(@version," ",cdf:result)' -n file.xml
GEN005390 pass
GEN005450 fail
GEN005501 pass
GEN005505 fail
GEN005507 fail
GEN005510 fail

$ xmlstarlet sel -T -t -m '//cdf:rule-result[cdf:result="pass"]' -v 'concat(@version," ",cdf:result)' -n file.xml
GEN005390 pass
GEN005501 pass

$ xmlstarlet sel -T -t -m '//cdf:rule-result[not(cdf:result="pass")]' -v 'concat(@version," ",cdf:result)' -n file.xml
GEN005450 fail
GEN005505 fail
GEN005507 fail
GEN005510 fail

Someone more experienced with XPath could doubtless do much more with it.
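
Since the original request was for the version values alone, a minimal sketch of that narrower query might look like this. The namespace URI is an assumption (it is the XCCDF 1.1 one, and the real file may declare a different one). The cdf:TestResult wrapper, the function name non_pass_versions, and the temporary file are hypothetical stand-ins for testing.

```shell
#!/bin/sh
# Assumed namespace URI (XCCDF 1.1); replace it with the one declared
# in the actual results file.
XCCDF_NS='http://checklists.nist.gov/xccdf/1.1'

# Print only the version attribute of every rule-result whose result
# is not "pass", one per line.
non_pass_versions() {
    xmlstarlet sel -N "cdf=$XCCDF_NS" -T -t \
        -m '//cdf:rule-result[not(cdf:result="pass")]' \
        -v '@version' -n "$1"
}

# Small stand-in for the real file, wrapped in a hypothetical root
# element that declares the namespace.
sample=$(mktemp)
cat > "$sample" <<EOF
<cdf:TestResult xmlns:cdf="$XCCDF_NS">
<cdf:rule-result version="GEN005390" idref="SV-37709r1_rule" severity="medium">
<cdf:result>pass</cdf:result>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" idref="SV-37811r1_rule" severity="medium">
<cdf:result>fail</cdf:result>
</cdf:rule-result>
<cdf:rule-result version="GEN005505" idref="SV-37824r1_rule" severity="medium">
<cdf:result>fail</cdf:result>
</cdf:rule-result>
</cdf:TestResult>
EOF
```

Calling non_pass_versions "$sample" > failed.txt should then leave GEN005450 and GEN005505 in failed.txt, provided xmlstarlet is installed under that name.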


All times are GMT -5.