Using awk or sed to parse specific XML attributes and nodes
Hello all,
So I am relatively familiar with Linux, but recently started learning how to use awk and sed. I receive a standard XML file and need to parse it and write a specific attribute to a file, depending on whether a node says the rule passed or failed.

-----------------------Example XML-----------------------
<cdf:rule-result version="GEN005390" time="2012-10-21T16:26:56" idref="SV-37709r1_rule" weight="10.0" severity="medium">
  <cdf:result>pass</cdf:result>
  <cdf:ident system="http://iase.disa.mil/cci">CCI-000225</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" time="2012-10-21T16:26:56" idref="SV-37811r1_rule" weight="10.0" severity="medium">
  <cdf:result>fail</cdf:result>
  <cdf:ident system="http://cce.mitre.org">CCE-4260-6</cdf:ident>
  <cdf:ident system="http://iase.disa.mil/cci">CCI-000136</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005501" time="2012-10-21T16:26:56" idref="SV-37820r1_rule" weight="10.0" severity="medium">
  <cdf:result>pass</cdf:result>
  <cdf:ident system="http://iase.disa.mil/cci">CCI-001436</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005505" time="2012-10-21T16:26:56" idref="SV-37824r1_rule" weight="10.0" severity="medium">
  <cdf:result>fail</cdf:result>
  <cdf:ident system="http://cce.mitre.org">CCE-14491-5</cdf:ident>
  <cdf:ident system="http://iase.disa.mil/cci">CCI-000068</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005507" time="2012-10-21T16:26:56" idref="SV-37826r1_rule" weight="10.0" severity="medium">
  <cdf:result>fail</cdf:result>
  <cdf:ident system="http://iase.disa.mil/cci">CCI-001453</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005510" time="2012-10-21T16:26:56" idref="SV-37828r1_rule" weight="10.0" severity="medium">
  <cdf:result>fail</cdf:result>
  <cdf:ident system="http://iase.disa.mil/cci">CCI-000068</cdf:ident>
</cdf:rule-result>
-----------------------

Out of the above example.xml, I need the text from the version= attribute (GEN******) placed into a text file if its corresponding <cdf:result> node is not equal to pass. With this command, I can get all GEN****** values and the results of the second node into a file:

awk '/version="GEN/ {print substr($0,RSTART+39,RLENGTH+9)} /cdf:result/ {print substr($0,RSTART+31)}' XCCDF-Results.xml >> temp.txt

Thanks,
Dickie
Perhaps something like this...
Code:
awk -F\" '/<cdf:rule-result version=/ { x=$2 } |
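The code in this reply is cut off after the first rule. Below is a minimal sketch of how a command along these lines could be finished. It is a guess at the intent, not the original poster's code, and it assumes each tag sits on its own line in the real file. The idea is to remember the version whenever a rule-result line is seen, then print it when a cdf:result line holds anything other than pass. The file names sample.xml and failed.txt are made up for illustration.

```shell
# Hypothetical sample input: three records from the question, one tag per line.
cat > sample.xml <<'EOF'
<cdf:rule-result version="GEN005390" time="2012-10-21T16:26:56" idref="SV-37709r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000225</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" time="2012-10-21T16:26:56" idref="SV-37811r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://cce.mitre.org">CCE-4260-6</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005505" time="2012-10-21T16:26:56" idref="SV-37824r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000068</cdf:ident>
</cdf:rule-result>
EOF

# Split fields on double quotes, so $2 is the value of the first attribute.
awk -F'"' '
  # Remember the version of the record currently being read.
  /<cdf:rule-result version=/ { x = $2 }
  # On the result line, print the remembered version unless the result is pass.
  /<cdf:result>/ && !/<cdf:result>pass</ { print x }
' sample.xml > failed.txt

cat failed.txt
```

This only works while version= stays the first attribute and the tags stay on separate lines. If the layout of the file changes, the field number and the patterns have to change with it.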
dru8274, thanks for your quick reply! The information you provided was exactly what I was looking for.
D1ck1e
Please use ***[code][/code]*** tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.
Also, when giving us data to work with, please make sure it's complete. I couldn't do any testing on what you gave me until I figured out how to get it into proper xml, with a defined namespace.

Anyway, line and regex-based tools like sed and awk are not well designed for nested, tag-structured languages like xml/html. You should only use them when you can guarantee that the file format is unvarying. It's much better in the long run to use a tool with a dedicated xml parser, like xmlstarlet.

http://xmlstar.sourceforge.net/

I'm still kind of a beginner at this, but I was able to extract the kind of data you wanted with these commands:
Code:
$ xmlstarlet sel -T -t -m '//cdf:rule-result' -v 'concat(@version," ",cdf:result)' -n file.xml
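That command lists every version together with its result. To print only the versions whose result is not pass, a predicate on the match expression should do it. The following is a rough sketch under a few assumptions that the thread does not confirm: the cdf prefix is bound to the XCCDF 1.1 namespace URI, the records sit inside a cdf:TestResult wrapper, and the file names sample-ns.xml and failed-xmlstarlet.txt are invented for the example.

```shell
# Hypothetical well-formed input: a wrapper element that declares the cdf namespace.
cat > sample-ns.xml <<'EOF'
<?xml version="1.0"?>
<cdf:TestResult xmlns:cdf="http://checklists.nist.gov/xccdf/1.1">
  <cdf:rule-result version="GEN005390" idref="SV-37709r1_rule" severity="medium">
    <cdf:result>pass</cdf:result>
  </cdf:rule-result>
  <cdf:rule-result version="GEN005450" idref="SV-37811r1_rule" severity="medium">
    <cdf:result>fail</cdf:result>
  </cdf:rule-result>
</cdf:TestResult>
EOF

# -N binds the prefix used in the XPath; the URI must match the one in your file.
# The predicate keeps only rule-result elements whose result is not "pass".
xmlstarlet sel -N cdf="http://checklists.nist.gov/xccdf/1.1" -T -t \
  -m '//cdf:rule-result[cdf:result != "pass"]' -v '@version' -n \
  sample-ns.xml > failed-xmlstarlet.txt

cat failed-xmlstarlet.txt
```

Because the parser works on the element tree, this does not care about line breaks, indentation, or attribute order, which is the main advantage over the awk approach.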