LinuxQuestions.org
Old 10-23-2012, 09:25 PM   #1
D1ck1e
LQ Newbie
 
Registered: Oct 2012
Posts: 2

Rep: Reputation: Disabled
Using awk or sed to Parse XML specific attributes and nodes


Hola All,

I'm reasonably familiar with Linux, but I only recently started learning awk and sed. I regularly receive an XML file in a standard format, and I need to extract one attribute from each record into a text file, depending on whether a child node reports pass or fail.

-----------------------Example XML-----------------------

<cdf:rule-result version="GEN005390" time="2012-10-21T16:26:56" idref="SV-37709r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000225</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" time="2012-10-21T16:26:56" idref="SV-37811r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://cce.mitre.org">CCE-4260-6</cdf:ident>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000136</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005501" time="2012-10-21T16:26:56" idref="SV-37820r1_rule" weight="10.0" severity="medium">
<cdf:result>pass</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-001436</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005505" time="2012-10-21T16:26:56" idref="SV-37824r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://cce.mitre.org">CCE-14491-5</cdf:ident>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000068</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005507" time="2012-10-21T16:26:56" idref="SV-37826r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-001453</cdf:ident>
</cdf:rule-result>
<cdf:rule-result version="GEN005510" time="2012-10-21T16:26:56" idref="SV-37828r1_rule" weight="10.0" severity="medium">
<cdf:result>fail</cdf:result>
<cdf:ident system="http://iase.disa.mil/cci">CCI-000068</cdf:ident>
</cdf:rule-result>

-----------------------
From the example above, I need the value of the version= attribute (GEN******) written to a text file whenever its corresponding <cdf:result> node is not equal to pass.

With this command, I can get every GEN****** value and the contents of each <cdf:result> node into a file:

awk '/version="GEN/ {print substr($0,RSTART+39,RLENGTH+9)} /cdf:result/ {print substr($0,RSTART+31)}' XCCDF-Results.xml >> temp.txt

Thanks,
Dickie
 
Old 10-24-2012, 06:20 AM   #2
dru8274
Member
 
Registered: Oct 2011
Location: New Zealand
Distribution: Debian
Posts: 105

Rep: Reputation: 37
Perhaps something like this...
Code:
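awk -F\" '
  # With " as the field separator, the version value is field 2.
  /<cdf:rule-result version=/ { x = $2 }
  # A pass forgets the stored version; a fail prints it.
  /<cdf:result>pass<\/cdf:result>/ { x = "" }
  /<cdf:result>fail<\/cdf:result>/ && x != "" { print x; x = "" }
' testfile.xml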

GEN005450
GEN005505
GEN005507
GEN005510
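
The program above prints only on an explicit fail. Since the original request was for anything "not equal to pass", a variant along these lines might be closer if other result values can occur. It is only a sketch: it assumes each <cdf:result> element sits on its own line as in the sample, and the file name sample.xml and the "notchecked" value are invented for illustration.

```shell
# Small sample file; "notchecked" is a made-up result value for illustration.
cat > sample.xml <<'EOF'
<cdf:rule-result version="GEN005390" idref="SV-37709r1_rule" severity="medium">
<cdf:result>pass</cdf:result>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" idref="SV-37811r1_rule" severity="medium">
<cdf:result>fail</cdf:result>
</cdf:rule-result>
<cdf:rule-result version="GEN005501" idref="SV-37820r1_rule" severity="medium">
<cdf:result>notchecked</cdf:result>
</cdf:rule-result>
EOF

not_passed=$(awk -F'"' '
  # Remember the version of the record currently being read.
  /<cdf:rule-result version=/ { x = $2 }
  # On the result line, print the version unless the result is exactly "pass".
  /<cdf:result>/ && x != "" {
    if ($0 !~ /<cdf:result>pass<\/cdf:result>/) print x
    x = ""
  }
' sample.xml)

printf '%s\n' "$not_passed"
```

On this sample it should list GEN005450 and GEN005501 and skip GEN005390.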
Happy with your solution? Then tick "yes" and mark the thread as Solved!
 
Old 10-24-2012, 07:07 AM   #3
D1ck1e
LQ Newbie
 
Registered: Oct 2012
Posts: 2

Original Poster
Rep: Reputation: Disabled
dru8274, thanks for your quick reply! The information you provided was exactly what I was looking for.

D1ck1e
 
Old 10-25-2012, 04:55 PM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1957
Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.

Also, when giving us data to work with, please make sure it's complete. I couldn't do any testing on what you gave me until I figured out how to get it into proper xml, with a defined namespace.

Anyway, line and regex-based tools like sed and awk are not well designed for nested, tag-structured languages like xml/html. You should only use them when you can guarantee that the file format is unvarying.
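
As a small illustration of that fragility, the sketch below takes one failing record and writes it with idref ahead of version and with the start tag wrapped across two lines. That is still equivalent XML, but the quote-splitting awk program posted earlier in this thread no longer recognises it. The file name reordered.xml is invented for the example.

```shell
# The same kind of record as in the sample, with attributes reordered and the tag wrapped.
cat > reordered.xml <<'EOF'
<cdf:rule-result idref="SV-37811r1_rule"
    version="GEN005450" severity="medium">
<cdf:result>fail</cdf:result>
</cdf:rule-result>
EOF

# The awk program from the earlier reply, logic unchanged.
fragile=$(awk -F'"' '
  /<cdf:rule-result version=/ { x = $2 }
  /<cdf:result>pass<\/cdf:result>/ { x = "" }
  /<cdf:result>fail<\/cdf:result>/ && x != "" { print x; x = "" }
' reordered.xml)

# Show what was captured between the brackets.
printf 'captured: [%s]\n' "$fragile"
```

Nothing should appear between the brackets, because the first pattern never matches and so the failing rule is silently missed.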

It's much better in the long run to use a tool with a dedicated xml parser, like xmlstarlet.

http://xmlstar.sourceforge.net/

I'm still kind of a beginner at this, but I was able to extract the kind of data you wanted with these commands:

Code:
$ xmlstarlet sel -T -t -m '//cdf:rule-result' -v 'concat(@version," ",cdf:result)' -n file.xml
GEN005390 pass
GEN005450 fail
GEN005501 pass
GEN005505 fail
GEN005507 fail
GEN005510 fail

$ xmlstarlet sel -T -t -m '//cdf:rule-result[cdf:result="pass"]' -v 'concat(@version," ",cdf:result)' -n file.xml
GEN005390 pass
GEN005501 pass

$ xmlstarlet sel -T -t -m '//cdf:rule-result[not(cdf:result="pass")]' -v 'concat(@version," ",cdf:result)' -n file.xml
GEN005450 fail
GEN005505 fail
GEN005507 fail
GEN005510 fail
Someone more experienced in xpath manipulation could doubtlessly do much more with it.
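
For anyone repeating this on the fragment as posted, here is a rough sketch of the two missing pieces: a root element that declares the cdf prefix, and an explicit -N binding so the XPath prefix resolves. The root name "results", the file name wrapped.xml, and the namespace URI are all placeholders; substitute whatever your real file declares. This version prints only the version attribute.

```shell
# Wrap the fragment in a root element that declares the cdf prefix.
# Root name and URI are placeholders, not taken from the original file.
cat > wrapped.xml <<'EOF'
<results xmlns:cdf="http://example.com/cdf">
<cdf:rule-result version="GEN005390" idref="SV-37709r1_rule">
<cdf:result>pass</cdf:result>
</cdf:rule-result>
<cdf:rule-result version="GEN005450" idref="SV-37811r1_rule">
<cdf:result>fail</cdf:result>
</cdf:rule-result>
</results>
EOF

failed=""
if command -v xmlstarlet >/dev/null 2>&1; then
  # -N binds the prefix used in the XPath to the same URI as in the file.
  failed=$(xmlstarlet sel -N cdf=http://example.com/cdf -T -t \
    -m '//cdf:rule-result[not(cdf:result="pass")]' -v '@version' -n wrapped.xml)
  printf '%s\n' "$failed"
fi
```

The command -v guard is only there so the snippet does nothing on a machine without xmlstarlet installed.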
 
  


All times are GMT -5.
