LinuxQuestions.org
Post #1, 09-23-2007, 12:46 AM
naren_0101bits (Member; registered Jul 2004; location: Guntur)
Help needed in writing awk script for xml source


Hi, I can't work out an approach for converting an XML file to a flat file using awk. Can anyone help me out?

The input XML looks like this:

<outer>
<field1>one</field1>
<field2>two</field2>
<field3>three<Error Code=777 Description=12345/></field3>
<field4>four</field4>
</outer>



The expected output is:

field1 field2 field3 field4 Code Description
one two three four 777 12345

Thanks in advance,
Naren
 
Post #2, 09-23-2007, 04:30 AM
radoulov (Member; registered Apr 2007; location: Milano, Italia/Варна, България; distribution: Ubuntu, Open SUSE)
Code:
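awk '
# Print the header once, on the first line of each input file.
FNR==1 { printf "%-6s|%-6s|%-6s|%-6s|%-6s|%s\n", "field1", "field2",
                "field3", "field4", "code", "description" }
# With the FS below, "<tag>text</tag>" splits so that $3 is the text.
/field1/ { printf "%-6s|", $3 }
/field2/ { printf "%-6s|", $3 }
# On the field3 line, the Error attributes land in $6 (Code) and $8 (Description).
/field3/ { printf "%-6s|", $3; code = $6; desc = $8 }
# field4 ends the record: print it with the saved code and description.
/field4/ { printf "%-6s|%-6s|%s\n", $3, code, desc }
' FS="</|[/ <>=]" filename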

Example:

Code:
zsh 4.3.4% cat file
<outer>
<field1>one</field1>
<field2>two</field2>
<field3>three<Error Code=777 Description=12345/></field3>
<field4>four</field4>
</outer>
<outer1>
<field1>one1</field1>
<field2>two1</field2>
<field3>three1<Error Code=7770 Description=1234/></field3>
<field4>four1</field4>
</outer1>
<outer2>
<field1>one2</field1>
<field2>two2</field2>
<field3>three2<Error Code=7 Description=45/></field3>
<field4>four2</field4>
</outer2>
zsh 4.3.4% awk 'FNR==1{printf "%-6s|%-6s|%-6s|%-6s|%-6s|%s\n","field1","field2", 
"field3","field4","code","description"}
/field1/{printf "%-6s|",$3}
/field2/{printf "%-6s|",$3}
/field3/{printf "%-6s|",$3;code=$6;desc=$8}
/field4/{printf "%-6s|%-6s|%s\n",$3,code,desc}' FS="</|[/ <>=]" file
field1|field2|field3|field4|code  |description
one   |two   |three |four  |777   |12345
one1  |two1  |three1|four1 |7770  |1234
one2  |two2  |three2|four2 |7     |45
Note that this is only example code, and it may well not work with your real data (you may have to adjust FS, use sub/gsub, etc.).
If you post a real sample of your XML, it will be easier to help.

P.S. Of course, if you install XML gawk (xgawk), you won't need this "ugly" code.
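To see why the one-liner reads `$3`, `$6` and `$8`, here is a small Python sketch that splits one sample line with the same regular expression the one-liner passes as FS. Python is used only so the split can be printed and checked; the helper name `awk_fields` is made up for illustration. awk numbers fields from 1, so awk's `$3` is index 2 in the Python list.

```python
import re

# The same regular expression the one-liner passes as FS. The two-character
# "</" is tried before the one-character class, so "</" counts as a single
# separator, just as awk's longest-match rule makes it.
FS = re.compile(r"</|[/ <>=]")


def awk_fields(line):
    """Split line the way awk does with the FS above (index 0 is awk's $1)."""
    return FS.split(line)


fields = awk_fields("<field3>three<Error Code=777 Description=12345/></field3>")
# awk's $3, $6 and $8 are Python indices 2, 5 and 7.
print(fields[2], fields[5], fields[7])  # prints: three 777 12345
```

The first element of the list is an empty string, which is awk's empty `$1`: every line starts with `<`, so there is nothing before the first separator.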

Last edited by radoulov; 09-23-2007 at 04:43 AM. Reason: clarifying ...
 
Post #3, 09-24-2007, 12:12 AM
naren_0101bits (Original Poster)
Is it possible to do the same without hardcoding the field names (field1, field2, field3, Code, Description), so that only the outer tag is hardcoded?
And what if there is one more level of inner tags, like this:
<outer>
<field1>one</field1>
<field2>two</field2>
<field3>three<Error Code=777 Description=12345/></field3>
<field4>four</field4>
<inner>
<f1>1</f1>
<f2>2</f2>
</inner>

</outer>
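A minimal sketch of the "no hardcoded field names" idea, written in Python and assuming the input keeps the layout of this sample: one element per line, attributes written as name=value or name="value", and no `<`, `>` or `=` inside the values. Every `<tag>text</tag>` line becomes a column named after its tag, any attributes inside it become columns right after it, and lines holding only an opening or closing tag (such as `<inner>`) are skipped, so the nesting depth does not matter. `outer` is the only hardcoded name; `rows`, `LEAF` and `ATTR` are illustrative names. This is a text hack, not an XML parser.

```python
import re

RECORD = "outer"  # the only tag name the sketch hardcodes

# One element per line: <tag>body</tag>, where body may hold a nested <Error .../>.
LEAF = re.compile(r"^<(\w+)>(.*)</\1>$")
# name=value or name="value" pairs inside the body.
ATTR = re.compile(r'(\w+)=("[^"]*"|[^\s/>]+)')


def rows(lines):
    """Yield (names, values) for every <outer>...</outer> record in lines."""
    names, values = [], []
    for raw in lines:
        line = raw.strip()
        if line == "<%s>" % RECORD:
            names, values = [], []
        elif line == "</%s>" % RECORD:
            yield names, values
            names, values = [], []
        else:
            m = LEAF.match(line)
            if not m:
                continue  # blank lines and container tags such as <inner>, </inner>
            tag, body = m.groups()
            names.append(tag)
            values.append(body.split("<", 1)[0] or "-")  # text before any nested tag
            for name, value in ATTR.findall(body):
                names.append(name)
                values.append(value.strip('"'))


SAMPLE = """<outer>
<field1>one</field1>
<field2>two</field2>
<field3>three<Error Code=777 Description=12345/></field3>
<field4>four</field4>
<inner>
<f1>1</f1>
<f2>2</f2>
</inner>

</outer>"""

for i, (names, values) in enumerate(rows(SAMPLE.splitlines())):
    if i == 0:
        print("|".join(names))
    print("|".join(values))
```

To process a real file, pass an open file object (or `sys.stdin`) to `rows()` instead of `SAMPLE.splitlines()`. Once the layout stops being one element per line, a real XML parser is the safer choice.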
 
Post #4, 09-24-2007, 12:21 AM
ghostdog74 (Senior Member; registered Aug 2006)
Don't waste time by changing the requirements each time. Show the whole sample input XML file if possible, and describe clearly what you want as output.
 
Post #5, 09-24-2007, 12:36 AM
naren_0101bits (Original Poster)
<AMP>
<EnrID>ENR1</EnrID>
<SubID>SUB135</SubID>
<DependentID>0011</DependentID>
<StfID>Sysuser<Error Code="233" Description="Staff ID does not exist"/></StfID>
<PRoc>
<ProcCType>ICU</ProcCType>
<ProcCode>31</ProcCode>
<Rank>SEC</Rank>
<Active></Active>
<InActive>12319999</InActive>
</PRoc>
</AMP>


This is my sample input file, and the expected output is:

EnrID|SubID|DependentID|StfID|Code|Description|ProcCType|ProcCode|Rank|Active|InActive
ENR1|SUB135|0011|Sysuser|233|Staff ID does not exist|ICU|31|SEC|-|12319999


Except for the AMP tag, no tag should be hardcoded.
 
Post #6, 09-26-2007, 07:48 AM
naren_0101bits (Original Poster)
Refresh: can anyone suggest an approach to my problem?
 
Post #7, 09-26-2007, 08:00 AM
matthewg42 (Senior Member; registered Oct 2003; location: UK; distribution: Kubuntu 12.10, using awesome wm)
You could use the program xmlstarlet.

You could also use Perl and the XML::Parser module.
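Perl's XML::Parser is built on the expat parser, and Python's standard library exposes the same parser as `xml.parsers.expat`. Below is a sketch of that event-driven approach in Python, to stay in one language with the other examples here. It assumes `AMP` is the only hardcoded name, that each element with text becomes a column, that attributes become columns right after the element that carries them, that elements holding only child elements or only attributes get no column of their own, and that empty elements print as `-`. Those rules are an interpretation of the requirement in post #5 (they reproduce its expected output), not a general-purpose converter; `flatten` and `finish` are illustrative names.

```python
import xml.parsers.expat

RECORD = "AMP"  # the only element name the sketch hardcodes


def finish(row):
    """Turn collected columns into (names, values).

    Containers (<PRoc>) and attribute-only elements (<Error .../>) lose their own
    column; empty leaves such as <Active></Active> are shown as '-'.
    """
    names, values = [], []
    for col in row:
        text = "".join(col["text"]).strip()
        if col["is_attr"] or text or not (col["children"] or col["attrs"]):
            names.append(col["name"])
            values.append(text or "-")
    return names, values


def flatten(xml_text):
    """Return one (names, values) pair per RECORD element, driven by expat events."""
    records = []
    row = None   # columns of the record being built; None outside a record
    stack = []   # indices into row of the elements that are currently open

    def start(name, attrs):
        nonlocal row
        if name == RECORD:
            row = []
            stack.clear()
            return
        if row is None:
            return  # element outside any record, e.g. a wrapper root
        if stack:
            row[stack[-1]]["children"] = True
        stack.append(len(row))
        row.append({"name": name, "text": [], "children": False,
                    "attrs": bool(attrs), "is_attr": False})
        # With ordered_attributes set, attrs is a flat [name, value, ...] list.
        for key, value in zip(attrs[0::2], attrs[1::2]):
            row.append({"name": key, "text": [value], "children": False,
                        "attrs": False, "is_attr": True})

    def end(name):
        nonlocal row
        if name == RECORD:
            records.append(finish(row))
            row = None
        elif row is not None:
            stack.pop()

    def chars(data):
        # Character data may arrive in several chunks; collect them all.
        if row is not None and stack:
            row[stack[-1]]["text"].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return records


SAMPLE = """<AMP>
<EnrID>ENR1</EnrID>
<SubID>SUB135</SubID>
<DependentID>0011</DependentID>
<StfID>Sysuser<Error Code="233" Description="Staff ID does not exist"/></StfID>
<PRoc>
<ProcCType>ICU</ProcCType>
<ProcCode>31</ProcCode>
<Rank>SEC</Rank>
<Active></Active>
<InActive>12319999</InActive>
</PRoc>
</AMP>"""

for i, (names, values) in enumerate(flatten(SAMPLE)):
    if i == 0:
        print("|".join(names))
    print("|".join(values))
```

To read a real file, pass its contents instead of `SAMPLE`, e.g. `flatten(sys.stdin.read())` after importing `sys`. Elements outside any `AMP` are ignored, so a wrapper element around several records also works.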
 
Post #8, 10-02-2007, 08:47 PM
angrybanana (Member; registered Oct 2003; distribution: Archlinux)
A bit late, but here's a Python solution:

Code:
$ cat a
<AMP>
<EnrID>ENR1</EnrID>
<SubID>SUB135</SubID>
<DependentID>0011</DependentID>
<StfID>Sysuser<Error Code="233" Description="Staff ID does not exist"/></StfID>
<PRoc>
<ProcCType>ICU</ProcCType>
<ProcCode>31</ProcCode>
<Rank>SEC</Rank>
<Active></Active>
<InActive>12319999</InActive>
</PRoc>
</AMP>

$ cat a|python parser.py
EnrID|SubID|DependentID|StfID|Code|Description|ProcCType|ProcCode|Rank|Active|InActive
ENR1|SUB135|0011|Sysuser|233|Staff ID does not exist|ICU|31|SEC|-|12319999
Here's the script:
Code:
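import sys
from xml.etree.ElementTree import ElementTree

d = {}      # column name -> value
tags = []   # column names in document order
tree = ElementTree(file=sys.stdin)

for element in tree.iter():
    text = (element.text or '').strip()
    # Skip pure containers such as <AMP> and <PRoc>: no attributes and
    # no text of their own, only child elements.
    if not element.attrib and not text and len(element):
        continue
    if element.tag == "Error":
        # Error's attributes (Code, Description) become columns.
        d.update(element.attrib)
        tags += element.attrib.keys()
    else:
        # Empty elements such as <Active></Active> are shown as '-'.
        d[element.tag] = text or '-'
        tags.append(element.tag)

print('|'.join(tags))
print('|'.join(d[tag] for tag in tags))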
Hope this helps.

Last edited by angrybanana; 10-02-2007 at 08:56 PM.
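The script above prints a single header and a single row for the whole input, and it recognises the attribute holder by the name `Error`. A real file presumably holds many AMP records inside some wrapper element; this sketch assumes so (the wrapper name `AMPS` below is hypothetical, and the sample simply repeats the record from post #5). It loops over every `AMP` element with `Element.iter()` and, instead of testing for `Error`, gives an element its own column only when it has text or is an empty leaf without attributes. It relies on `attrib` keeping attributes in document order, which holds on current Python 3 releases; `columns` and `flatten` are illustrative names.

```python
import xml.etree.ElementTree as ET

RECORD = "AMP"  # the only hardcoded element name


def columns(elem):
    """Yield (name, value) pairs for elem's descendants, in document order."""
    for child in elem:
        text = (child.text or "").strip()
        # An element gets its own column if it has text, or if it is an empty
        # leaf with no attributes (such as <Active></Active>, shown as '-').
        if text or not (len(child) or child.attrib):
            yield child.tag, text or "-"
        # Attributes (Code, Description on <Error>) follow their element.
        yield from child.attrib.items()
        yield from columns(child)


def flatten(xml_text):
    """Return one list of (name, value) pairs per RECORD element."""
    root = ET.fromstring(xml_text)
    # iter() includes root itself, so a bare <AMP> document and a wrapper
    # element holding several <AMP> records are both handled.
    return [list(columns(record)) for record in root.iter(RECORD)]


ONE_RECORD = """<AMP>
<EnrID>ENR1</EnrID>
<SubID>SUB135</SubID>
<DependentID>0011</DependentID>
<StfID>Sysuser<Error Code="233" Description="Staff ID does not exist"/></StfID>
<PRoc>
<ProcCType>ICU</ProcCType>
<ProcCode>31</ProcCode>
<Rank>SEC</Rank>
<Active></Active>
<InActive>12319999</InActive>
</PRoc>
</AMP>"""
SAMPLE = "<AMPS>" + ONE_RECORD * 2 + "</AMPS>"  # hypothetical wrapper, two copies

rows = flatten(SAMPLE)
print("|".join(name for name, _ in rows[0]))  # header from the first record
for row in rows:
    print("|".join(value for _, value in row))
```

The header is taken from the first record only, so this assumes every record has the same fields in the same order.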
 
  


All times are GMT -5.
