Note that this is only example code, and
it may well not work with your real data (you may have to adjust FS, use sub/gsub, etc.).
If you post a real sample from your XML,
that would be better.
P.S. And of course, if you install XML gawk (xgawk), you won't need this "ugly" code.
Last edited by radoulov; 09-23-2007 at 04:43 AM.
Reason: clarifying ...
Is it possible to do the same without hardcoding the field names (field1, field2, field3, Code, Description), i.e. hardcoding only the outer tag?
And what if there is one more level of inner tags, like this:
<outer>
<field1>one</field1>
<field2>two</field2>
<field3>three<Error Code=777 Description=12345/></field3>
<field4>four</field4>
<inner>
<f1>1</f1>
<f2>2</f2>
</inner>
</outer>
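One thing to watch in the sample above: the attribute values in the Error tag are unquoted (Code=777), and a strict XML parser will reject that. Below is a minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree is used for parsing; the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML (is_well_formed is a hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Unquoted attribute values are not well-formed XML.
print(is_well_formed('<field3>three<Error Code=777 Description=12345/></field3>'))
# The same fragment with quoted values parses.
print(is_well_formed('<field3>three<Error Code="777" Description="12345"/></field3>'))
```

If the real data has quoted attribute values, this is a non-issue.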
<AMP>
<EnrID>ENR1</EnrID>
<SubID>SUB135</SubID>
<DependentID>0011</DependentID>
<StfID>Sysuser<Error Code="233" Description="Staff ID does not exist"/></StfID>
<PRoc>
<ProcCType>ICU</ProcCType>
<ProcCode>31</ProcCode>
<Rank>SEC</Rank>
<Active></Active>
<InActive>12319999</InActive>
</PRoc>
</AMP>
This is my sample input file, and the expected output is:
EnrID|SubID|DependentID|StfID|Code|Description|ProcCType|ProcCode|Rank|Active|InActive
ENR1|SUB135|0011|Sysuser|233|Staff ID does not exist|ICU|31|SEC|-|12319999
$ cat a
<AMP>
<EnrID>ENR1</EnrID>
<SubID>SUB135</SubID>
<DependentID>0011</DependentID>
<StfID>Sysuser<Error Code="233" Description="Staff ID does not exist"/></StfID>
<PRoc>
<ProcCType>ICU</ProcCType>
<ProcCode>31</ProcCode>
<Rank>SEC</Rank>
<Active></Active>
<InActive>12319999</InActive>
</PRoc>
</AMP>
$ cat a|python parser.py
EnrID|SubID|DependentID|StfID|Code|Description|ProcCType|ProcCode|Rank|Active|InActive
ENR1|SUB135|0011|Sysuser|233|Staff ID does not exist|ICU|31|SEC|-|12319999
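Here's the script:
Code:
import sys
from xml.etree.ElementTree import ElementTree

d = {}     # tag or attribute name -> value
tags = []  # column names, in document order

tree = ElementTree(file=sys.stdin)
# iter() replaces getiterator(), which was removed in Python 3.9
for element in tree.iter():
    # Skip pure containers (<AMP>, <PRoc>): children but no text of their own.
    # Stripping the text also copes with indented input.
    if len(element) and not (element.text or '').strip():
        continue
    if element.tag == "Error":
        # Each attribute of <Error .../> becomes its own column
        d.update(element.attrib)
        tags += element.attrib.keys()
    else:
        # Empty elements such as <Active></Active> are printed as '-'
        d[element.tag] = (element.text or '-').strip()
        tags.append(element.tag)

print('|'.join(tags))
print('|'.join(d[tag] for tag in tags))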
Hope this helps.
Last edited by angrybanana; 10-02-2007 at 08:56 PM.
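The script above keys its values by tag name, so a tag that appears twice (say, two PRoc blocks) would overwrite the earlier value. Here is a rough sketch of the same idea as a function that keeps (name, value) pairs in a list instead. The name flatten and its parameters are made up for illustration, and treating any attribute-only element like the Error tag is an assumption, not something from the original script.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text, empty="-", sep="|"):
    """Return (header, row) strings for one record, in document order.

    flatten is a hypothetical helper; no tag names are hardcoded.
    """
    pairs = []  # (column name, value); a list keeps repeated tag names
    for element in ET.fromstring(xml_text).iter():
        text = (element.text or "").strip()
        if len(element) and not text:
            continue  # pure container such as <AMP> or <PRoc>
        if element.attrib and not text and not len(element):
            # Attribute-only element such as <Error Code=".." Description=".."/>:
            # each attribute becomes its own column (assumed behaviour).
            pairs.extend(element.attrib.items())
        else:
            pairs.append((element.tag, text or empty))
    header = sep.join(name for name, _ in pairs)
    row = sep.join(value for _, value in pairs)
    return header, row


if __name__ == "__main__":
    sample = (
        "<AMP>\n"
        "<EnrID>ENR1</EnrID>\n"
        '<StfID>Sysuser<Error Code="233" Description="Staff ID does not exist"/></StfID>\n'
        "<PRoc>\n<Rank>SEC</Rank>\n<Active></Active>\n</PRoc>\n"
        "</AMP>"
    )
    for line in flatten(sample):
        print(line)
```

Repeated tags then show up as repeated columns rather than being lost; whether that is the layout you want depends on your data.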