Hi,
I've been using Linux for many years, but my scripting knowledge is pretty much useless (I code in C and/or Java for most of my needs).
I've got a project where I'm being asked to convert a file from XML to CSV, then process the CSV file, etc.
I've googled awk, sed and perl, but before I give it much more time I thought I'd ask for some advice to see if it's worth going down these routes.
Here's a sample of the XML file:
<OrderMessage>
<Header>
<TerminalID>TEST123</TerminalID>
<ProductCodeType>PIP</ProductCodeType>
<OrderType>N</OrderType>
<AccountNumber>012345</AccountNumber>
<DateTimeSent>2007-09-12T09:24:48</DateTimeSent>
</Header>
<Order>
<Line>
<LineNumber>1</LineNumber>
<ProductCode>1003763</ProductCode>
<PackSize>1</PackSize>
<Quantity>1</Quantity>
<StickerPrice>0</StickerPrice>
</Line>
<Line>
<LineNumber>2</LineNumber>
<ProductCode>2551661</ProductCode>
<PackSize>1</PackSize>
<Quantity>1</Quantity>
<StickerPrice>0</StickerPrice>
</Line>
</Order>
</OrderMessage>
The only elements I'm interested in are:
[AccountNumber]
and then
[ProductCode],[Quantity]
[ProductCode],[Quantity]
...
If I can get it to output AccNo=[AccountNumber] then all the better (it saves me another job).
All the scripts I've seen don't seem to allow for retrieving multiple elements within a section. I've got a script that retrieves either the [ProductCode] or the [Quantity] field, but I've not seen anything that retrieves [ProductCode],[Quantity] together.
So basically, can I do this via perl or awk, or should I just code something in C and stop looking into scripts?
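Using awk, the trick is to set the record separator to fit your needs. As I see it, <Line> looks good (if you do not need the header part).
Use all the non-required characters as the field separator. A multi-character FS is treated as a regular expression, so write it as a bracket expression such as "[</>\n]+" rather than the plain string "</>\n".
Next, you will have fields: roughly $1 is LineNumber, $2 its value, $3 LineNumber again, $4 ProductCode, and so on. (The numbering shifts by one when a record begins with a separator, because awk then produces an empty first field.)
Finally, you can format the output as you like.
This is not tested. The first record will contain the account number, so you need to handle it differently. Something like: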
Code:
/AccountNumber/ { printf "....", ....; next }
{ printf "%s=%s, %s=%s, %s=%s\n", .....; } ## <- this is the line handling records without account numbers
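For readers more at home outside awk, the same record-and-field splitting idea can be sketched in Python. This is only an illustration of the technique described above, not the poster's script: the names SAMPLE, fields_of, value_after and convert are made up for this example, the sample is a shortened copy of the order from the first post, and whitespace in general (rather than just newlines) is treated as a separator. Like the awk version, it depends on the text layout and breaks on empty elements or attributes.

```python
import re

# Shortened copy of the order from the first post (assumed layout: one element per line).
SAMPLE = """<OrderMessage>
<Header>
<TerminalID>TEST123</TerminalID>
<AccountNumber>012345</AccountNumber>
</Header>
<Order>
<Line>
<LineNumber>1</LineNumber>
<ProductCode>1003763</ProductCode>
<PackSize>1</PackSize>
<Quantity>1</Quantity>
</Line>
<Line>
<LineNumber>2</LineNumber>
<ProductCode>2551661</ProductCode>
<PackSize>1</PackSize>
<Quantity>1</Quantity>
</Line>
</Order>
</OrderMessage>
"""


def fields_of(record):
    """Split on runs of '<', '/', '>' or whitespace and drop empty strings."""
    return [f for f in re.split(r"[</>\s]+", record) if f]


def value_after(fields, tag):
    """Return the token that follows the first occurrence of the tag name."""
    return fields[fields.index(tag) + 1]


def convert(text):
    """Use '<Line>' as the record separator; record 0 holds the header."""
    records = text.split("<Line>")
    header, lines = records[0], records[1:]
    out = ["AccNo=" + value_after(fields_of(header), "AccountNumber")]
    for rec in lines:
        f = fields_of(rec)
        out.append(value_after(f, "ProductCode") + "," + value_after(f, "Quantity"))
    return out


if __name__ == "__main__":
    print("\n".join(convert(SAMPLE)))
```

Looking fields up by tag name rather than by position sidesteps the question of whether the first field is empty, at the cost of one list search per value.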
Hi Markush and pan64,
Thanks for both the replies, I'll look into both of them. As I said, my coding is pretty much non-existent in awk and perl, so I'll look for some examples and start there.
markush, yeah, I noticed there was a programming section after I'd hit submit. I put it in here as other posts on a similar subject were in here as well, my bad.
First line creates a file with the AccNo wording I need.
Second line parses the file and strips out the accountnumber section
Third line puts a newline after the accountnumber
Fourth line parses and appends the line and quantity details
Please note that simple-minded XML parsers are prone to breakage if the 'visual' format of the XML data changes. It is quite valid XML to put the entire file on one line, or to otherwise insert or remove whitespace in many ways. A proper XML parser will be indifferent to this kind of formatting. The following are all equivalent XML:
Code:
<AccountNumber>012345</AccountNumber>
Code:
<AccountNumber>
012345
</AccountNumber>
Code:
<AccountNumber>
012345
</AccountNumber>
Code:
<AccountNumber>
012345</AccountNumber>
Code:
<AccountNumber>
012345 </AccountNumber>
There can be even more severe differences. The possibility of these is one very good reason to use a standard-compliant tested XML parser module/library. Most common programming languages have bindings to at least a couple of different styles of XML parsers.
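As one possible illustration of the parser-based route, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name order_to_csv_lines and the SAMPLE_XML constant are invented for this example; the element names come from the sample in the first post. The values are stripped of surrounding whitespace, which assumes that such whitespace carries no meaning in this particular data.

```python
import xml.etree.ElementTree as ET

# The whole document on one line, which is still valid XML.
SAMPLE_XML = (
    "<OrderMessage><Header><AccountNumber>012345</AccountNumber></Header>"
    "<Order>"
    "<Line><ProductCode>1003763</ProductCode><Quantity>1</Quantity></Line>"
    "<Line><ProductCode>2551661</ProductCode><Quantity>1</Quantity></Line>"
    "</Order></OrderMessage>"
)


def order_to_csv_lines(xml_text):
    """Return ['AccNo=<account>', '<ProductCode>,<Quantity>', ...]."""
    root = ET.fromstring(xml_text)
    # Paths are relative to the root element (OrderMessage).
    account = root.findtext("Header/AccountNumber", default="").strip()
    out = ["AccNo=" + account]
    for line in root.iterfind("Order/Line"):
        code = line.findtext("ProductCode", default="").strip()
        qty = line.findtext("Quantity", default="").strip()
        out.append(code + "," + qty)
    return out


if __name__ == "__main__":
    print("\n".join(order_to_csv_lines(SAMPLE_XML)))
```

Because the parser works on the element tree and not on the text layout, the same function gives the same result whether the file is on one line or spread over many.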
--- rod.
@theNbomr: actually whitespace inside a node is significant; unless you pass it a schema saying otherwise, the parser has to assume the node contains arbitrary text.
I stand corrected on that, then. However, I think it underscores the importance of a proper XML parser. If what you say is true, I think I will have to investigate how well some of my favorite standards do, in that respect.
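A quick way to check how one particular parser behaves is to feed it a few variants and compare the text it reports. The sketch below does that with Python's xml.etree.ElementTree; the VARIANTS list and the raw_and_stripped helper are made up for this example, and the exact whitespace in the variants is an assumption, since the spacing in the snippets posted earlier may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitespace variants of the same element.
VARIANTS = [
    "<AccountNumber>012345</AccountNumber>",
    "<AccountNumber>\n012345\n</AccountNumber>",
    "<AccountNumber>\n  012345</AccountNumber>",
    "<AccountNumber>012345 </AccountNumber>",
]


def raw_and_stripped(xml_text):
    """Return the element text as the parser reports it, and a stripped copy."""
    text = ET.fromstring(xml_text).text or ""
    return text, text.strip()


if __name__ == "__main__":
    for variant in VARIANTS:
        raw, clean = raw_and_stripped(variant)
        print(repr(raw), "->", repr(clean))
```

With this parser the raw text differs between the variants, so the whitespace is passed through to the application; whether to strip it is a decision about the data, not something the parser makes for you.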