LinuxQuestions.org
Old 02-24-2010, 01:11 PM   #1
vxc69
Member
 
Registered: Jul 2004
Distribution: Ubuntu
Posts: 387

Rep: Reputation: 33
Quick question on XML parsing.


Hello,


When I parse an XML file, should I rely on the order of elements?

For example say we have:

<book>
<author></author>
<title></title>
</book>

Should I rely on the above order?

Would the following still be valid:
<book>
<title></title>
<author></author>
</book>

I'm trying to find out whether a well-formed XML document must keep its elements in a fixed order, or whether it's still valid XML when the order changes.

I think I'm doing it wrong if I rely on the order, because order shouldn't be important; it wouldn't make sense if it were, right?


Thanks

Last edited by vxc69; 02-24-2010 at 01:14 PM.
 
Old 02-24-2010, 01:32 PM   #2
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
AFAIK XML itself does not guarantee any particular order of child elements.

And there are ready-made libraries for XML parsing (e.g. libxml2), so you should probably use one of those.
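To make the well-formedness point concrete, here is a minimal Java sketch using the JDK's built-in DOM parser. The class and method names (WellFormedCheck, isWellFormed) are made up for illustration. It only checks well-formedness, with no schema or DTD involved: both child orders from the first post parse fine, while a document whose root element is never closed does not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the text parses as well-formed XML (no schema or DTD is consulted).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        // Same children, different order: both are well-formed.
        System.out.println(isWellFormed("<book><author></author><title></title></book>"));
        System.out.println(isWellFormed("<book><title></title><author></author></book>"));
        // Root element opened twice and never closed: not well-formed.
        System.out.println(isWellFormed("<book><title></title><author></author><book>"));
    }
}
```

Note that the parser still reports children in the order they appear in the file; what well-formedness does not do is require any particular order.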
 
Old 02-24-2010, 01:40 PM   #3
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 115
XML itself doesn't care anything about the data except that it's formatted correctly. This sort of concern would be defined by the schema of your particular flavor of XML.
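As an illustration of the schema point, here is a minimal Java sketch that validates the book example from the first post against a hypothetical XSD. The schema text, the class name and the isValid helper are assumptions made up for this example; the javax.xml.validation calls are the standard JDK API. With xs:sequence the declared order (author, then title) is enforced, whereas a schema using xs:all would accept the children in either order.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class BookOrderValidation {
    // Hypothetical schema: xs:sequence requires <author> before <title>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book'><xs:complexType><xs:sequence>"
      + "<xs:element name='author' type='xs:string'/>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the document is valid against XSD, false if validation fails.
    static boolean isValid(String xml) {
        Validator validator;
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            validator = schema.newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            // With no error handler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<book><author>A</author><title>T</title></book>"));
        System.out.println(isValid("<book><title>T</title><author>A</author></book>"));
    }
}
```

Both documents are well-formed; only the schema decides that the second one, with the children swapped, is invalid.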
 
Old 02-24-2010, 03:34 PM   #4
vxc69
Member
 
Registered: Jul 2004
Distribution: Ubuntu
Posts: 387

Original Poster
Rep: Reputation: 33
Quote:
Originally Posted by tuxdev
XML itself doesn't care anything about the data except that it's formatted correctly. This sort of concern would be defined by the schema of your particular flavor of XML.
Well, this is the problem. It's a huge XML file. To speed it up, once I find a particular parent node, I advance the parser a fixed number of times so I land directly on the child node(s) I want within that parent node. This speeds it up immensely; however, the code isn't very nice, especially if the order changes in the future.

If I have a series of if statements to check every child node for what I want, it slows down.

This is a streaming pull parser.

Performance or Reliability?

Last edited by vxc69; 02-24-2010 at 03:37 PM.
 
Old 02-24-2010, 04:04 PM   #5
mattca
Member
 
Registered: Jan 2009
Distribution: Slackware 14.1
Posts: 333

Rep: Reputation: 56
I think relying on the order would be a Bad Idea, unless something about the inherent nature of the data implies an order (e.g., a list of dates).

Quote:
Originally Posted by vxc69
To speed it up, once I find a particular parent node, I advance the parser a fixed number of times so I land directly on the child node(s) I want within that parent node.
Hmmm.. not sure I understand exactly what you're dealing with here. But it sounds like you have a parent node that has multiple child nodes of the same type? And which child node you need changes?

Any chance of getting a snippet of your XML that demonstrates this?

Also, what language are you parsing this in?

Quote:
Originally Posted by vxc69
Performance or Reliability?
I say reliability. Performance isn't worth much if it doesn't work.
 
Old 02-24-2010, 05:21 PM   #6
vxc69
Member
 
Registered: Jul 2004
Distribution: Ubuntu
Posts: 387

Original Poster
Rep: Reputation: 33
Quote:
Originally Posted by mattca
I say reliability. Performance isn't worth much if it doesn't work.
Well, not if the given XML is assured to have that order.

The XML looks like this; the file is a couple of gigabytes. I'm parsing it in Java using StAX:

Code:
<dingBatData>
  <dingBatEvent>
    <id>34</id>
    <name>LL(K)*</name>
    <apc>B1C9</apc>
    <killPos>
      <x>29.2</x>
      <y>32.1</y>
    </killPos>
  </dingBatEvent>
  <dingBatEvent>
    .
    .
    .
    <killPos>
      .
      .
    </killPos>
  </dingBatEvent>
  .
  .
  .
  .
</dingBatData>
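For data shaped like the sample above, an order-independent StAX loop might look roughly like the following sketch. It dispatches on the element name instead of counting next() calls, so it keeps working if the children are reordered. The class, the Event holder and the choice of fields (id, x, y) are assumptions for illustration, and it has not been timed against a multi-gigabyte file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DingBatReader {
    // Hypothetical holder for the fields wanted from each <dingBatEvent>.
    static final class Event {
        String id;
        double x;
        double y;
    }

    // Picks fields by element name, so child order inside an event does not matter.
    // A real program would likely process each event as it completes instead of
    // collecting them all in memory.
    static List<Event> readEvents(XMLStreamReader reader) throws XMLStreamException {
        List<Event> events = new ArrayList<>();
        Event current = null;      // non-null while inside a <dingBatEvent>
        boolean inKillPos = false; // true while inside a <killPos>
        while (reader.hasNext()) {
            int type = reader.next();
            if (type == XMLStreamConstants.START_ELEMENT) {
                switch (reader.getLocalName()) {
                    case "dingBatEvent": current = new Event(); break;
                    case "killPos": inKillPos = true; break;
                    case "id":
                        if (current != null && !inKillPos) current.id = reader.getElementText();
                        break;
                    case "x":
                        if (current != null && inKillPos) current.x = Double.parseDouble(reader.getElementText());
                        break;
                    case "y":
                        if (current != null && inKillPos) current.y = Double.parseDouble(reader.getElementText());
                        break;
                    default: break; // elements we do not need are skipped
                }
            } else if (type == XMLStreamConstants.END_ELEMENT) {
                switch (reader.getLocalName()) {
                    case "killPos": inKillPos = false; break;
                    case "dingBatEvent": events.add(current); current = null; break;
                    default: break;
                }
            }
        }
        return events;
    }

    // Convenience wrapper for small in-memory documents.
    static List<Event> parse(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                return readEvents(reader);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The per-element cost here is one string switch, which is likely small next to the cost of reading and tokenizing the file, but that is worth measuring on the real data rather than assuming.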

Last edited by vxc69; 02-24-2010 at 05:27 PM.
 
Old 02-24-2010, 05:41 PM   #7
mattca
Member
 
Registered: Jan 2009
Distribution: Slackware 14.1
Posts: 333

Rep: Reputation: 56
Quote:
Originally Posted by vxc69
Well, not if the given XML is assured to have that order.
Well then order has no impact on reliability, and your performance vs reliability question is meaningless in this context.

I assume the nodes you're trying to avoid iterating through are "dingBatEvents"?

Unfortunately I don't know much about parsing XML in Java. I've done a bit in PHP, though, and was hoping you were using that.
 
Old 02-24-2010, 06:38 PM   #8
nadroj
Senior Member
 
Registered: Jan 2005
Location: Canada
Distribution: ubuntu
Posts: 2,539

Rep: Reputation: 60
I haven't fully followed this thread but just wanted to make a few comments.

If you want to enforce the structure, content, ordering, etc. of an XML document, the only way to do it is, as tuxdev said, with a schema. And if the ordering of elements is that important, that is all the more reason to give reliability higher priority than performance. The cornerstone of good software is quality. You could have a program that runs very fast but sometimes doesn't work. Alternatively, you could have a program that always works but takes an unpredictable or inconveniently long time.

This could be compared to TCP vs. UDP in networking, where TCP is reliable with more overhead and UDP is less reliable with less overhead. Each has its own applications. You probably wouldn't prefer to use TCP to listen to streaming music. Also, you probably wouldn't prefer to use UDP in some critical web service API.
 