Quote:
Originally Posted by Neruocomp
As a curious side project I'm playing with mzXML data (an XML format for holding mass spec data). A typical scan can be quite large, even up into GB size. I'm wondering how one would go about parsing an XML file in sections, one section at a time. The idea being that if the computer doesn't have enough memory to load the entire data file, it can work on chunks of it at a time.
Anything similar in other programming languages?
I don't know what you ultimately need to do, but the first thing that comes to mind is creating an index file alongside your input XML file.
The index file would contain the start and end byte positions of the XML constructs you might eventually want to extract from the source XML file, so you can seek straight to one construct and read only that chunk instead of loading the whole file.
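
To make the index idea concrete, here is a rough sketch in Python using the standard library's xml.parsers.expat module. It is only an illustration under some assumptions of mine: the constructs of interest are <scan> elements with a "num" attribute, every scan has an explicit closing tag written exactly as </scan>, and the names build_scan_index and read_scan are made up for this example.

```python
import xml.parsers.expat


def build_scan_index(xml_path, tag="scan"):
    """Return a list of (num, start_byte, end_byte) tuples, one per element.

    Assumes each element is closed by an explicit '</tag>' with no
    whitespace inside the closing tag (hypothetical simplification).
    """
    index = []
    open_scans = []  # a stack, in case scans are nested inside each other
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        if name == tag:
            # CurrentByteIndex is the offset of the '<' that opens this tag.
            open_scans.append((attrs.get("num"), parser.CurrentByteIndex))

    def on_end(name):
        if name == tag:
            num, start = open_scans.pop()
            # Here CurrentByteIndex points at the '<' of the closing tag.
            end = parser.CurrentByteIndex + len("</%s>" % tag)
            index.append((num, start, end))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    with open(xml_path, "rb") as f:
        parser.ParseFile(f)  # expat reads the file in pieces, not all at once
    index.sort(key=lambda entry: entry[1])  # order by start position
    return index


def read_scan(xml_path, start, end):
    """Read the raw bytes of one indexed element without touching the rest."""
    with open(xml_path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

The index could then be written out to its own file (one "num start end" line per scan, say) so later runs can skip the indexing pass and call read_scan directly.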
Maybe expat:
http://expat.sourceforge.net/ is a good starting point, as well as libxml2:
http://xmlsoft.org/ (downloads are at
http://xmlsoft.org/downloads.html ).
There are a bunch of Perl XML parsers on CPAN:
http://search.cpan.org/search?query=XML+parser&mode=all , so there are probably a bunch for Python too.
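
On the Python side, the standard library's xml.etree.ElementTree.iterparse reads a file incrementally, which is one way to work on one section at a time. The sketch below is a guess at how that might look for your case, not a tested mzXML reader: I'm assuming each <scan> carries a "num" attribute and a <peaks> child, and the helper names local_name and iter_scans are my own.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_scans(xml_path, tag="scan"):
    """Yield (num, peaks_text) for one element at a time."""
    context = ET.iterparse(xml_path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == tag:
            peaks = None
            for child in elem:
                if local_name(child.tag) == "peaks":
                    peaks = child.text
            yield elem.get("num"), peaks
            # Detach everything parsed so far from the root so that
            # already-processed scans can be garbage collected.
            root.clear()
```

Usage would be something like "for num, peaks in iter_scans('run.mzXML'): ...", where run.mzXML is a placeholder file name. Only the scan currently being processed (plus whatever encloses it) stays in memory.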