LinuxQuestions.org
Forum: Programming (Non-*NIX Forums)
11-05-2010, 05:17 PM #1
Neruocomp
Member
Registered: Oct 2004
Distribution: Slackware, CentOS
Posts: 135
Python: Parsing data in chunks/sections?


As a curious side project I'm playing with mzXML data (an XML format for holding mass spectrometry data). A typical scan can be quite large, even into the gigabyte range. I'm wondering how one would go about parsing an XML file in sections, one section at a time. The idea is that if the computer doesn't have enough memory to load the entire data file, it could work on one chunk at a time.

Is there anything similar in other programming languages?
 
11-06-2010, 12:46 AM #2
adixon
Member
Registered: Oct 2010
Posts: 34
hey Neruocomp,
Sounds like an interesting project. In C (and in most other languages) you would read the file in chunks: use one function to move to a particular position (the cursor location) in the file, then another to read from that position.
In C you could use fseek and fread.
(Which language are you referring to in your last line?)
regards, alex
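For comparison, the same read-it-in-pieces loop in Python (the language the question is about) might look like this minimal sketch. The function name `iter_chunks` and the 1 MiB default chunk size are illustrative choices, not anything specified in the thread. Note that fixed-size chunks will usually cut XML elements in half, so a single chunk is generally not well-formed XML on its own.

```python
def iter_chunks(path, chunk_size=1 << 20):
    """Yield successive pieces of the file, each at most chunk_size bytes.

    iter_chunks is a hypothetical helper; the 1 MiB default is arbitrary.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # advances the file cursor by len(chunk)
            if not chunk:               # b"" means end of file
                break
            yield chunk


# Hypothetical usage: count the bytes of a large file while holding
# at most one chunk in memory at a time.
# total = sum(len(c) for c in iter_chunks("run.mzXML"))
```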
 
11-06-2010, 12:52 AM #3
adixon
Member
Registered: Oct 2010
Posts: 34
Apologies, I just realised you're asking about Python.
The functions you want are seek(offset) and read(size).
These are methods of the file object, so you would call myfile.seek(...) after you've opened the file.
check http://docs.python.org/release/2.5.2/tut/node9.html
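A minimal sketch of those two methods in use, written for Python 3 (the linked tutorial is for Python 2.5). `read_region` and the file name in the usage line are hypothetical. The file is opened in binary mode so that the offset passed to `seek()` is a plain byte count; in text mode, `seek()` is only defined for zero or for values previously returned by `tell()`.

```python
def read_region(path, offset, size):
    """Return up to `size` bytes of the file, starting at byte `offset`."""
    with open(path, "rb") as f:  # binary mode: offsets are byte counts
        f.seek(offset)           # move the cursor; nothing is read yet
        return f.read(size)      # read only the requested bytes


# Hypothetical usage: look at the first 200 bytes without loading the rest.
# header = read_region("run.mzXML", 0, 200)
```

Reading past the end of the file simply returns fewer bytes (or `b""`), so the caller should check the length of what comes back.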
 
11-06-2010, 05:06 AM #4
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481
I don't know what you ultimately need to do, but the first thing that comes to mind is creating a kind of index file alongside your input XML file.

The index file would contain the start and end positions (byte offsets) of the XML constructs you might ultimately want to extract from your source XML file.

Expat (http://expat.sourceforge.net/) may be a good starting point, as may libxml2 (http://xmlsoft.org/, downloads at http://xmlsoft.org/downloads.html).
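A minimal sketch of the index-file idea in Python, using the standard library's `xml.parsers.expat` module (a binding to Expat). It is an illustration under assumptions, not a tested mzXML tool: the function names, the tab-separated index format, and indexing `scan` elements are all hypothetical choices. It relies on the fact that, inside an Expat callback, `CurrentByteIndex` is the byte offset where the markup that triggered the callback begins, so a just-closed element ends exactly where the next parse event starts.

```python
import os
import xml.parsers.expat


def build_index(path, tag):
    """Return sorted (start, end) byte ranges, one per <tag> element.

    start is the offset of the element's opening '<'; end is one past its
    final '>', taken as the offset where the next parse event begins.
    """
    parser = xml.parsers.expat.ParserCreate()
    open_starts = []  # start offsets of <tag> elements that are still open
    pending = []      # start offsets of closed elements awaiting an end offset
    index = []

    def resolve_pending():
        here = parser.CurrentByteIndex
        while pending:
            index.append((pending.pop(), here))

    def on_start(name, attrs):
        resolve_pending()
        if name == tag:
            open_starts.append(parser.CurrentByteIndex)

    def on_end(name):
        resolve_pending()
        if name == tag:
            pending.append(open_starts.pop())

    def on_other(data):
        # With DefaultHandler set, text, whitespace, comments, etc. all land
        # here, so the event after a closed element is never missed.
        resolve_pending()

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.DefaultHandler = on_other
    with open(path, "rb") as f:
        parser.ParseFile(f)  # Expat reads the file in buffered pieces
    end_of_file = os.path.getsize(path)
    while pending:  # the element was the last thing in the file
        index.append((pending.pop(), end_of_file))
    return sorted(index)


def save_index(index, index_path):
    """Write one 'start<TAB>end' line per element (a hypothetical format)."""
    with open(index_path, "w") as out:
        for start, end in index:
            out.write(f"{start}\t{end}\n")


def load_index(index_path):
    with open(index_path) as f:
        return [tuple(int(n) for n in line.split()) for line in f if line.strip()]


def read_element(path, start, end):
    """Seek straight to one indexed element and return its raw bytes."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

Usage, assuming a file named `run.mzXML`: build the index once with `save_index(build_index("run.mzXML", "scan"), "run.idx")`, then later call `load_index("run.idx")` and pass just the ranges you need to `read_element`, so only one element at a time has to fit in memory. The trade-off is one full streaming pass up front; after that, each lookup is a single seek and read.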

There are a bunch of Perl XML parsers (http://search.cpan.org/search?query=XML+parser&mode=all), so there are probably a bunch for Python too.
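For instance, Python's standard library includes `xml.etree.ElementTree.iterparse`, which parses a file incrementally and reports each element as it is opened and closed. The sketch below is a minimal illustration under stated assumptions, not a drop-in mzXML reader: `stream_elements` is a hypothetical name, it assumes the target elements are not nested inside one another, and it detaches each element from its parent after the caller has handled it, so the partially built tree does not keep growing.

```python
import xml.etree.ElementTree as ET


def stream_elements(path, tag):
    """Yield each complete <tag> element, one at a time.

    Assumes <tag> elements are not nested inside one another. If the file
    declares a default namespace, pass tag in ElementTree's
    '{namespace-uri}name' form.
    """
    open_elems = []  # elements whose end tag has not been parsed yet
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start":
            open_elems.append(elem)
            continue
        open_elems.pop()  # 'end' event: elem and all its children are complete
        if elem.tag == tag:
            yield elem  # the caller handles it before we continue
            if open_elems:
                # Detach it from its parent so elements we're done with
                # don't pile up in the tree built so far.
                open_elems[-1].remove(elem)
```

Unlike the index approach, this needs no extra file, but it only supports a single front-to-back pass over the data.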
 
  


All times are GMT -5.
