LinuxQuestions.org
Old 11-17-2010, 05:31 AM   #1
avishekjoy88
LQ Newbie
 
Registered: Nov 2010
Posts: 2

Rep: Reputation: 0
Help me


Hi,

I have to parse a file containing billions of records and load them into a data structure. I use a number of C++ classes: as I parse the file, I create objects of those classes to store the extracted information.

Now that the file has become huge and the number of objects is very large, my code fails with a bad_alloc error because there is no space available on the heap to allocate new objects.

Is there any way to parse the file?
 
Old 11-17-2010, 06:15 AM   #2
Mark1986
Member
 
Registered: Aug 2008
Location: Netherlands
Distribution: Xubuntu
Posts: 87

Rep: Reputation: 11
I'm not sure if this helps, but my first thought is to split the big file into smaller ones.

That is my approach at work as well. I often get files of 200 MB or more that need to be transferred. I split them up, zip them, and then at the destination unzip them and put them back together.

The Linux command for this is split. Find out about it yourself.

Edit: For Windows there are programs that do the same thing, such as JSplit.

Last edited by Mark1986; 11-17-2010 at 06:16 AM. Reason: Edit: noticed you use Window$
 
Old 11-17-2010, 06:25 AM   #3
devnull10
Member
 
Registered: Jan 2010
Location: Lancashire
Distribution: Slackware Stable
Posts: 572

Rep: Reputation: 120
If you post what your original problem is then someone may be able to suggest a better solution using existing tools which have already been tested and verified with huge data volumes.
 
Old 11-17-2010, 08:25 AM   #4
sunnydrake
Member
 
Registered: Jul 2009
Location: Kiev,Ukraine
Distribution: Ubuntu,Slax,RedHat
Posts: 289
Blog Entries: 1

Rep: Reputation: 61
This is a trivial beginner's error.
Create a data structure that holds one record at a time, then cycle: read a record, store it in a database, deallocate it, read the next one.
In generic pseudocode:
struct datacell {
    name
    details
}

desc = open(file, read)

while (desc != END_FILE) {
    read datacell from desc
    store datacell in db
    deallocate datacell
}

close(desc)

You can add free-memory detection and size the read buffer accordingly if you want more performance. For datacell storage I recommend an SQL database.

Last edited by sunnydrake; 11-17-2010 at 08:26 AM.
 
Old 11-17-2010, 05:14 PM   #5
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by avishekjoy88
...
Now that the file has become huge and the number of objects is very large, my code fails with a bad_alloc error because there is no space available on the heap to allocate new objects.

Is there any way to parse the file?

Do you need to keep all the data extracted from the input file in memory? That is, does every record relate to every other record (an all <-> all relationship)?
 
Old 11-21-2010, 09:28 PM   #6
avishekjoy88
LQ Newbie
 
Registered: Nov 2010
Posts: 2

Original Poster
Rep: Reputation: 0
Is there any way to stop the parser at a certain point and then return to the same place after I finish the job?
 
Old 11-21-2010, 11:28 PM   #7
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by avishekjoy88
Is there any way to stop the parser at a certain point and then return to the same place after I finish the job?

Huh? Didn't you write the parser yourself? If so, what prevents you from storing the parser state and restarting the parser from the stored state?
 
Old 11-22-2010, 02:43 AM   #8
sunnydrake
Member
 
Registered: Jul 2009
Location: Kiev,Ukraine
Distribution: Ubuntu,Slax,RedHat
Posts: 289
Blog Entries: 1

Rep: Reputation: 61
If the data is stored in the file in unbroken sequential order, you can later use fseek (or seekg on a C++ stream) to move the read pointer.
Otherwise, count the records read or write out the last read position. (However, you can also try a more complex PHP/C#-style serialization approach: store and restore the state of the reading structure/class.)
 
  

