Help me

avishekjoy88 · 11-17-2010, 05:31 AM

Hi,

I have to parse a file containing billions of records and load them into a data structure. I use several C++ classes, and I store the information retrieved from the file in objects of those classes.

Now that the file has become huge and the number of objects very large, my code fails with a bad_alloc error because there is no space available on the heap to allocate new objects.

Is there any way to parse the file?

Mark1986 · 11-17-2010, 06:15 AM

Not sure if this helps you, but my first thought is to split the big file into smaller ones.

That's my approach at work as well. I tend to get files of 200 MB or more that need to be transferred. I split them up, zip them, and then at the destination unzip them and put them back together.

The Linux command for this is split. Find out about it yourself.

Edit: For Windows there are programs that can do just that like JSplit and so on.

devnull10 · 11-17-2010, 06:25 AM

If you post what your original problem is then someone may be able to suggest a better solution using existing tools which have already been tested and verified with huge data volumes.

sunnydrake · 11-17-2010, 08:25 AM

trivial beginner error

Create a data struct that holds one record at a time, then run a cycle: read a record, store it in a DB, deallocate it, read the next one.
In generic pseudocode:
struct datacell
    name
    details

desc = open(file, read)

while (desc != END_FILE)
{
    read datacell from desc
    store datacell in db
    deallocate datacell
}

close(desc)

You can add free-memory detection and size the read buffer accordingly if you want more performance. For datacell storage I recommend an SQL database.
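
To make the read/store/deallocate cycle concrete, here is a minimal C++ sketch. It is only an illustration under assumptions that are not in the original post: one record per line in the form "name<TAB>details", and a caller-supplied `store` callback standing in for the database insert. The names `DataCell`, `parseLine`, `streamRecords` and `demo` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record layout (an assumption, not from the original post):
// one record per line, "name<TAB>details".
struct DataCell {
    std::string name;
    std::string details;
};

// Split one line into a DataCell; returns false if the line has no tab.
bool parseLine(const std::string& line, DataCell& cell) {
    const std::size_t tab = line.find('\t');
    if (tab == std::string::npos) return false;
    cell.name = line.substr(0, tab);
    cell.details = line.substr(tab + 1);
    return true;
}

// Read records one at a time and hand each to `store` (for example a
// database insert). Only the current line and cell stay in memory.
template <typename Store>
std::size_t streamRecords(std::istream& in, Store store) {
    std::string line;
    DataCell cell;
    std::size_t stored = 0;
    while (std::getline(in, line)) {
        if (!parseLine(line, cell)) continue;  // skip malformed lines
        store(cell);
        ++stored;
    }
    return stored;
}

// Self-check on an in-memory stream; a real run would pass a std::ifstream.
inline std::size_t demo() {
    std::istringstream in("alice\tfirst\nmalformed\nbob\tsecond\n");
    std::size_t detailBytes = 0;
    const std::size_t n = streamRecords(in, [&detailBytes](const DataCell& c) {
        detailBytes += c.details.size();
    });
    assert(n == 2 && detailBytes == 11);
    return n;
}
```

Memory use stays flat no matter how many records the file holds, because nothing is kept once `store` returns. Whether that is enough depends on whether the records really can be handled independently of each other.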

Sergei Steshenko · 11-17-2010, 05:14 PM

Quote:

Originally Posted by avishekjoy88

...
Now that the file has become huge and the number of objects very large, my code fails with a bad_alloc error because there is no space available on the heap to allocate new objects.

Is there any way to parse the file?

Is there a need to store all the data you've extracted from the input file in memory? I.e., do you deal with an all <-> all records relationship?

avishekjoy88 · 11-21-2010, 09:28 PM

Is there any way to stop the parser at a certain point and then return to the same place after I finish the job?

Sergei Steshenko · 11-21-2010, 11:28 PM

Quote:

Originally Posted by avishekjoy88

Is there any way to stop the parser at a certain point and then return to the same place after I finish the job?

Huh? Isn't it you who wrote the parser? If so, what prevents you from storing the parser state and restarting the parser with the stored state?

sunnydrake · 11-22-2010, 02:43 AM

If the data is stored in the file in contiguous sequential order, you can later use fseek from C/C++ to move the read pointer.
Otherwise, count the records read or write down the last read position. (However, you can try to implement a more complex PHP/C#-like serialize approach, i.e. store and restore the state of the reading structure/class.)
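
A rough C++ sketch of the "write down the last read position" idea, using seekg on a stream rather than fseek. It assumes newline-terminated records and a stream opened in binary mode, so that counted bytes match file offsets; the function name `processChunk` and the `handle` callback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Process at most maxRecords lines starting at byte offset `start` and
// return the offset at which the next call should resume.
// Assumes '\n'-terminated records and a binary-mode stream.
template <typename Handler>
std::streamoff processChunk(std::istream& in, std::streamoff start,
                            std::size_t maxRecords, Handler handle) {
    assert(start >= 0);
    in.clear();                      // drop any EOF/fail state from a previous chunk
    in.seekg(start, std::ios::beg);  // jump back to the saved position
    std::string line;
    std::streamoff next = start;
    std::size_t done = 0;
    while (done < maxRecords && std::getline(in, line)) {
        handle(line);
        ++done;
        // Count the bytes consumed: the line itself plus its '\n', unless
        // the line ended at end of file without a trailing newline.
        next += static_cast<std::streamoff>(line.size()) + (in.eof() ? 0 : 1);
    }
    return next;
}
```

The returned offset is the whole "parser state" here. Saving it somewhere durable (a small side file, say) after each chunk would let the program stop, do other work, and pick up at the same record later.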