Programming
This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
I have to parse a file containing billions of records and populate them into a data structure. I have used a lot of C++ classes, and by creating objects of those classes I store the information retrieved while parsing the file.
Now, as the file becomes huge and the number of objects becomes very large, my code gets a bad_alloc error because no space is available on the heap for allocating a new object.
Not sure if this helps you, but my thought is to split the big file up into smaller ones.
That's my approach at work as well. I tend to get files of 200 MB or more that need to be transferred. I split them up, zip them, and then at the destination unzip them and put them back together.
The Linux command for this is split; look it up for details.
Edit: for Windows there are programs that can do just that, like JSplit and so on.
Last edited by Mark1986; 11-17-2010 at 06:16 AM.
Reason: Edit: noticed you use Window$
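As a sketch of how that split-and-reassemble workflow looks on Linux (the file names and chunk size here are placeholders, not from the original post):

```shell
# Split bigfile.dat into 100 MB pieces named part_aa, part_ab, ...
split -b 100M bigfile.dat part_

# ...zip/transfer the pieces individually...

# At the destination, reassemble them (glob order matches split order)
cat part_* > bigfile.dat
```

Because split names its output files in lexicographic order, a plain shell glob restores the original byte order when concatenating.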
If you post what your original problem is, then someone may be able to suggest a better solution using existing tools which have already been tested and verified with huge data volumes.
Trivial beginner error. Create a data structure that holds one record at a time, then cycle: read, store in the DB, deallocate, read again.
In generic pseudocode:
struct datacell {
    name
    details
}

desc = open(file, read)
while (desc != END_FILE) {
    read datacell from desc
    store datacell in db
    deallocate datacell
}
close(desc)
You can add free-memory detection and size the read buffer accordingly if you want more performance. For datacell storage I recommend an SQL database.
Last edited by sunnydrake; 11-17-2010 at 08:26 AM.
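A concrete version of that loop in C++ might look like the sketch below; the "name,details" record layout and the store function are assumptions for illustration, not part of the original post:

```cpp
#include <fstream>
#include <sstream>
#include <string>

struct DataCell {
    std::string name;
    std::string details;
};

// Hypothetical sink: replace with an INSERT into your SQL database.
void store(const DataCell& cell) { (void)cell; }

// Streams the input one record (line) at a time, so only a single
// DataCell is alive at any moment and memory use stays flat no matter
// how large the file is. Returns the number of records processed.
std::size_t parse_stream(std::istream& in) {
    std::string line;
    DataCell cell;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::getline(fields, cell.name, ',');  // assumed "name,details" layout
        std::getline(fields, cell.details);
        store(cell);
        ++count;
    }
    return count;
}
```

Opening the file with std::ifstream and passing it to parse_stream keeps peak memory independent of the file size, which is exactly what avoids the bad_alloc.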
...
Now, as the file becomes huge and the number of objects becomes very large, my code gets a bad_alloc error because no space is available on the heap for allocating a new object.
Is there any way to parse the file?
Is there a need to store all the data you've extracted from the input file in memory? I.e. do you deal with an all <-> all records relationship?
Is there any way to stop the parser at a certain point and then return to the same place after I finish the job?
Huh? Isn't it you who has written the parser? If so, what prevents you from storing the parser state and restarting the parser with the stored state?
If you store data in the file in unbroken sequential order, you can later use fseek in C/C++ to move the read pointer.
Otherwise, count the records read or write out the last read position. (You could also implement a more complex PHP/C#-style serialization to store and restore the state of your parser structure/class.)
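One way to sketch that stop-and-resume idea in C++, using seekg/tellg (the iostream equivalents of fseek/ftell); the function name, file path, and batch size are illustrative, not from the thread:

```cpp
#include <fstream>
#include <string>

// Parses at most max_records lines starting at `offset` and returns
// the stream position to resume from next time, or -1 once the end
// of the file has been reached.
std::streampos parse_batch(const std::string& path,
                           std::streampos offset,
                           std::size_t max_records) {
    std::ifstream in(path);
    in.seekg(offset);
    std::string line;
    for (std::size_t n = 0; n < max_records && std::getline(in, line); ++n) {
        // ...process one record here...
    }
    // tellg() is only meaningful while the stream is still good;
    // a failed getline (EOF) means the whole file has been consumed.
    return in ? in.tellg() : std::streampos(-1);
}
```

Persisting the returned offset (for example, in a small state file) lets the program stop after any batch and pick up exactly where it left off on the next run.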