LinuxQuestions.org

eph 03-11-2004 12:50 AM

need help on processing large data files
 
How do I process large text files (in the gigabyte range)?

I need to do it efficiently. Should I use parallel processing? Can anyone help with this? Thanks :)

Qzukk 03-11-2004 01:04 AM

It depends on what you are doing. If you can say with certainty that no piece of data in the file depends on any other data in the file, then you can split the file into chunks and distribute them to be processed in parallel. If you end up processing it as a single giant file, look into using mmap() to map the file into memory and access it that way, rather than using the regular read()/write() calls.
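
To make the two ideas above concrete (independent chunks, and mmap() instead of plain reads), here is a minimal sketch in Python. It assumes the records are newline-terminated lines that do not depend on each other. The function names (chunk_bounds, count_lines, lines_per_chunk, demo_chunks) and the sample file are made up for illustration, and the chunks are processed one after another here; in a real setup each byte range would be handed to a separate worker or node.

```python
import mmap
import os
import tempfile


def chunk_bounds(mm, n_chunks):
    """Return (start, end) byte ranges that each end just after a newline."""
    size = len(mm)
    bounds = []
    start = 0
    for i in range(1, n_chunks + 1):
        if start >= size:
            break
        if i == n_chunks:
            end = size  # the last chunk takes whatever is left
        else:
            # Aim for an even split, then extend to the end of the current line.
            target = max(size * i // n_chunks, start)
            newline = mm.find(b"\n", target)
            end = size if newline == -1 else newline + 1
        bounds.append((start, end))
        start = end
    return bounds


def count_lines(mm, start, end):
    """Stand-in for real per-chunk work: count the lines in mm[start:end]."""
    count = 0
    pos = start
    while pos < end:
        newline = mm.find(b"\n", pos, end)
        count += 1
        if newline == -1:
            break  # final line has no trailing newline
        pos = newline + 1
    return count


def lines_per_chunk(path, n_chunks):
    """Map the file read-only and process each chunk independently."""
    if os.path.getsize(path) == 0:
        return []  # mmap cannot map an empty file
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return [count_lines(mm, s, e) for s, e in chunk_bounds(mm, n_chunks)]


def demo_chunks():
    """Write a tiny hypothetical log file and split it into three chunks."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.log")
        with open(path, "wb") as f:
            for i in range(10):
                f.write(b"line %d\n" % i)
        return lines_per_chunk(path, 3)


if __name__ == "__main__":
    print(demo_chunks())
```

Because every chunk boundary falls right after a newline, no line is split between two workers, which is what makes the "no data depends on other data" condition usable in practice.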

eph 03-11-2004 01:24 AM

I'm going to process tcpdump log files. I'm thinking of using Perl, with Beowulf for clustering. Would it make any difference? And would Perl work for this?

I'm just a newbie at this.

bigearsbilly 03-11-2004 05:56 AM

I'd try Perl first. If it's too slow, then think about C.

I once built a hash in Perl with millions of code => price
pairs. It loaded quite slowly, but it worked very fast once loaded.

(But I ended up using DBM and C!)

billy
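
For comparison, here is a rough sketch of the DBM idea from the post above, written in Python rather than Perl or C. It uses the standard-library dbm module as an on-disk key/value store, so the code => price pairs do not have to be loaded into memory on every run. The function names and the sample codes and prices are hypothetical, and this is not the setup billy actually used.

```python
import dbm
import os
import tempfile


def build_price_db(path, pairs):
    """Store code => price pairs in an on-disk DBM file (created if missing)."""
    with dbm.open(path, "c") as db:
        for code, price in pairs:
            # DBM stores bytes, so keys and values are encoded explicitly.
            db[code.encode("utf-8")] = str(price).encode("ascii")


def lookup_price(path, code):
    """Look up one code without loading the whole table into memory."""
    key = code.encode("utf-8")
    with dbm.open(path, "r") as db:
        if key not in db:
            return None
        return float(db[key].decode("ascii"))


def demo_prices():
    """Build a tiny hypothetical price table and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prices")
        build_price_db(path, [("A100", 9.99), ("B200", 24.5)])
        return lookup_price(path, "A100"), lookup_price(path, "ZZZ")


if __name__ == "__main__":
    print(demo_prices())
```

The trade-off is the one hinted at in the post: an in-memory hash is slow to load but fast to query, while a DBM file costs nothing to "load" and pays a small price on each lookup.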


All times are GMT -5.