03-10-2004, 11:50 PM   #1
LQ Newbie
Registered: Jan 2004
Location: Muntinlupa, Philippines
Distribution: Mandrake
Posts: 8

Rep: Reputation: 0
Need help on processing large data files

How do I process large text files (in the gigabyte range)?

I need to do it efficiently. Should I use parallel processing? Can anyone help with this? Thanks.
03-11-2004, 12:04 AM   #2
Registered: Jun 2003
Posts: 132

Rep: Reputation: 15
It depends on what you are doing. If you can say with certainty that no piece of data in the file depends on any other data in the file, you can split the file into chunks and process them in parallel, distributed across several processes or machines. If you end up processing it as a single giant file, look into using mmap() to map the file into memory, rather than using the regular read()/write() calls.
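A rough sketch of both ideas, written in Python only for illustration (the thread itself names no code): the file is cut into byte ranges that end on line boundaries, and each range is scanned through mmap. The function names chunk_ranges and count_matches, and the idea of counting lines that contain a given byte string, are assumptions made for this example, not something the poster specified.

```python
import mmap
import os


def chunk_ranges(path, n_chunks):
    """Split a file into roughly n_chunks byte ranges that end on line boundaries."""
    size = os.path.getsize(path)
    if size == 0:
        return []  # mmap cannot map an empty file
    ranges = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        step = max(1, size // n_chunks)
        start = 0
        while start < size:
            end = min(start + step, size)
            if end < size:
                # Extend the chunk to the next newline so no line is cut in half.
                nl = mm.find(b"\n", end)
                end = size if nl == -1 else nl + 1
            ranges.append((start, end))
            start = end
    return ranges


def count_matches(path, start, end, needle):
    """Count the lines inside bytes [start, end) that contain needle."""
    count = 0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = start
        while pos < end:
            nl = mm.find(b"\n", pos, end)
            line_end = end if nl == -1 else nl
            # Search only within the current line.
            if mm.find(needle, pos, line_end) != -1:
                count += 1
            pos = line_end + 1
    return count
```

Because each range is independent, the calls to count_matches could be handed to separate worker processes or machines and the per-range counts summed afterwards. That only holds under the condition stated above, that no line depends on any other line.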
03-11-2004, 12:24 AM   #3
LQ Newbie
Registered: Jan 2004
Location: Muntinlupa, Philippines
Distribution: Mandrake
Posts: 8

Original Poster
Rep: Reputation: 0
I'm going to process tcpdump log files. I'm thinking of using Perl and a Beowulf cluster. Would that make any difference? And would Perl work for this?

I'm just a newbie at this.
03-11-2004, 04:56 AM   #4
Senior Member
Registered: Mar 2004
Location: england
Distribution: NetBSD, Void, Debian, Mint, Ubuntu, Puppy, Raspbian
Posts: 3,487

Rep: Reputation: 233
I'd try Perl first. If it's too slow, then think about C.

I once built a hash in Perl with millions of code=>price
pairs. It loaded quite slowly but worked very fast.

(But I ended up using DBM and C!)
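To make the trade-off concrete, here is a minimal sketch in Python (not the Perl or C the poster actually used) of the two approaches: an in-memory hash that has to be rebuilt on every run, and an on-disk DBM file that is built once and then queried. The "code price" line format and the names load_prices, store_prices and lookup_price are assumptions for illustration only.

```python
import dbm


def load_prices(lines):
    """Build an in-memory dict from lines of the form 'code price'."""
    prices = {}
    for line in lines:
        code, price = line.split()
        prices[code] = float(price)
    return prices


def store_prices(db_path, prices):
    """Write the pairs to a DBM file so later runs can skip the slow load."""
    with dbm.open(db_path, "c") as db:
        for code, price in prices.items():
            # DBM stores bytes, so encode keys and values explicitly.
            db[code.encode()] = repr(price).encode()


def lookup_price(db_path, code):
    """Look up one code in the DBM file; return None if it is missing."""
    with dbm.open(db_path, "r") as db:
        raw = db.get(code.encode())
        return None if raw is None else float(raw.decode())
```

The dict pays its cost up front at load time, while the DBM file keeps the data on disk between runs, which is likely the sort of reason one would move to DBM for millions of pairs.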


