Read large text files (~10GB), parse for columns, output (

Telemachos 04-07-2009 10:41 AM

@ int0x80: jglands has posted only to this thread and only to troll. Please stop feeding him.

ghostdog74 04-07-2009 10:42 AM

where's the moderator?

sundialsvcs 04-07-2009 10:44 AM

:rolleyes: Stick to the subject, please... "Cheap beer and forums do not mix."

No, it probably won't be "better than awk."

"awk" is a very well-written program that is specialized for doing what you are doing.

All of the delays associated with this task will be mechanical ones: disk I/O times and network time. But "awk" reads the file sequentially, from start to finish, so the operating system can recognize the pattern, read ahead, and line up lots of file buffers to streamline the operation as much as the hardware will allow.
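
A program can also state the sequential access pattern explicitly rather than leave it to the kernel to detect. What follows is only a minimal sketch, assuming a POSIX system where Python's os.posix_fadvise is available. The function name count_lines_sequential and the 1 MiB chunk size are made up for illustration, and nothing in this thread establishes whether any particular awk implementation issues such a hint.

```python
import os
import tempfile


def count_lines_sequential(path, chunk_size=1 << 20):
    """Count newline bytes while reading path front to back in fixed-size chunks."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # The hint is advisory only. It is guarded because not every
        # platform exposes posix_fadvise, and some filesystems reject it.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                pass  # carry on without the hint
        total = 0
        while True:
            chunk = os.read(fd, chunk_size)  # at most chunk_size bytes in memory
            if not chunk:
                break
            total += chunk.count(b"\n")
        return total
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Small throwaway file standing in for the real 10GB input.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"one\ntwo\nthree\n")
    print(count_lines_sequential(tmp.name))
    os.remove(tmp.name)
```

The hint changes nothing about correctness. At best it lets the kernel read ahead more aggressively, and any gain depends on the hardware and the filesystem.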

If the time required to do this task is problematic to the business, then there are various things that you can do:
  1. Invest in fast storage hardware... SATA, FireWire.
  2. Instead of using the disk controllers built into the motherboard, buy a controller card. An inexpensive unit can make a dramatic difference.
  3. Put the input file and the output file on different disk volumes.
  4. Do not follow the siren that says, "put it all in memory..." Abandon all hope, ye who enter there!

Face it: when you're dealing with 10 gigabytes of data, "some things take time." If you're doing the task in "awk," and doing it well, then you are using a robust tool that was specifically designed for the task. You have not erred in the approach that you are using right now. "Diddling with it" will not improve it.
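
For anyone curious what "not putting it all in memory" looks like outside of awk, here is a rough sketch of the same line-at-a-time pattern in Python. It is an illustration only, not a claim that it will beat awk. The function name extract_columns, the whitespace delimiter, and the 0-based column numbers are assumptions made for the example, since the thread does not show the real file layout.

```python
import io


def extract_columns(src, dst, columns, delimiter=None):
    """Copy the selected 0-based columns of each line from src to dst.

    src and dst are open text streams. Returns the number of lines read.
    """
    count = 0
    for line in src:  # one line at a time; the whole file is never loaded
        fields = line.split(delimiter)  # None means split on any whitespace
        picked = [fields[i] for i in columns if i < len(fields)]
        dst.write(" ".join(picked) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # In-memory streams stand in for the real input and output files.
    sample = io.StringIO("a 1 x\nb 2 y\nc 3 z\n")
    out = io.StringIO()
    extract_columns(sample, out, [0, 2])
    print(out.getvalue(), end="")
```

With real files, src and dst would be two handles from open(), ideally on different disk volumes as in point 3, so that reads and writes do not compete for the same drive.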

Telemachos 04-07-2009 10:45 AM

For the record, it would be unfortunate to lock the whole thread. The question (How do I deal with a mega-sized file and the associated I/O problems?) is a serious one and deserves some discussion.

