LinuxQuestions.org
Old 04-07-2009, 10:51 AM   #31
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 453

Quote:
Originally Posted by sundialsvcs
Stick to the subject, please... "Cheap beer and forums do not mix."

No, it probably won't be "better than awk."

"awk" is a very well-written program that is specialized for doing what you are doing.

All of the delays associated with this task will be mechanical ones: disk I/O times and network time. But "awk" knows to tell the operating-system that the file is being read sequentially, and therefore the operating system will know how to line-up lots of file buffers and other tricks to streamline the operation as much as the hardware will allow.

If the time required to do this task is problematic to the business, then there are various things that you can do:
  1. Invest in fast storage-hardware... SATA, FireWire.
  2. Instead of using the disk controllers built into the motherboard, buy a controller card. An inexpensive unit can make a dramatic difference.
  3. Put the input file and the output file on different disk volumes.
  4. Do not follow the siren that says, "put it all in memory..." Abandon all hope, ye who enter there!
Face it: when you're dealing with 10 gigabytes of data, "some things take time." If you're doing the task in "awk," and doing it well, then you are using a robust tool that was specifically designed for the task. You have not erred in the approach that you are using right now. "Diddling with it" will not improve it.
By the way, if I understood the OP correctly, the lines are independent, i.e. line by line parsing should be OK.

If that's the case, then the very first legitimate question is: "Why is it a single 10 GB file and not a number of much smaller files?"

The point is that a number of files can be stored on separate hard drives, and better yet the drives can be connected to different CPUs, so the whole processing can be done in parallel and the results merged.
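
A rough sketch of that split-and-merge idea, in Python, assuming the input has already been split into chunk files and that each line can be parsed on its own. The names parse_chunk and parse_chunks_in_parallel are made up for illustration, and the per-line "parse" (counting whitespace-separated fields) is only a stand-in for whatever the OP's real parsing does. Threads are used here because the work is mostly waiting on disk reads; for CPU-heavy parsing a process pool would probably be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_chunk(path):
    """Parse one chunk file line by line; return (line_count, field_count).

    Hypothetical stand-in for the real per-line parsing.
    """
    lines = 0
    fields = 0
    with open(path, "r", encoding="ascii", errors="replace") as fh:
        for line in fh:  # streams the file; never loads it whole
            lines += 1
            fields += len(line.split())
    return lines, fields


def parse_chunks_in_parallel(paths, workers=4):
    """Parse each chunk in its own worker, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(parse_chunk, paths))
    # Merge step: the lines are independent, so the totals simply add up.
    total_lines = sum(p[0] for p in partials)
    total_fields = sum(p[1] for p in partials)
    return total_lines, total_fields
```

Whether this actually helps depends on the chunks living on different drives; several workers reading from one spindle may well end up slower than a single sequential pass.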
 
Old 04-07-2009, 10:52 AM   #32
int0x80
Member
 
Registered: Sep 2002
Location: Cincinnati
Distribution: Debian GNU/Linux
Posts: 310

Rep: Reputation: 31
Quote:
Originally Posted by jglands
Well, he would at least have support. What does he have from Linux now? Some pimple-faced kids telling him he is wrong instead of helping him.
I don't see any .NET devs on here showing him the way???

FAIL
 
Old 04-07-2009, 10:53 AM   #33
jglands
LQ Newbie
 
Registered: Apr 2009
Posts: 17

Rep: Reputation: 1
That's right. He asked here instead of someplace that will help him.
 
Old 04-07-2009, 10:55 AM   #34
int0x80
Member
 
Registered: Sep 2002
Location: Cincinnati
Distribution: Debian GNU/Linux
Posts: 310

Rep: Reputation: 31
Quote:
Originally Posted by jglands
That's right. He asked here instead of someplace that will help him.
Just because some of the members on here (Telemachos, Sergei) can't read or aren't smart enough to solve problems before posting doesn't mean that the entire community is worthless. LQ is representative of the internet with people of varying levels of intelligence. Some are stars (sundialsvcs), and others have no light on upstairs (jglands).
 
Old 04-07-2009, 10:59 AM   #35
jglands
LQ Newbie
 
Registered: Apr 2009
Posts: 17

Rep: Reputation: 1
Quote:
Originally Posted by int0x80
Some are stars (sundialsvcs), and others have no light on upstairs (jglands).
Just because I have no hair doesn't mean my lights are not on. I could look like this guy.
Attached Images
File Type: jpg 225px-JonMaddogHallFlourish.jpg (13.0 KB, 2 views)
 
Old 04-07-2009, 11:01 AM   #36
int0x80
Member
 
Registered: Sep 2002
Location: Cincinnati
Distribution: Debian GNU/Linux
Posts: 310

Rep: Reputation: 31
Quote:
Originally Posted by jglands
Just because I have no hair doesn't mean my lights are not on. I could look like this guy.
Unfortunately you look like this...
Attached Images
File Type: jpg jglands.jpg (74.9 KB, 1 views)
 
Old 04-07-2009, 11:01 AM   #37
Telemachos
Member
 
Registered: May 2007
Distribution: Debian
Posts: 754

Rep: Reputation: 59
Quote:
Originally Posted by int0x80
Just because some of the members on here (Telemachos, Sergei) can't read or aren't smart enough to solve problems before posting doesn't mean that the entire community is worthless.
Charming. Sergei and I said essentially the same thing as sundialsvcs, though I admit he said it more fully. What we all said was that the OP's C code was unlikely to beat a pre-existing tool (awk, Perl, Python, whatever) because the big issue was the simple math of the file size.
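
To put rough numbers on "the simple math of the file size", here is a small Python sketch. The 10 GB figure comes from this thread; the read rates are hypothetical round numbers, not measurements of anyone's hardware, and min_read_seconds is just an illustrative name.

```python
def min_read_seconds(size_bytes, throughput_bytes_per_sec):
    """Lower bound on the time to read a file once, ignoring all CPU work."""
    return size_bytes / throughput_bytes_per_sec


FILE_SIZE = 10 * 10**9  # the 10 GB input discussed in this thread

# Hypothetical sustained sequential read rates, not measurements.
for label, mb_per_sec in [("slow disk", 50), ("faster disk", 100)]:
    seconds = min_read_seconds(FILE_SIZE, mb_per_sec * 10**6)
    print(f"{label}: at least {seconds:.0f} s")
```

Whatever language does the parsing, it cannot finish before the disk has delivered the bytes, so under these assumptions the floor is minutes, not seconds.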
 
Old 04-07-2009, 11:02 AM   #38
jglands
LQ Newbie
 
Registered: Apr 2009
Posts: 17

Rep: Reputation: 1
Ok Code Monkey
Attached Images
File Type: jpg code_monkey_colour.jpg (213.6 KB, 3 views)
 
Old 04-07-2009, 11:03 AM   #39
vache
LQ Newbie
 
Registered: Apr 2009
Posts: 5

Original Poster
Rep: Reputation: 0
So, the short answer to my question is "no". Thanks.
 
Old 04-07-2009, 11:06 AM   #40
int0x80
Member
 
Registered: Sep 2002
Location: Cincinnati
Distribution: Debian GNU/Linux
Posts: 310

Rep: Reputation: 31
Quote:
Originally Posted by jglands
Ok Code Monkey
Clearly that is an image of your kin. Notice the `language name="C#"` in your image.

FAIL

Maybe one day the MS crowd will evolve to Linux.
 
Old 04-07-2009, 11:06 AM   #41
jglands
LQ Newbie
 
Registered: Apr 2009
Posts: 17

Rep: Reputation: 1
See, the guy has given up on Linux. About time.
 
Old 04-07-2009, 11:08 AM   #42
int0x80
Member
 
Registered: Sep 2002
Location: Cincinnati
Distribution: Debian GNU/Linux
Posts: 310

Rep: Reputation: 31
The solution was not "use VB". You lose.
 
Old 04-07-2009, 11:08 AM   #43
jglands
LQ Newbie
 
Registered: Apr 2009
Posts: 17

Rep: Reputation: 1
You just wish your stuff could be as good as C#. Good luck with finding your answer Vache. You won't get an answer from these preteens.
 
Old 04-07-2009, 11:09 AM   #44
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 453
Quote:
Originally Posted by Telemachos
Charming. Sergei and I said essentially the same thing as sundialsvcs, though I admit he said it more fully. What we all said was that the OP's C code was unlikely to beat a pre-existing tool (awk, Perl, Python, whatever) because the big issue was the simple math of the file size.
I once happened to look into Perl's regular-expression code, which is derived from a standard RE library.

The most frequent comment was "we are doing/have changed this and that for efficiency reasons".
 
Old 04-07-2009, 11:10 AM   #45
int0x80
Member
 
Registered: Sep 2002
Location: Cincinnati
Distribution: Debian GNU/Linux
Posts: 310

Rep: Reputation: 31
You wish C# could be as good as Java. Good luck with your Xtra Proprietary OS.
 
  


Tags
ascii, awk, fgets, fopen, parse


All times are GMT -5.
