perl script - need help to improve performance
I wrote the small script below to find the different exception traces in a log file and count how often each one occurs. Could you please suggest ways to improve its performance? It has to run on an AIX machine, on files of around 100MB.
Code:
while(<FH>){
|
Remove
Code:
++$c;
and replace
Code:
$tempCount=$hashmap{$1};
with
Code:
$hashmap{$1} = ++$hashmap{$1};
Likewise, replace
Code:
$value = $hashmap{$key};
with
Code:
print "\n value:: $hashmap{$key}"; |
Depending on how many millions of unique values you might be dealing with in any particular run, a completely different approach might also be called for...
Your present algorithm is based on the assumption that "hashes are free." Unfortunately, when a hash grows into hundreds of thousands or millions of entries, it is no longer free: every single access runs the risk of a page fault, and the application, and the system itself, slows to a crawl.

An "unexpectedly different" algorithm would write all those keys to a disk file, then sort that file (on disk), and count them. When a file is sorted, all of the occurrences of any particular key value are guaranteed to be adjacent, so "counting" them requires essentially no main memory at all. Yes, this is how they did it with punched cards, even before digital computers existed, and it still works. In fact, it can out-perform algorithms such as yours by a factor of thousands. |
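For illustration, here is a rough sketch of that write-then-sort approach in Perl. It leans on the system sort(1) to do the disk-based sorting; the scratch file name and the key-flattening step are my assumptions, not something from the original post:
Code:
# Phase 1: extract every key and append it to a scratch file, one per line.
open my $keys, '>', 'keys.txt' or die "keys.txt: $!";
while (<FH>) {
    while ( /$exStart(.*?)$exEnd/sgmo ) {
        (my $k = $1) =~ tr/\n/ /;    # keep each key on a single line
        print {$keys} "$k\n";
    }
}
close $keys;

# Phase 2: sort on disk, then count adjacent identical lines.
# Equivalent to: sort keys.txt | uniq -c
open my $sorted, '-|', 'sort keys.txt' or die "sort: $!";
my ($prev, $count);
while (my $key = <$sorted>) {
    chomp $key;
    if (defined $prev and $key ne $prev) {
        print "$count $prev\n";
        $count = 0;
    }
    $prev = $key;
    $count++;
}
print "$count $prev\n" if defined $prev;
close $sorted;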
To be honest, I'm wondering whether you've actually run that, or whether it's part of a much larger program. Normally I'd expect Perl to rip through a 100MB file pdq...
I can't imagine AIX on a small machine... |
Code:
while ( /$exStart(.*?)$exEnd/sgmo ) {
could become
Code:
if ( /$exStart(.*?)$exEnd/sgmo ) { |
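A side note on that regex (my addition, not something stated in the thread): if the worry is recompiling the interpolated pattern on every pass, precompiling it once with qr// makes the intent explicit; the while/g form counts every block in a record, whereas the if form only tests for the first one.
Code:
# Precompile the interpolated pattern once, then reuse it.
my $ex_re = qr/$exStart(.*?)$exEnd/s;
while ( /$ex_re/g ) {   # or: if ( /$ex_re/ ) { ... } when one match per record is enough
    ++$hashmap{$1};
}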
Well, you could fork a number of copies, as each file seems to be treated separately.
|
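A minimal sketch of that fork-per-file idea using plain fork/waitpid. The script name, file arguments, and the count_traces() helper are placeholders of mine, not from the thread:
Code:
#!/usr/bin/perl
# Sketch: fork one child per log file; each child counts its own file.
use strict;
use warnings;

my @files = @ARGV;               # e.g. ./count_forked.pl app1.log app2.log
my @pids;

for my $file (@files) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {             # child: do the per-file work and exit
        count_traces($file);     # hypothetical helper holding the counting loop
        exit 0;
    }
    push @pids, $pid;            # parent: remember the child and keep going
}

waitpid($_, 0) for @pids;        # wait for every child to finish

sub count_traces {
    my ($file) = @_;
    # open $file, run the hash-counting (or sort-based) loop,
    # and write the per-file results somewhere, e.g. "$file.counts"
}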