LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   perl script - need help to improve performance (https://www.linuxquestions.org/questions/programming-9/perl-script-need-help-to-improve-performance-676091/)

johngreg 10-13-2008 01:56 PM

perl script - need help to improve performance
 
I wrote the following small script to find the different exception traces in a log and count each one. Could you please suggest ways to improve its performance? It has to run on an AIX machine, on files of about 100MB.
Code:

while(<FH>){
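    # $exStart and $exEnd are assumed to hold the regex patterns that delimit one exception trace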
    while ( /$exStart(.*?)$exEnd/sgmo ) {
        ++$c;
        if(exists $hashmap{$1}){
            $tempCount=$hashmap{$1};
            $hashmap{$1}= ++$tempCount;
        }
        else{
            $hashmap{$1}=1;
        }
    }
}

foreach $key ( keys %hashmap ) {
        $value = $hashmap{$key};
        print "\n value:: ", $value;
}
$num_keys = keys %hashmap;

thanks

chrism01 10-13-2008 08:05 PM

Remove
Code:

        ++$c;
you don't use that value anywhere.

Replace
Code:

            $tempCount=$hashmap{$1};
            $hashmap{$1}= ++$tempCount;

with
Code:

$hashmap{$1} = ++$hashmap{$1};
Replace
Code:

        $value = $hashmap{$key};
        print "\n value:: ", $value;

with
Code:

print "\n value:: $hashmap{$key}";
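
Putting those changes together (a sketch only -- it assumes, as in the original post, that $exStart, $exEnd and the FH filehandle are already set up), the loop could shrink to:
Code:

while(<FH>){
    while ( /$exStart(.*?)$exEnd/sgmo ) {
        ++$hashmap{$1};   # the self-assignment isn't needed; a missing key autovivifies to 0, so no exists() check either
    }
}

foreach $key ( keys %hashmap ) {
    print "\n value:: $hashmap{$key}";
}
$num_keys = keys %hashmap;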

sundialsvcs 10-13-2008 11:16 PM

Depending on how many millions of unique values you might be dealing with in any particular run, a completely different approach might also be called for...

Your present algorithm is based on the assumption that "hashes are 'free.'" Unfortunately, when a hash grows into hundreds-of-thousands or millions of entries, it is no longer free. Instead, every single access runs the risk of a page fault. The application, and the system itself, slows to a crawl...

An "unexpectedly different" algorithm would write all those keys to a disk-file, then sort that file (on disk...), and count them. When a file is sorted, all of the occurences of any particular key-value are guaranteed to be adjacent. "Counting" them, therefore, requires no main-memory at all.

Yes... this is "how they did it with punched cards, even before digital computers existed." And... :eek: ... it still works.

(In fact, it can out-perform algorithms such as yours by a factor of thousands . . . )
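
To make that concrete, here is one rough way to do it in Perl (a sketch only -- the file names are made up, and $exStart, $exEnd and FH are assumed to be set up as in the original script): flatten each extracted trace onto one line, append it to a work file, and let the external sort/uniq pipeline do the counting on disk.
Code:

open(TMP, '>', 'traces.txt') or die "cannot write traces.txt: $!";
while(<FH>){
    while ( /$exStart(.*?)$exEnd/sgmo ) {
        $trace = $1;
        $trace =~ s/\n/ /g;                  # flatten the multi-line trace onto a single line
        print TMP $trace, "\n";
    }
}
close TMP;

# sorting puts identical traces next to each other, so uniq -c can count them
# while holding only one key in memory at a time
system("sort traces.txt | uniq -c | sort -rn > trace_counts.txt") == 0
    or die "sort/uniq pipeline failed: $?";

sort(1) does an external merge sort, spilling to temporary files as needed, so memory use stays bounded no matter how many distinct traces there are.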

chrism01 10-14-2008 01:56 AM

To be honest, I'm wondering if you've actually run that or whether it's part of a much larger program. Normally I'd expect Perl to rip through a 100MB file pdq...
I can't imagine AIX on a small machine...

keefaz 10-14-2008 08:34 AM

Code:

while ( /$exStart(.*?)$exEnd/sgmo ) {
Are you sure you don't want:
Code:

if ( /$exStart(.*?)$exEnd/sgmo ) {
Also, don't forget to close FH :)

johngreg 10-14-2008 01:39 PM

Quote:

Originally Posted by chrism01 (Post 3309414)
To be honest, I'm wondering if you've actually run that or whether it's part of a much larger program. Normally I'd expect Perl to rip through a 100MB file pdq...
I can't imagine AIX on a small machine...

Yes Chris, you're right... this script runs pretty fast on a single 100MB file, but it needs to run on 400 such files every hour, so I didn't want to face any surprises on the live servers.

johngreg 10-14-2008 01:46 PM

Quote:

Originally Posted by sundialsvcs (Post 3309321)
Depending on how many millions of unique values you might be dealing with in any particular run, a completely different approach might also be called for...

Your present algorithm is based on the assumption that "hashes are 'free.'" Unfortunately, when a hash grows into hundreds-of-thousands or millions of entries, it is no longer free. Instead, every single access runs the risk of a page fault. The application, and the system itself, slows to a crawl...

An "unexpectedly different" algorithm would write all those keys to a disk-file, then sort that file (on disk...), and count them. When a file is sorted, all of the occurences of any particular key-value are guaranteed to be adjacent. "Counting" them, therefore, requires no main-memory at all.

Yes... this is "how they did it with punched cards, even before digital computers existed." And... :eek: ... it still works.

(In fact, it can out-perform algorithms such as yours by a factor of thousands . . . )

Let me try: so should I write the traces to a new file with some delimiter, then open that temp file and split on the delimiter? But won't I then again need some data structure to hold the two-dimensional values, i.e. the exception trace (around 100 lines) and the number of repetitions of that trace?

chrism01 10-14-2008 07:00 PM

Well, you could fork a number of copies, as each file seems to be treated separately.
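
For instance, a rough sketch of that idea (the @logfiles list and the count_traces() routine are placeholders, not something from this thread), running a batch of files in parallel child processes:
Code:

@logfiles = glob("/path/to/logs/*.log");   # placeholder path
$max_kids = 4;                             # tune to the number of CPUs on the box

while (@logfiles) {
    @batch = splice(@logfiles, 0, $max_kids);
    @pids  = ();
    foreach $file (@batch) {
        $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                   # child: handle one file, then exit
            count_traces($file);
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid($_, 0) foreach @pids;          # finish the batch before starting the next
}

sub count_traces {                         # placeholder for the counting loop from the original script
    ($file) = @_;
    print "processing $file in pid $$\n";
}

Each child would keep its own %hashmap, so the per-file results would need to be written out (or merged) afterwards.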

