LinuxQuestions.org
Programming: This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.

Old 10-13-2008, 01:56 PM   #1
johngreg
LQ Newbie
 
Registered: Oct 2008
Posts: 14

Rep: Reputation: 0
perl script - need help to improve performance


I wrote the following small script to find the distinct exception traces in a log and count how often each one occurs. Could you please suggest ways to improve its performance? It has to run on an AIX machine, on files of about 100MB each.
Code:
while(<FH>){
    while ( /$exStart(.*?)$exEnd/sgmo ) {
        ++$c;
        if(exists $hashmap{$1}){
            $tempCount=$hashmap{$1};
            $hashmap{$1}= ++$tempCount;
        }
        else{
            $hashmap{$1}=1;
        }
    }
}

foreach $key ( keys %hashmap ) {
        $value = $hashmap{$key};
        print "\n value:: ", $value;
}
$num_keys = keys %hashmap;
thanks

 
Old 10-13-2008, 08:05 PM   #2
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,289

Rep: Reputation: 2034
Remove
Code:
        ++$c;
You don't use that value anywhere.

Replace
Code:
        if(exists $hashmap{$1}){
            $tempCount=$hashmap{$1};
            $hashmap{$1}= ++$tempCount;
        }
        else{
            $hashmap{$1}=1;
        }
with
Code:
$hashmap{$1}++;
Perl autovivifies the entry on first use, so the exists check and the temporary are unnecessary.
Replace
Code:
        $value = $hashmap{$key};
        print "\n value:: ", $value;
with
Code:
print "\n value:: $hashmap{$key}";
 
Old 10-13-2008, 11:16 PM   #3
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,401

Rep: Reputation: 1119
Depending on how many millions of unique values you might be dealing with in any particular run, a completely different approach might also be called for...

Your present algorithm is based on the assumption that "hashes are 'free.'" Unfortunately, when a hash grows into hundreds-of-thousands or millions of entries, it is no longer free. Instead, every single access runs the risk of a page fault. The application, and the system itself, slows to a crawl...

An "unexpectedly different" algorithm would write all those keys to a disk-file, then sort that file (on disk...), and count them. When a file is sorted, all of the occurences of any particular key-value are guaranteed to be adjacent. "Counting" them, therefore, requires no main-memory at all.

Yes... this is "how they did it with punched cards, even before digital computers existed." And... ... it still works.

(In fact, it can out-perform algorithms such as yours by a factor of thousands . . . )
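One way to sketch that disk-based approach in Perl: flatten each captured trace to a single line, append it to a temp file, sort the file with the system sort(1), and count adjacent duplicates while streaming the sorted output. Everything here (the sample traces, the \x01 newline encoding) is illustrative rather than taken from the original script, and the %seen hash exists only so the sketch can print a summary at the end:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustration: each multi-line trace is flattened to one line
# (embedded newlines encoded as \x01) and appended to a temp file.
my ($tmp, $tmpname) = tempfile();
for my $trace ("foo\nbar", "baz", "foo\nbar") {
    (my $flat = $trace) =~ s/\n/\x01/g;
    print $tmp "$flat\n";
}
close $tmp;

# Sort on disk, then count adjacent identical lines while streaming;
# no large in-memory hash of traces is needed.
open my $sorted, '-|', 'sort', $tmpname or die $!;
my ($prev, $count, %seen);    # %seen is only for the demo summary
while (my $line = <$sorted>) {
    chomp $line;
    if (defined $prev && $line eq $prev) {
        $count++;
    }
    else {
        $seen{$prev} = $count if defined $prev;
        ($prev, $count) = ($line, 1);
    }
}
$seen{$prev} = $count if defined $prev;
close $sorted;
unlink $tmpname;

print "$seen{$_}\t$_\n" for sort keys %seen;
```

In a real run you would print each count as soon as the streak of identical lines ends, so memory stays flat no matter how many distinct traces there are.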
 
Old 10-14-2008, 01:56 AM   #4
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,289

Rep: Reputation: 2034
To be honest, I'm wondering whether you've actually run that, or whether it's part of a much larger program. Normally I'd expect Perl to rip through a 100MB file pdq...
I can't imagine AIX on a small machine...
 
Old 10-14-2008, 08:34 AM   #5
keefaz
Senior Member
 
Registered: Mar 2004
Distribution: Slackware
Posts: 4,614

Rep: Reputation: 136
Code:
while ( /$exStart(.*?)$exEnd/sgmo ) {
Are you sure you don't want:
Code:
if ( /$exStart(.*?)$exEnd/sgmo ) {
Also, don't forget to close FH
 
Old 10-14-2008, 01:39 PM   #6
johngreg
LQ Newbie
 
Registered: Oct 2008
Posts: 14

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by chrism01 View Post
To be honest, I'm wondering whether you've actually run that, or whether it's part of a much larger program. Normally I'd expect Perl to rip through a 100MB file pdq...
I can't imagine AIX on a small machine...
Yes Chris, you're right - the script runs pretty fast on a single 100MB file, but it needs to run on about 400 such files every hour, so I didn't want any surprises on the live servers.
 
Old 10-14-2008, 01:46 PM   #7
johngreg
LQ Newbie
 
Registered: Oct 2008
Posts: 14

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by sundialsvcs View Post
Depending on how many millions of unique values you might be dealing with in any particular run, a completely different approach might also be called for...

Your present algorithm is based on the assumption that "hashes are 'free.'" Unfortunately, when a hash grows into hundreds-of-thousands or millions of entries, it is no longer free. Instead, every single access runs the risk of a page fault. The application, and the system itself, slows to a crawl...

An "unexpectedly different" algorithm would write all those keys to a disk-file, then sort that file (on disk...), and count them. When a file is sorted, all of the occurences of any particular key-value are guaranteed to be adjacent. "Counting" them, therefore, requires no main-memory at all.

Yes... this is "how they did it with punched cards, even before digital computers existed." And... ... it still works.

(In fact, it can out-perform algorithms such as yours by a factor of thousands . . . )
Let me try that. So should I write the traces to a new file with some delimiter, then reopen the temp file and split on it? But won't I still need some data structure to hold the two values per entry - the exception trace (around 100 lines) and the number of times it repeats?
 
Old 10-14-2008, 07:00 PM   #8
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,289

Rep: Reputation: 2034
Well, you could fork a number of copies, as each file seems to be treated separately.
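A minimal sketch of that forking idea, with placeholder file names and an assumed concurrency cap (the real per-file counting loop would go where the comment is):

```perl
use strict;
use warnings;

my @files    = ('a.log', 'b.log', 'c.log');  # placeholders for the 400 log files
my $max_kids = 2;                            # assumed cap, e.g. number of CPUs

my @pids;
for my $file (@files) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: process one file here (the counting loop from the
        # original script would run against $file).
        exit 0;
    }
    push @pids, $pid;
    # Crude throttle: once $max_kids children are running, reap one
    # before starting the next.
    if (@pids >= $max_kids) {
        waitpid(shift(@pids), 0);
    }
}
waitpid($_, 0) for @pids;    # reap the stragglers
print "all children finished\n";
```

Each child gets its own hash, so memory pressure per process stays the same as the single-file case; only the number of concurrent children needs tuning.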
 
  

