Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
I wrote the following small script to find the different exception traces and count them. Could you please suggest ways to improve its performance? It has to run on an AIX machine, on files of about 100MB.
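For reference, the in-memory, hash-based counting that a script like this presumably does can be sketched with standard tools (the file name and exception names here are made up for illustration):

```shell
# Hypothetical input: one exception key per line.
printf 'NullPointerException\nIOException\nNullPointerException\n' > /tmp/ex.log

# awk builds an in-memory associative array keyed by each line,
# the same shape as a Perl hash of counts.
awk '{ count[$0]++ } END { for (k in count) print count[k], k }' /tmp/ex.log
```

The whole key set lives in memory here, which is exactly the scaling concern raised below.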
Depending on how many millions of unique values you might be dealing with in any particular run, a completely different approach might also be called for...
Your present algorithm is based on the assumption that "hashes are 'free.'" Unfortunately, when a hash grows into hundreds-of-thousands or millions of entries, it is no longer free. Instead, every single access runs the risk of a page fault. The application, and the system itself, slows to a crawl...
An "unexpectedly different" algorithm would write all those keys to a disk file, then sort that file (on disk...), and count them. When a file is sorted, all of the occurrences of any particular key value are guaranteed to be adjacent. "Counting" them, therefore, requires almost no main memory at all.
Yes... this is "how they did it with punched cards, even before digital computers existed." And... ... it still works.
(In fact, it can out-perform algorithms such as yours by a factor of thousands . . . )
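On AIX (or any POSIX system) the sort-then-count idea described above is a one-liner with standard tools; `sort` spills to temporary disk files when the input exceeds memory, so the pipeline's memory use stays flat. A sketch (the key file is hypothetical):

```shell
# Hypothetical key file: one exception key per line.
printf 'A\nB\nA\nA\nB\nC\n' > /tmp/keys.txt

# sort groups identical keys adjacently; uniq -c counts each run;
# the final sort -rn lists the most frequent keys first.
sort /tmp/keys.txt | uniq -c | sort -rn
```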
To be honest, I'm wondering if you've actually run that or whether it's part of a much larger program. Normally I'd expect Perl to rip through a 100MB file pdq...
I can't imagine AIX on a small machine...
Yes Chris, you are right... this script runs pretty fast on a single 100MB file, but it needs to run on 400 such files every hour, so I didn't want to face surprises on the live servers.
Let me try: so should I write to a new file with some delimiter, open the temp file and split on it, and then won't I again need some data structure to store the two-dimensional values: the exception trace (around 100 lines) and the number of repetitions of that trace?
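The multi-line aspect need not force everything back into a data structure: if each trace is first collapsed to a single delimited line, the sort/uniq pipeline above still works and the counting stays on disk. One way to sketch this, assuming the traces are separated by blank lines (the separator, delimiter, and file names are all assumptions):

```shell
# Hypothetical log: two copies of one trace, one of another,
# with traces separated by blank lines.
printf 'Exception A\nat foo\n\nException B\nat bar\n\nException A\nat foo\n' > /tmp/traces.log

# RS="" puts awk in paragraph mode: each blank-line-separated trace
# becomes one record. gsub replaces its internal newlines with "|",
# so each trace is one sortable line; sort | uniq -c then counts
# identical traces without holding them all in memory.
awk 'BEGIN { RS=""; ORS="\n" } { gsub(/\n/, "|"); print }' /tmp/traces.log \
  | sort | uniq -c | sort -rn
```

To display a trace afterwards, the delimiter can be turned back into newlines with `tr '|' '\n'`.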