How can I get maximum performance on a multi-processor machine?
I have a multi-threaded app using Pthreads that runs great under Windows on my 4-core machine - all four cores get maxed out processing parts of a large file. I recompiled the same code to run on Red Hat Linux on a 64-CPU machine - but from what I can tell when it runs, it gets stuck on one core. "mpstat -P ALL" shows the cores are barely loaded. I have tried sched_setaffinity, scheduling priorities, and SCHED_FIFO - nothing has helped. Any ideas on getting more performance?
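One quick check before more scheduler tuning: confirm the process really has multiple runnable threads and an unrestricted affinity mask. A sketch using standard tools - it inspects the current shell ($$) only as a runnable stand-in; substitute your app's PID:

```shell
# Substitute the app's PID for "$pid"; $$ (this shell) is just a stand-in.
pid=$$
grep '^Threads:' /proc/"$pid"/status   # how many threads the process actually has
taskset -p "$pid"                      # affinity mask; ffffffffffffffff = all 64 CPUs allowed
ps -L -o tid,psr,pcpu -p "$pid"        # per-thread: last CPU used (psr) and CPU% (pcpu)
```

If only one TID ever shows a nonzero pcpu, the work is being serialized inside the app (e.g. by a shared lock or a single reader thread), not by the scheduler.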
It sounds like your processing is largely CPU bound on Windows, and the CPU workload is equally partitioned among your available cores. Fair enough.
It *doesn't* sound like *any* of the CPUs are doing much work on Linux.
My first guess is that maybe you're doing I/O inefficiently on Linux: the program is spending more time waiting for data to process, than it is actually processing it.
An alternate guess is maybe you've maxed out RAM, you've started swapping ... and the system is doing more work swapping pages in and out than it is doing any processing work.
Either way, it sounds like you're somehow, for some reason, I/O bound on Linux.
Unless you see one CPU at near 100% and the remaining CPUs idle, you should probably be looking for some kind of I/O or memory bottleneck.
Have you tried it on only 4 cores on Linux?
You can boot the machine to only use 4 cores, or (depending on kernel level) use cgroups to limit the main task and children to a limited set (e.g. 4) of available cores.
During the re-compile, did you give the "-j64" option for the number of cores available? (Or is that a parameter for the compilation itself? Dunno, haven't done any compiling recently...).
Thanks - the machine has 64 GB RAM, so I think I'm OK there - and if I were I/O bound, I'd expect to see that in mpstat? There is a column for iowait, and it barely registers over 5% on each of the 64 processors. Hmmmm... still looking.
Quote:
Originally Posted by paulsm4
Hi -
It sounds like your processing is largely CPU bound on Windows, and the CPU workload is equally partitioned among your available cores. Fair enough.
It *doesn't* sound like *any* of the CPUs are doing much work on Linux.
My first guess is that maybe you're doing I/O inefficiently on Linux: the program is spending more time waiting for data to process, than it is actually processing it.
An alternate guess is maybe you've maxed out RAM, you've started swapping ... and the system is doing more work swapping pages in and out than it is doing any processing work.
Either way, it sounds like you're somehow, for some reason, I/O bound on Linux.
Unless you see one CPU at near 100% and the remaining CPUs idle, you should probably be looking for some kind of I/O or memory bottleneck.
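To rule out swapping directly, rather than inferring it from the iowait column, the si/so columns of vmstat are worth a look:

```shell
free -h       # how much of the 64 GB is actually in use vs. sitting in cache
vmstat 1 5    # si/so > 0 means pages are being swapped in/out; high "wa" = I/O wait
```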
Thanks - I don't have direct access - this is running RHEL, so we will look into cgroups, that's a good idea
Quote:
Originally Posted by syg00
Have you tried it on only 4 cores on Linux?
You can boot the machine to only use 4 cores, or (depending on kernel level) use cgroups to limit the main task and children to a limited set (e.g. 4) of available cores.
So I don't readily see any description of the -j option. Any more info on that? I'll try anything - note that I cannot re-compile the kernel; this is RHEL on a client box - Thanks!
Quote:
Originally Posted by JZL240I-U
During the re-compile, did you give the "-j64" option for the number of cores available? (Or is that a parameter for the compilation itself? Dunno, haven't done any compiling recently...).
hmmm... From what I read, the -j option tells make to run the compile on more than one processor. I don't have any problem compiling the app - it's running the app that is the problem. I'm trying to get the app to run on all 64 processors at once, not the compiler. Or did I misunderstand something? Thanks for the reply anyway -
Quote:
Originally Posted by JZL240I-U
No, not re-compiling the kernel, I meant this:
The -j option belongs to make - it sets how many compile jobs run in parallel during the build.
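In other words, -j only parallelizes the build itself; it has no effect on how many CPUs the resulting program uses. For example:

```shell
# Run one compile job per online CPU - this speeds up the build,
# it does not change the code gcc emits:
make -j"$(nproc)"
```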
Q: Does "top" or any of your other tools show high CPU utilization for 1 CPU, and the others idle?
For a 64-CPU system and a truly parallelized application, CPU utilization *should* be allocated equally for each active thread.
Q: Exactly how much "work" is being allocated to the 64 CPUs?
It sounds like the answer - for whatever reason - is "not much".
Q: Maybe this system is just such a screamer that all the work gets done without any CPU even breaking a sweat.
Who knows - maybe this is the case. If so: Relax, Be Happy
Q: Or maybe there's some kind of bottleneck occurring that's *preventing* the CPUs from getting all the work in a timely manner. That's what I was suggesting with "memory" and "I/O".
Suggestion:
* Write a quick'n'dirty test program that's all calculation (CPU-bound; no I/O) and see how it behaves.
Thanks - exactly my thinking. I am getting "top" results today. The app is taking hours to run so I know there's lots of work for each CPU. I am worried about a bottleneck, so I am planning a test app. What's the laziest way to tie up a CPU? I'm thinking compute Pi or something - just looking for a trick. I'll post more results -
Quote:
Originally Posted by paulsm4
Q: Does "top" or any of your other tools show high CPU utilization for 1 CPU, and the others idle?
For a 64-CPU system and a truly parallelized application, CPU utilization *should* be allocated equally for each active thread.
Q: Exactly how much "work" is being allocated to the 64 CPUs?
It sounds like the answer - for whatever reason - is "not much".
Q: Maybe this system is just such a screamer that all the work gets done without any CPU even breaking a sweat.
Who knows - maybe this is the case. If so: Relax, Be Happy
Q: Or maybe there's some kind of bottleneck occurring that's *preventing* the CPUs from getting all the work in a timely manner. That's what I was suggesting with "memory" and "I/O".
Suggestion:
* Write a quick'n'dirty test program that's all calculation (CPU-bound; no I/O) and see how it behaves.
Try htop - it displays a bar for each CPU - see if they're all high or not. There is also iotop for io.
md5sum /dev/zero
will keep a CPU busy indefinitely. Run up several and use htop to see if all the CPUs get going.
Quote:
Originally Posted by imperialbeachdude
Thanks - exactly my thinking. I am getting "top" results today. The app is taking hours to run so I know there's lots of work for each CPU. I am worried about a bottleneck, so I am planning a test app. What's the laziest way to tie up a CPU? I'm thinking compute Pi or something - just looking for a trick. I'll post more results -
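A sketch of running up several of those and cleaning them up afterwards (assumes a job-control shell such as bash):

```shell
# Spawn 8 throwaway CPU burners in the background:
for i in $(seq 8); do md5sum /dev/zero & done
# ...watch "htop" or "mpstat -P ALL 1" in another terminal meanwhile...
sleep 5
kill $(jobs -p)    # stop all the burners when done
```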