LinuxQuestions.org

imperialbeachdude 06-30-2011 10:57 PM

How can I get maximum performance on a multi-processor machine?
 
I have a multi-threaded app using pthreads that runs great under Windows on my 4-core machine - all four cores get maxed out processing parts of a large file. I recompiled the same code to run on Red Hat Linux on a 64-CPU machine, but from what I can tell, when it runs it gets stuck on one core. "mpstat -P ALL" shows the cores are barely loaded. I have tried sched_setaffinity, sched_priority and SCHED_FIFO, and nothing has helped. Any ideas on getting more performance?

paulsm4 07-01-2011 01:15 AM

Hi -

It sounds like your processing is largely CPU bound on Windows, and the CPU workload is equally partitioned among your available cores. Fair enough.

It *doesn't* sound like *any* of the CPUs are doing much work on Linux.

My first guess is that you may be doing I/O inefficiently on Linux: the program is spending more time waiting for data to process than actually processing it.

An alternative guess is that you've maxed out RAM and started swapping, and the system is doing more work paging in and out than it is doing any real processing.

Either way, it sounds like you're somehow I/O bound on Linux.

Unless you see one CPU near 100% and the remaining CPUs idle, you should probably be looking for some kind of I/O or memory bottleneck.

IMHO .. PSM

syg00 07-01-2011 01:21 AM

Have you tried it on only 4 cores on Linux?
You can boot the machine to only use 4 cores, or (depending on kernel level) use cgroups to limit the main task and children to a limited set (e.g. 4) of available cores.

JZL240I-U 07-01-2011 04:45 AM

During the re-compile, did you give the "-j64" option for the number of cores available? (Or is that the parameter for the compilation itself? Dunno, didn't do any compiling recently...)

imperialbeachdude 07-01-2011 08:49 AM

Thanks - the machine has 64 GB of RAM, so I think I'm OK there. And if I were I/O bound, wouldn't I see that in mpstat? There is a column for iowait, and it barely registers over 5% on each of the 64 processors. Hmmm... still looking.

Quote:

Originally Posted by paulsm4 (Post 4401051)
An alternate guess is maybe you've maxed out RAM, you've started swapping ... and the system is doing more work swapping pages in and out than it is doing any processing work.

Either way, it sounds like you're somehow, for some reason, I/O bound on Linux.

imperialbeachdude 07-01-2011 09:04 AM

Thanks - I don't have direct access, and this is running RHEL, so we will look into cgroups. That's a good idea.

Quote:

Originally Posted by syg00 (Post 4401055)
You can boot the machine to only use 4 cores, or (depending on kernel level) use cgroups to limit the main task and children to a limited set (e.g. 4) of available cores.


imperialbeachdude 07-01-2011 09:06 AM

So I don't readily see any description of the -j option. Any more info on that? I'll try anything. Note that I cannot recompile the kernel; this is running RHEL on a client box. Thanks!

Quote:

Originally Posted by JZL240I-U (Post 4401206)
During re-compile, did you give the "-j64" option for the number of kernels available? (Or is that the parameter for the compilation itself? Dunno, didn't do any compiling recently...).


JZL240I-U 07-01-2011 09:18 AM

No, not re-compiling the kernel, I meant this:
Quote:

...I recompiled the same code to run on Red Hat linux on a 64 CPU machine...
The -j option belongs to make, which uses it to run several gcc jobs in parallel.

imperialbeachdude 07-01-2011 10:15 AM

Hmm... From what I read, the -j option tells make to compile on more than one processor. I don't have any problem compiling the app; it's running the app that is the problem. I'm trying to get the app to run on all 64 processors at once, not the compiler. Or did I misunderstand something? Thanks for the reply anyway.

Quote:

Originally Posted by JZL240I-U (Post 4401389)
No, not re-compiling the kernel, I meant this:
The -j option is part of either ./configure or make which pass it to gcc.


paulsm4 07-01-2011 10:31 AM

Q: Does "top" or any of your other tools show high CPU utilization for one CPU and the others idle?
For a 64-CPU system and a truly parallelized application, CPU utilization *should* be spread evenly across the active threads.

Q: Exactly how much "work" is being allocated to the 64 CPUs?
It sounds like the answer - for whatever reason - is "not much".

Q: Maybe this system is just such a screamer that all the work gets done without any CPU even breaking a sweat.
Who knows - maybe this is the case. If so: Relax, Be Happy :)

Q: Or maybe there's some kind of bottleneck occurring that's *preventing* the CPUs from getting all the work in a timely manner. That's what I was suggesting with "memory" and "I/O".

Suggestion:
* Write a quick'n'dirty test program that's all calculation (CPU-bound; no I/O) and see how it behaves.

imperialbeachdude 07-01-2011 12:12 PM

Thanks - exactly my thinking. I am getting "top" results today. The app takes hours to run, so I know there's lots of work for each CPU. I am worried about a bottleneck, so I am planning a test app. What's the laziest way to tie up a CPU? I'm thinking of computing pi or something; just looking for a trick. I'll post more results.

Quote:

Originally Posted by paulsm4 (Post 4401470)
Q: Does "top" or any of your other tools show high CPU utilization for 1 CPU, and the others idle?

Suggestion:
* Write a quick'n'dirty test program that's all calculation (CPU-bound; no I/O) and see how it behaves.


syg00 07-01-2011 09:07 PM

Have a look at latencytop - not designed for this specifically but might help you find any blocking.

Tinkster 07-01-2011 11:05 PM

Could you post the compiler options and such that you used to build your program?


Cheers,
Tink

jefro 07-02-2011 04:21 PM

I'd build a single-purpose OS. Any distro is just too generic.

Build it from scratch to match your use, and don't install anything you don't need.

michael@actrix 07-02-2011 09:40 PM

Try htop: it displays a bar for each CPU, so you can see whether they're all high or not. There is also iotop for I/O.
md5sum /dev/zero
will keep a CPU busy indefinitely. Start several and use htop to see if all the CPUs get going.

Quote:

Originally Posted by imperialbeachdude (Post 4401551)
What's the laziest way to tie up a CPU? I'm thinking compute Pi or something - just looking for a trick.



All times are GMT -5.