LinuxQuestions.org
Old 06-30-2011, 10:57 PM   #1
imperialbeachdude
LQ Newbie
 
Registered: Jul 2007
Posts: 25

Rep: Reputation: 0
How can I get maximum performance on a multi-processor machine?


I have a multi-threaded app using pthreads that runs great under Windows on my 4-core machine: all four cores get maxed out processing parts of a large file. I recompiled the same code to run on Red Hat Linux on a 64-CPU machine, but from what I can tell, when it runs it gets stuck on one core. "mpstat -P ALL" shows the cores are barely loaded. I have tried sched_setaffinity, sched_priority and SCHED_FIFO; nothing has helped. Any ideas on getting more performance?
 
Old 07-01-2011, 01:15 AM   #2
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
Hi -

It sounds like your processing is largely CPU bound on Windows, and the CPU workload is equally partitioned among your available cores. Fair enough.

It *doesn't* sound like *any* of the CPUs are doing much work on Linux.

My first guess is that maybe you're doing I/O inefficiently on Linux: the program is spending more time waiting for data to process than it is actually processing it.

An alternate guess is maybe you've maxed out RAM, you've started swapping ... and the system is doing more work swapping pages in and out than it is doing any processing work.

Either way, it sounds like you're somehow, for some reason, I/O bound on Linux.

Unless you see one CPU at near 100%, and the remaining CPUs idle, then you should probably be looking for some kind of I/O or memory bottleneck.
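One way to check the swapping guess, sketched in Python purely as a diagnostic aid (the app under discussion is C with pthreads, and the function names below are made up for illustration). It reads the pswpin and pswpout counters from /proc/vmstat twice and reports how many pages were swapped in and out in between. If both numbers stay at zero while the app runs, swapping is probably not the bottleneck.

```python
import time


def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) from the text of /proc/vmstat."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_activity(interval=5.0, path="/proc/vmstat"):
    """Return pages swapped (in, out) during `interval` seconds."""
    with open(path) as f:
        in0, out0 = parse_swap_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        in1, out1 = parse_swap_counters(f.read())
    return in1 - in0, out1 - out0


if __name__ == "__main__":
    pages_in, pages_out = swap_activity()
    print(f"pages swapped in: {pages_in}, out: {pages_out}")
```

Run it in a second terminal while the app is working on the large file.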

IMHO .. PSM
 
Old 07-01-2011, 01:21 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
Have you tried it on only 4 cores on Linux?
You can boot the machine to only use 4 cores, or (depending on kernel level) use cgroups to limit the main task and children to a limited set (e.g. 4) of available cores.
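Setting up cgroups usually needs administrator help. A rougher stand-in that needs no special rights is CPU affinity, which is a different mechanism from cgroups but gives a similar "only 4 cores" experiment. The sketch below is Python, Linux only, and the script name and function names are hypothetical. It narrows its own affinity mask to the four lowest-numbered CPUs it is allowed to use and then replaces itself with the given command, so the command and every thread it creates inherit that mask.

```python
import os
import sys


def first_n_cpus(allowed, n):
    """Pick the n lowest-numbered CPUs from the set of allowed CPUs."""
    return set(sorted(allowed)[:n])


def exec_on_cpus(command, n=4):
    """Limit this process to n CPUs, then exec `command` in its place."""
    cpus = first_n_cpus(os.sched_getaffinity(0), n)
    os.sched_setaffinity(0, cpus)   # 0 means "the calling process"
    os.execvp(command[0], command)  # does not return on success


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: limit_cpus.py COMMAND [ARGS...]")
    exec_on_cpus(sys.argv[1:])
```

If the app then loads those four CPUs the way it does on Windows, the problem is more likely in how the work scales to 64 threads than in Linux itself. The taskset utility (for example "taskset -c 0-3 COMMAND") does the same job from the shell.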
 
Old 07-01-2011, 04:45 AM   #4
JZL240I-U
Senior Member
 
Registered: Apr 2003
Location: Germany
Distribution: openSuSE Tumbleweed-KDE, Mint 21, MX-21, Manjaro
Posts: 4,629

Rep: Reputation: Disabled
During re-compile, did you give the "-j64" option for the number of kernels available? (Or is that the parameter for the compilation itself? Dunno, didn't do any compiling recently...).
 
Old 07-01-2011, 08:49 AM   #5
imperialbeachdude
LQ Newbie
 
Registered: Jul 2007
Posts: 25

Original Poster
Rep: Reputation: 0
Thanks - the machine has 64 GB of RAM, so I think I'm OK there. And if I were I/O bound, wouldn't I expect to see that in mpstat? There is a column for iowait, and it barely registers over 5% on each of the 64 processors. Hmmmm.... still looking
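For reference, the per-CPU numbers that mpstat reports come from /proc/stat. Below is a minimal Python sketch (function names are made up for illustration) that takes two snapshots and prints busy and iowait percentages per CPU. One caveat worth keeping in mind: a thread blocked on a mutex or condition variable is counted as idle, not as iowait, so low iowait does not rule out threads waiting on each other.

```python
import time


def parse_cpu_times(stat_text):
    """Map 'cpuN' -> list of tick counters from the text of /proc/stat."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu0 user nice system idle iowait ..."
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            times[fields[0]] = [int(x) for x in fields[1:]]
    return times


def utilization(before, after):
    """Return {cpu: (busy_pct, iowait_pct)} between two snapshots."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        delta = [n - o for n, o in zip(new, old)]
        total = sum(delta[:8])  # user..steal; guest time is already in user/nice
        if total <= 0:
            continue
        idle, iowait = delta[3], delta[4]
        busy = total - idle - iowait
        result[cpu] = (100.0 * busy / total, 100.0 * iowait / total)
    return result


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = parse_cpu_times(f.read())
    time.sleep(1.0)
    with open("/proc/stat") as f:
        second = parse_cpu_times(f.read())
    usage = utilization(first, second)
    for cpu in sorted(usage, key=lambda name: int(name[3:])):
        busy, iowait = usage[cpu]
        print(f"{cpu:>6}  busy {busy:5.1f}%  iowait {iowait:5.1f}%")
```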

 
Old 07-01-2011, 09:04 AM   #6
imperialbeachdude
LQ Newbie
 
Registered: Jul 2007
Posts: 25

Original Poster
Rep: Reputation: 0
Thanks - I don't have direct access - this is running RHEL, so we will look into cgroups, that's a good idea

 
Old 07-01-2011, 09:06 AM   #7
imperialbeachdude
LQ Newbie
 
Registered: Jul 2007
Posts: 25

Original Poster
Rep: Reputation: 0
So I don't readily see any description of the -j option. Any more info on that? I'll try anything. Note that I cannot re-compile the kernel; this is running RHEL on a client box. Thanks!

 
Old 07-01-2011, 09:18 AM   #8
JZL240I-U
Senior Member
 
Registered: Apr 2003
Location: Germany
Distribution: openSuSE Tumbleweed-KDE, Mint 21, MX-21, Manjaro
Posts: 4,629

Rep: Reputation: Disabled
No, not re-compiling the kernel, I meant this:
Quote:
...I recompiled the same code to run on Red Hat linux on a 64 CPU machine...
The -j option belongs to make: it sets how many compile jobs (gcc invocations) run in parallel.
 
Old 07-01-2011, 10:15 AM   #9
imperialbeachdude
LQ Newbie
 
Registered: Jul 2007
Posts: 25

Original Poster
Rep: Reputation: 0
hmmm... From what I read, the -j option tells make to run compile jobs on more than one processor. I don't have any problem compiling the app - it's running the app that is the problem. I'm trying to get the app to run on all 64 processors at once, not the compiler. Or did I misunderstand something? Thanks for the reply anyway -

 
Old 07-01-2011, 10:31 AM   #10
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
Q: Does "top" or any of your other tools show high CPU utilization for 1 CPU, and the others idle?
For a 64-CPU system and a truly parallelized application, CPU utilization *should* be allocated equally for each active thread.

Q: Exactly how much "work" is being allocated to the 64 CPUs?
It sounds like the answer - for whatever reason - is "not much".

Q: Maybe this system is just such a screamer that all the work gets done without any CPU even breaking a sweat.
Who knows - maybe this is the case. If so: Relax, Be Happy

Q: Or maybe there's some kind of bottleneck occurring that's *preventing* the CPUs from getting all the work in a timely manner. That's what I was suggesting with "memory" and "I/O".

Suggestion:
* Write a quick'n'dirty test program that's all calculation (CPU-bound; no I/O) and see how it behaves.
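A quick'n'dirty test along those lines, sketched in Python with made-up function names. It uses one process per CPU rather than threads, because CPython threads are serialized by the interpreter lock and would not load more than one core. So this checks whether the machine and scheduler spread pure CPU work across all cores; it does not exercise the app's own pthreads code.

```python
import multiprocessing
import os
import time


def burn(seconds):
    """Spin on integer arithmetic for about `seconds`; return iterations done."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        for _ in range(10000):
            count += 1
    return count


def run_burners(workers, seconds):
    """Run `workers` burner processes in parallel; return their counts."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(burn, [seconds] * workers)


if __name__ == "__main__":
    n = len(os.sched_getaffinity(0))  # CPUs this process may use (Linux only)
    counts = run_burners(n, 30.0)
    print(f"{n} workers, total iterations: {sum(counts)}")
```

While it runs, "mpstat -P ALL" or top should show every CPU near 100%. If it does, the hardware and scheduler are fine and the bottleneck is inside the app.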
 
Old 07-01-2011, 12:12 PM   #11
imperialbeachdude
LQ Newbie
 
Registered: Jul 2007
Posts: 25

Original Poster
Rep: Reputation: 0
Thanks - exactly my thinking. I am getting "top" results today. The app is taking hours to run so I know there's lots of work for each CPU. I am worried about a bottleneck, so I am planning a test app. What's the laziest way to tie up a CPU? I'm thinking compute Pi or something - just looking for a trick. I'll post more results -

 
Old 07-01-2011, 09:07 PM   #12
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
Have a look at latencytop - not designed for this specifically but might help you find any blocking.
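latencytop may need kernel support that a client's RHEL box does not have. A lighter look at the same question, sketched here in Python with hypothetical function names, is to list each thread of the running app with its state and the CPU it last ran on, read from /proc/PID/task/TID/stat. This is not a replacement for latencytop. Still, if nearly all threads sit in state S (sleeping) while one is in R (running), the threads are probably waiting on a lock or a work queue rather than on the disk.

```python
import os
import sys


def parse_task_stat(stat_line):
    """Return (name, state, last_cpu) from one /proc/PID/task/TID/stat line."""
    # The thread name sits in parentheses and may itself contain spaces or ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    name = stat_line[open_paren + 1:close_paren]
    rest = stat_line[close_paren + 1:].split()
    state = rest[0]           # field 3: R running, S sleeping, D disk wait, ...
    last_cpu = int(rest[36])  # field 39: CPU the thread last ran on
    return name, state, last_cpu


def thread_report(pid):
    """Return a list of (tid, name, state, last_cpu) for every thread of pid."""
    rows = []
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                rows.append((int(tid),) + parse_task_stat(f.read()))
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited while we were looking
    return rows


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: threads.py PID")
    for tid, name, state, cpu in thread_report(int(sys.argv[1])):
        print(f"{tid:>8}  {state}  cpu {cpu:<3}  {name}")
```

Running it a few times in a row gives a rough picture of how many threads are actually runnable at any moment.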
 
Old 07-01-2011, 11:05 PM   #13
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Could you post the compiler options and such that you've used building your program?


Cheers,
Tink
 
Old 07-02-2011, 04:21 PM   #14
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,974

Rep: Reputation: 3623
I'd build a single purpose OS. Any distro is just too generic.

Build it from scratch to match your use, and don't install anything you don't need.
 
Old 07-02-2011, 09:40 PM   #15
michael@actrix
Member
 
Registered: Jul 2003
Location: New Zealand
Distribution: OpenSUSE Tumbleweed
Posts: 68
Blog Entries: 1

Rep: Reputation: 20
Try htop - it displays a bar for each CPU, so you can see whether they're all high or not. There is also iotop for I/O.
md5sum /dev/zero
will keep a CPU busy indefinitely. Start several and use htop to see if all the CPUs get going.

 
  



Tags
maximize, performance, threads


