LinuxQuestions.org

Roberto2011 06-17-2011 03:00 PM

Severe performance degradation in 13.37
 
Hello everybody,

Though I'm new to the list, I've been using Slackware for a number
of years. Just an educated user, not a wizard. So, here comes my
question.

Recently we bought a dual 12-core Opteron machine (Supermicro H8DGi
board). We installed Slackware 13.37 and ran a few tests. We
observed that performance quickly degraded as the box became loaded.
For instance, a lone task may take time "t", but when running 24 of them
at the same time (fully loaded box), each may take 2 to 3 times longer.

From some tests we did (tinkering with BIOS, moving memory modules, etc.)
we came to the conclusion that the problem was due to terrible memory
management.

Finally we solved the problem by recompiling the kernel with the
.config file taken from OpenSuse. Thus, there must be something to be
tweaked in the standard Slackware .config file, but we have no idea
what.

Any suggestions would be very much appreciated.

Regards,

Roberto

Alien Bob 06-17-2011 03:09 PM

If you post a link to that OpenSuse kernel .config file then people could have a look.

Eric

H_TeXMeX_H 06-17-2011 03:12 PM

It might be 'CONFIG_SCHED_AUTOGROUP=y' that improves performance, but it may be other things as well.

Skaperen 06-17-2011 03:17 PM

Which kernel version did you try? Or did you stay with the 2.6.37.6 from 13.37? You might give 2.6.38.4 from the "testing" directory, or 2.6.38.7 from -current, a try. I'm running on 2.6.38.4 at home.

With hyper-threading, multi-core, multi-socket, and triple-channel memory, you have 4 opportunities to cause things to be slower.

Hyper-threading pairs 2 (or maybe more) CPU contexts into 1 logic unit. So when both of those threads have running tasks, they each run at half speed. I don't know about AMD, yet, but the Intel CPUs I have are 4 or 6 cores depending on which one, but look like 8 or 12 CPUs. If I run 8 or 12 tasks, they run at half speed. I'm curious how the kernel might know to arrange running tasks correctly to get just 4 or 6 processes into separate cores.

Multi-core creates contention for the memory path coming off the chip.

Multi-socket usually means memory is connected to just one or the other socket, so things slow down when a task has to "reach over" and access memory from the other socket. I'd hope the kernel could arrange page swap-ins to the proper memory for the task being dispatched. But where do shared memory pages get placed?

Triple channel memory can increase speed. But if the RAM population doesn't match (and it won't if total RAM is a power of two like we normally do), it won't run as fast.

I don't know if any of this is your problem. There are so many things it could be.

Martinus2u 06-17-2011 04:57 PM

since you compile your own kernels, you might wanna try the BFS scheduler, part of the CK patch set. Latest:

http://ck-hack.blogspot.com/2011/06/...-bfs-0406.html

With 24 logical CPUs, Con might be quite interested in your feedback. ;)

Pixxt 06-19-2011 08:54 AM

Just off the top of my head: since you have a two-socket AMD system with HyperTransport interconnects, you have a NUMA system. So try enabling NUMA support in the kernel config. I think the lack of it may be the cause of the performance degradation on your system with the Slackware default kernel and/or .config.
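
For anyone wanting to check this on their own box, here is a minimal Python sketch (not something posted in the thread). It assumes the running kernel exposes its configuration at /proc/config.gz, which is only the case when it was built with CONFIG_IKCONFIG_PROC, and that NUMA nodes show up as nodeN directories under /sys/devices/system/node. The function names are made up for illustration.

```python
import gzip
import os
import re


def parse_kernel_config(text):
    """Map each CONFIG_ option to its value; '# CONFIG_X is not set' maps to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if match:
            options[match.group(1)] = "n"
    return options


def numa_enabled(text):
    """True only if the config text contains CONFIG_NUMA=y."""
    return parse_kernel_config(text).get("CONFIG_NUMA") == "y"


def count_numa_nodes(sysfs_dir="/sys/devices/system/node"):
    """Count nodeN directories; returns 0 if the sysfs directory is absent."""
    if not os.path.isdir(sysfs_dir):
        return 0
    return sum(1 for name in os.listdir(sysfs_dir)
               if re.fullmatch(r"node\d+", name))


def read_running_config(path="/proc/config.gz"):
    """Return the running kernel's config text, or None if it is not exposed."""
    if not os.path.exists(path):
        return None
    with gzip.open(path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    config_text = read_running_config()
    if config_text is None:
        print("/proc/config.gz not available; check the .config used for the build")
    else:
        print("CONFIG_NUMA enabled:", numa_enabled(config_text))
    print("NUMA nodes visible in sysfs:", count_numa_nodes())
```

On a two-socket box, seeing fewer than two nodes would be a hint that the kernel is not treating the machine as NUMA, though the node count also depends on the BIOS and the CPU.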

Pixxt 06-19-2011 08:56 AM

Quote:

Originally Posted by Skaperen (Post 4388861)

With hyper-threading, multi-core, multi-socket, and triple-channel memory, you have 4 opportunities to cause things to be slower.

Hyper-threading pairs 2 (or maybe more) CPU contexts into 1 logic unit. So when both of those threads have running tasks, they each run at half speed. I don't know about AMD, yet, but the Intel CPUs I have are 4 or 6 cores depending on which one, but look like 8 or 12 CPUs. If I run 8 or 12 tasks, they run at half speed. I'm curious how the kernel might know to arrange running tasks correctly to get just 4 or 6 processes into separate cores.

AMD chips do not have Hyper-Threading or SMT as of yet...

onebuck 06-19-2011 09:07 AM

Hi,

I'm with Alien_Bob on this. Post a link to the '.config' file used. We're just shooting in the dark without all the information.

Roberto2011 06-22-2011 07:57 AM

Thank you guys for your suggestions ;-) ; I was pushed to dig a little further into the
problem.
Pixxt hit the nail on the head: it was the NUMA stuff, disabled by default in Slackware's kernel.
In fact, I suspected that after making a diff between the two .config files, but
was unable to try it until recently.
Cheers.
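
A plain diff between two kernel .config files tends to be noisy, so here is a hedged Python sketch (not the method used in the thread) that compares them option by option. The function names and the sample config text are invented for illustration; the real files would be read from wherever the two .config files live.

```python
import re


def parse_kernel_config(text):
    """Map each CONFIG_ option to its value; '# CONFIG_X is not set' maps to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(first_text, second_text):
    """Return sorted (option, first_value, second_value) tuples that differ.

    A value of None means the option does not appear in that file at all.
    """
    first = parse_kernel_config(first_text)
    second = parse_kernel_config(second_text)
    differences = []
    for option in sorted(set(first) | set(second)):
        if first.get(option) != second.get(option):
            differences.append((option, first.get(option), second.get(option)))
    return differences


if __name__ == "__main__":
    # Made-up fragments, not the actual Slackware or OpenSuse configs.
    slackware_like = "CONFIG_SMP=y\n# CONFIG_NUMA is not set\n"
    opensuse_like = "CONFIG_SMP=y\nCONFIG_NUMA=y\nCONFIG_SCHED_AUTOGROUP=y\n"
    for option, old, new in diff_configs(slackware_like, opensuse_like):
        print(f"{option}: {old} -> {new}")
```

Options that match in both files are dropped, which leaves a much shorter list to go through one by one when rebuilding the kernel.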


All times are GMT -5.