Severe performance degradation in 13.37
Hello everybody,
Though I'm new to the list, I've been using Slackware for a number of years. Just an educated user, not a wizard. So here comes my question. Recently we bought a dual 12-core Opteron machine (Supermicro H8DGi board), installed Slackware 13.37, and ran a few tests. We observed that performance degraded quickly as the box became loaded. For instance, a lone task may take time "t", but when running 24 of them at the same time (fully loading the box), each may take 2 to 3 times longer. From the tests we did (tinkering with the BIOS, moving memory modules, etc.) we concluded that the problem was due to terrible memory management. We finally solved the problem by recompiling the kernel with the .config file taken from openSUSE. So there must be something to tweak in the standard Slackware .config file, but I have no idea what. Any suggestion would be very much appreciated. Regards, Roberto |
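The scaling test described above can be sketched roughly like this. This is a hypothetical reproduction, not the poster's actual benchmark: `work` is a stand-in CPU-bound busy loop, and `N=24` matches the 24 cores mentioned.

```shell
#!/bin/bash
# Hypothetical sketch of the poster's test: time one CPU-bound task
# alone, then 24 of them in parallel, and compare the wall times.
work() {
    # Stand-in workload: a pure busy loop, no I/O or memory pressure.
    i=0
    while [ "$i" -lt 100000 ]; do i=$((i + 1)); done
    echo done
}

N=24
echo "single task:"
time work >/dev/null

echo "$N tasks in parallel:"
time (
    j=0
    while [ "$j" -lt "$N" ]; do work >/dev/null & j=$((j + 1)); done
    wait
)
```

On a healthy 24-core box the parallel run should take about as long as the single run; if it takes 2-3x longer, per-task throughput is degrading under load as described.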
If you post a link to that openSUSE kernel .config file, people could have a look.
Eric |
It might be 'CONFIG_SCHED_AUTOGROUP=y' that improves performance, but it may be other things as well.
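A quick way to check whether an option like this is set in a kernel's config is to grep the config file. The `check_opt` helper below is hypothetical; which config locations exist depends on the distro and on whether the kernel was built with `CONFIG_IKCONFIG_PROC=y`:

```shell
#!/bin/bash
# Hypothetical helper: print the state of a kernel config option
# (set, commented out as "is not set", or absent) in a .config file.
check_opt() {
    grep -E "^(# )?$2[ =]" "$1" || echo "$2: not found"
}

# Typical locations (availability varies):
#   /proc/config.gz           only if CONFIG_IKCONFIG_PROC=y (use zgrep)
#   /boot/config-$(uname -r)  installed by many distros
[ -r "/boot/config-$(uname -r)" ] && \
    check_opt "/boot/config-$(uname -r)" CONFIG_SCHED_AUTOGROUP
```

Note that autogroup scheduling was merged in 2.6.38, so the stock 2.6.37.6 kernel from 13.37 wouldn't have the option at all.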
|
Which kernel version did you try? Or did you stay with the 2.6.37.6 from 13.37? You might give 2.6.38.4 from the "testing" directory, or 2.6.38.7 from -current, a try. I'm running on 2.6.38.4 at home.
With hyper-threading, multi-core, multi-socket, and triple-channel memory, you have four opportunities to cause things to be slower.

Hyper-threading pairs 2 (or maybe more) CPU contexts onto one execution unit, so when both of those threads have running tasks, each runs at roughly half speed. I don't know about AMD yet, but the Intel CPUs I have are 4 or 6 cores, depending on the model, yet look like 8 or 12 CPUs. If I run 8 or 12 tasks, they run at half speed. I'm curious how the kernel might know to arrange running tasks so that just 4 or 6 processes land in separate cores.

Multi-core creates contention for the memory path coming off the chip.

Multi-socket usually means memory is connected to one socket or the other, so things slow down when a task has to "reach over" and access memory attached to the other socket. I'd hope the kernel could arrange page placement in the proper memory for the task being dispatched. But where do shared memory pages get placed?

Triple-channel memory can increase speed. But if the RAM population doesn't match (and it won't if total RAM is a power of two, as we normally build it), it won't run as fast.

I don't know if any of this is your problem. There are so many things it could be. |
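On Linux you can see which logical CPUs are hyperthread siblings of the same physical core via sysfs, which answers the "8 or 12 CPUs but only 4 or 6 cores" question for a given machine. A minimal sketch, assuming the standard sysfs topology files are present; `count_physical_cores` is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical sketch: count distinct physical cores by deduplicating
# the hyperthread-sibling lists that Linux exposes in sysfs. Logical
# CPUs sharing a core report the same list (e.g. both cpu0 and cpu12
# report "0,12"), so the number of unique lists is the core count.
count_physical_cores() {
    # stdin: one thread_siblings_list line per logical CPU
    sort -u | wc -l
}

# On a real machine (the sysfs path may not exist in minimal containers):
if [ -r /sys/devices/system/cpu/cpu0/topology/thread_siblings_list ]; then
    cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list |
        count_physical_cores
fi
```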
Since you compile your own kernels, you might want to try the BFS scheduler, part of the CK patch set. Latest:
http://ck-hack.blogspot.com/2011/06/...-bfs-0406.html With 24 logical CPUs, Con might be quite interested in your feedback. ;) |
Just off the top of my head: since you have a two-socket AMD system with HyperTransport interconnects, you have a NUMA system. So try enabling NUMA support in the kernel config. I think that may be the cause of the performance degradation on your system with the default Slackware kernel and/or .config
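For a dual-socket Opteron, the relevant options in the kernel .config would look roughly like the fragment below. This is a hypothetical example from memory of 2.6.3x-era x86_64 configs; the exact option names and values should be verified in `make menuconfig` for the kernel version actually being built:

```
CONFIG_NUMA=y
CONFIG_ACPI_NUMA=y
# AMD-specific NUMA detection; older kernels (2.6.37 and before) call
# this CONFIG_K8_NUMA, newer ones CONFIG_AMD_NUMA:
CONFIG_K8_NUMA=y
# Example value; 2^NODES_SHIFT is the max number of NUMA nodes:
CONFIG_NODES_SHIFT=6
```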
|
Hi,
I'm with Alien_Bob on this. Post a link to the '.config' file used. We're just shooting in the dark without all the information. |
Thank you, guys, for your suggestions ;-) ; they pushed me to dig a little further into the problem. Pixxt hit the nail on the head: it was the NUMA stuff, which is disabled by default in Slackware's kernel. In fact, I suspected that after making a diff between the two .config files, but was unable to try it until recently. Cheers. |
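The diff step mentioned above can be sketched like this. The `numa_diff` helper and both file paths are hypothetical examples:

```shell
#!/bin/bash
# Hypothetical sketch: show NUMA-related lines that differ between
# two kernel config files, as in the diff the poster describes.
numa_diff() {
    diff "$1" "$2" | grep 'CONFIG_.*NUMA'
}

# Example usage (paths are illustrative, not real file names):
#   numa_diff /boot/config-slackware /boot/config-opensuse
```

Lines prefixed `<` come from the first config and `>` from the second, so a `> CONFIG_NUMA=y` with a `< # CONFIG_NUMA is not set` immediately points at the culprit.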