LinuxQuestions.org

https://www.linuxquestions.org/questions/linux-general-1/responsive-system-high-load-average-and-hanging-ps-744606/

bagpussnz 08-02-2009 04:50 PM

Responsive system/High load average and hanging ps
 
Hi,
I have a server running this kernel:

2.6.8-24.14-smp #1 SMP Tue Mar 29 09:27:43 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux

This server runs intensive Java build services (using Maven).

What I am seeing is that, a short while after booting, the machine starts showing a high load average (the highest I have seen is 3600).
However, the machine is still responsive.

When I run ps -ef (or a find through /proc), the command hangs and cannot be terminated.

The only way to reboot is to pull the power.

Does this sound familiar to anyone? How can I diagnose it?

top - 09:21:48 up 3 days, 1:20, 12 users, load average: 420.98, 414.12, 396.64
Tasks: 698 total, 1 running, 697 sleeping, 0 stopped, 0 zombie
Cpu(s): 21.9% us, 6.6% sy, 0.0% ni, 61.9% id, 9.3% wa, 0.0% hi, 0.3% si
Mem: 2055324k total, 2024640k used, 30684k free, 187044k buffers
Swap: 1052216k total, 280400k used, 771816k free, 1108756k cached
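On Linux the load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D, usually waiting on I/O). In the top output above only 1 task is running and the CPUs are about 62% idle, so a load above 400 suggests a pile of D-state tasks rather than CPU demand. Below is a minimal Python sketch, assuming a standard Linux /proc layout, that tallies thread states from /proc/<pid>/task/<tid>/stat. It avoids /proc/<pid>/cmdline, which ps -ef reads and which can block on a stuck process; on some kernels other /proc reads may block as well, so treat it as best-effort. The function names are illustrative only.

```python
import os
from collections import Counter


def parse_stat(text):
    """Return (comm, state) from the contents of a /proc/.../stat file."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so locate it via the first '(' and the LAST ')'.
    start = text.index("(")
    end = text.rindex(")")
    # The one-letter state (R, S, D, Z, T, ...) follows ") ".
    return text[start + 1:end], text[end + 2]


def task_states(proc="/proc"):
    """Tally thread states and list threads in uninterruptible sleep (D)."""
    counts = Counter()
    stuck = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    comm, state = parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited or the file was unreadable
            counts[state] += 1
            if state == "D":
                stuck.append((int(tid), comm))
    return counts, stuck


if __name__ == "__main__" and os.path.isdir("/proc"):
    counts, stuck = task_states()
    print("thread states:", dict(counts))
    print("R + D right now:", counts["R"] + counts["D"])
    with open("/proc/loadavg") as f:
        print("load averages:", " ".join(f.read().split()[:3]))
    for tid, comm in sorted(stuck):
        print("D", tid, comm)
```

It scans threads rather than processes because a JVM runs many threads and the kernel counts each one. The R + D figure will only roughly track /proc/loadavg, since the load average is an exponentially damped average, not an instantaneous count.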


Regards,
Ian Collins.

paulsm4 08-02-2009 05:57 PM

Dude - you need more RAM. Fast!

If your system workload is such that the run queue length chronically exceeds 2.0 ("two's a crowd" is a true statement!), that's an indication your system *might* need more horsepower.

Your run queue length exceeds 400!
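Load averages are commonly compared against the number of CPUs, so one rough way to apply a threshold like this on an SMP box is to divide by the CPU count first. A minimal Python sketch, assuming a Unix system where os.getloadavg() is available; load_per_cpu is a hypothetical helper name:

```python
import os


def load_per_cpu(loadavg, ncpus):
    """Divide the 1-, 5- and 15-minute load averages by the CPU count."""
    return tuple(load / ncpus for load in loadavg)


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.cpu_count() counts logical CPUs (hyperthreads included) and may
    # return None, so fall back to 1.
    ncpus = os.cpu_count() or 1
    per_cpu = load_per_cpu(os.getloadavg(), ncpus)
    print(f"{ncpus} CPUs, load per CPU: " + ", ".join(f"{x:.2f}" for x in per_cpu))
```

For example, a load of 40 on four CPUs works out to 10 per CPU.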

If "swap used" is chronically non-zero, that's a strong indication you need more RAM and/or need to throttle a "memory hog" process and/or need to break some of your "memory hogs" out to a separate server.

Moreover, "memory swapping" is certainly contributing to (and might in fact be the root cause of) your high run queue.
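Swap *used* on its own only says that some pages were swapped out at some point. Whether the box is actively swapping shows up as rising pswpin/pswpout counters (pages swapped in/out since boot) in /proc/vmstat, which 2.6 and later kernels should provide. A minimal Python sketch that samples them twice; the helper names and the two-second interval are arbitrary choices for illustration:

```python
import time


def read_swap_counters(text):
    """Pull pswpin/pswpout (pages swapped in/out since boot) out of /proc/vmstat text."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("pswpin", "pswpout"):
            counters[key] = int(value)
    return counters


def swap_rate(before, after, seconds):
    """Pages per second swapped in and out between two samples."""
    return {key: (after[key] - before[key]) / seconds for key in before}


def sample(path="/proc/vmstat"):
    with open(path) as f:
        return read_swap_counters(f.read())


if __name__ == "__main__":
    INTERVAL = 2  # seconds; arbitrary, longer samples smooth out bursts
    try:
        first = sample()
        time.sleep(INTERVAL)
        print(swap_rate(first, sample(), INTERVAL))
    except OSError:
        print("/proc/vmstat not available")
```

Sustained non-zero rates mean pages are moving in and out of swap right now; zero rates with non-zero swap used mean the swapped pages are simply idle.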

PS:
I have *never* seen a load average of "420.98". Never!

But I *have* seen systems visibly impacted with the load average as low as 2.0 - 5.0. Honest.

You need more RAM; you should consider "throttling" your app(s) (perhaps with custom JVM switches); you should consider faster/bigger/more powerful systems; and you should also consider partitioning your workload across multiple servers.

IMHO .. PSM

Retrievil_Knievil 06-01-2010 02:09 PM

Similar scenario - different solution
 
Hi,

I found this thread while I was having a similar problem, and found the solution to mine anyway. I think such a high load average is more a sign of something being broken than just of missing RAM.

On the system I was looking into, the load average was well into the 40s and 50s with only four cores, so something was clearly wrong.

It turned out to be an NFS mount that had gone offline: any process that tried to list the drives or the folder containing the mount stalled on it, and a lot of processes piled up that way.

After rebooting the host with the unresponsive NFS mount and remounting it, everything went smoothly. (The client complained that it was still mounted, but the remount fixed the problem; unmounting it would not work cleanly, since the mount was busy.)
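A quick way to see which NFS mounts a client has, without touching the mounts themselves (running ls or df on a dead NFS mount can hang the same way), is to read the mount table in /proc/mounts. A minimal Python sketch; nfs_mounts is a hypothetical helper. It labels mounts without the soft option as hard, since hard is the NFS default, and processes blocked on a hard mount whose server has gone away typically sit in state D and add to the load average:

```python
def nfs_mounts(mounts_text):
    """Return (source, mount point, options) for NFS entries in /proc/mounts text."""
    found = []
    for line in mounts_text.splitlines():
        # Fields: device, mount point, fs type, options, dump, pass.
        # Spaces inside paths are escaped as \040, so split() is safe.
        fields = line.split()
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            found.append((fields[0], fields[1], fields[3].split(",")))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            entries = nfs_mounts(f.read())
    except OSError:
        entries = []
    for source, mountpoint, options in entries:
        # NFS mounts are hard unless mounted with the soft option.
        kind = "soft" if "soft" in options else "hard"
        print(f"{mountpoint} <- {source} ({kind})")
```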


All times are GMT -5.