Unknown CPU usage - top and ps don't show the cause
So a few weeks ago a Linux server was having high CPU usage, but we weren't able to determine which program was causing it. The top and ps commands (and others) didn't show any program using much CPU, yet %idle was at 0 and our monitoring software showed 100% usage. The usage turned out to be caused by a DBA doing some work, but we never found out why we couldn't see the application responsible for it.
Now here I am several weeks later with the same problem, but this time on my Palm Pre. I upgraded to 1.2.1, and now it runs very slowly. A quick top in an SSH session shows: Code:
top - 00:42:34 up 1:06, 2 users, load average: 7.13, 7.42, 7.64
So, what is causing the high CPU usage? Why don't top and ps show it? |
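When no single process accounts for the load, it can help to look at the system-wide counters instead of the per-process view. A minimal Python sketch, assuming a Linux kernel that exposes the aggregate "cpu" line of /proc/stat as documented in proc(5); the helper names are hypothetical. Sampling twice and comparing shows whether the busy time is in user, system, iowait, irq/softirq or steal, which narrows down where to look.

```python
import time

# Field names of the aggregate "cpu" line in /proc/stat (see proc(5)).
# Older kernels report fewer fields; missing ones are treated as 0.
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest", "guest_nice"]


def parse_cpu_line(text):
    """Return the aggregate jiffy counters from /proc/stat content."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:]]
            values += [0] * (len(CPU_FIELDS) - len(values))
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_breakdown(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    # guest and guest_nice are already counted inside user and nice,
    # so they are left out of the total to avoid double counting.
    states = [f for f in CPU_FIELDS if not f.startswith("guest")]
    deltas = {f: after[f] - before[f] for f in states}
    total = sum(deltas.values())
    if total == 0:
        return {f: 0.0 for f in states}
    return {f: 100.0 * d / total for f, d in deltas.items()}


def read_proc_stat():
    with open("/proc/stat") as fh:
        return fh.read()


if __name__ == "__main__":
    first = parse_cpu_line(read_proc_stat())
    time.sleep(5)  # same interval as "vmstat 5"
    second = parse_cpu_line(read_proc_stat())
    for state, pct in cpu_breakdown(first, second).items():
        print(f"{state:>8}: {pct:5.1f}%")
```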
I agree that the numbers don't "add up" ...
... but I get the strong impression, from:
a) the high load average (you've got 7; any chronic wait-queue value over 2 is suspect), which definitely indicates performance problems,
b) the high CPU utilization, coupled with the fact that no single program seems to be hogging the CPU, and
c) the high memory usage,
that there may be a memory issue: maybe the LunaSysMgr GUI is hogging most of your 256MB of RAM, even though "idle", "I/O wait" and swap usage are (essentially) zero. I suspect more memory might improve things. I also suspect that whatever your DBA was doing was memory (vs. CPU) intensive, and perhaps there was a memory issue there, too. IMHO .. PSM |
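A load average is easier to judge relative to the number of CPUs, and on Linux it counts tasks in uninterruptible sleep (usually waiting on disk I/O) as well as runnable ones, so a high load with little per-process CPU is possible. A minimal Python sketch, assuming the /proc/loadavg layout from proc(5); the function names are hypothetical.

```python
import os


def parse_loadavg(text):
    """Split /proc/loadavg content into its documented fields."""
    one, five, fifteen, tasks, last_pid = text.split()
    running, total = (int(n) for n in tasks.split("/"))
    return {
        "load": (float(one), float(five), float(fifteen)),
        "runnable": running,   # currently runnable scheduling entities
        "total": total,        # scheduling entities that currently exist
        "last_pid": int(last_pid),
    }


def load_per_cpu(load, ncpus):
    """Scale a load average by the number of CPUs (at least 1)."""
    return load / max(ncpus, 1)


if __name__ == "__main__":
    with open("/proc/loadavg") as fh:
        info = parse_loadavg(fh.read())
    ncpus = os.cpu_count() or 1
    for label, value in zip(("1m", "5m", "15m"), info["load"]):
        print(f"{label}: {value:.2f} total, {load_per_cpu(value, ncpus):.2f} per CPU")
    print(f"runnable now: {info['runnable']} of {info['total']}")
```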
Memory causing high CPU? I don't get it. Why would that happen?
|
Quote:
It is very common on a low-RAM system for the machine to be very slow with no serious CPU load, all because it is constantly swapping to disk instead of reading and writing RAM. Most of that time is spent in hardware wait states, not CPU time, so it doesn't show up in the CPU statistics, but the machine is very slow. |
84k of swap in use is trivial.
Still, watch the si and so (swap-in and swap-out) columns of Code:
vmstat 5 |
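To my understanding, vmstat's si and so columns are derived from the kernel's pswpin and pswpout counters in /proc/vmstat. A minimal Python sketch that samples those counters directly, assuming they are present (they are on 2.6 and later kernels); the function names are hypothetical. The counters are in pages, so the sketch multiplies by the page size to get KiB/s.

```python
import os
import time


def parse_vmstat(text):
    """Turn /proc/vmstat 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip():
            counters[name] = int(value)
    return counters


def swap_rates(before, after, seconds):
    """Pages swapped in and out per second between two samples."""
    return {
        key: (after.get(key, 0) - before.get(key, 0)) / seconds
        for key in ("pswpin", "pswpout")
    }


def read_vmstat():
    with open("/proc/vmstat") as fh:
        return parse_vmstat(fh.read())


if __name__ == "__main__":
    interval = 5
    first = read_vmstat()
    time.sleep(interval)
    second = read_vmstat()
    page_kib = os.sysconf("SC_PAGE_SIZE") / 1024
    for key, pages in swap_rates(first, second, interval).items():
        print(f"{key}: {pages:.1f} pages/s ({pages * page_kib:.1f} KiB/s)")
```

Sustained non-zero rates here, like sustained non-zero si/so, point to the swapping scenario described above; a large but static swap total does not.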
Those %cpu figures for upstart, dbus, and pmsyslogd look high to me. I'd expect them to be at or near 0 on a healthy system.
I'd be inclined to look through your logfiles for repeated messages and check whether some daemon keeps failing and being restarted by upstart. P.S. Read the manpage for ps and look at what %cpu actually means. It's not what you're expecting: ps reports the CPU time used divided by the time the process has been running, an average over the process's whole lifetime rather than its current usage. |
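The lifetime-average behaviour of ps's %cpu can be reproduced from /proc. A minimal Python sketch, assuming the /proc/[pid]/stat field numbering in proc(5) (utime and stime are fields 14 and 15, starttime is field 22, all in clock ticks); the function names are hypothetical. A daemon that was busy during boot can keep a high figure long afterwards, while a long-running process that has only just started spinning shows a low one.

```python
import os


def lifetime_cpu_percent(utime, stime, starttime, uptime_seconds, clk_tck):
    """%CPU as ps(1) defines it: CPU time used divided by elapsed run time."""
    cpu_seconds = (utime + stime) / clk_tck         # utime/stime are clock ticks
    elapsed = uptime_seconds - starttime / clk_tck  # starttime is ticks after boot
    if elapsed <= 0:
        return 0.0
    return 100.0 * cpu_seconds / elapsed


def read_process(pid):
    with open(f"/proc/{pid}/stat") as fh:
        line = fh.read()
    # Split after the ')' that closes comm, which may contain spaces;
    # fields[0] is then field 3 (state), so field n is fields[n - 3].
    fields = line.rpartition(")")[2].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    starttime = int(fields[19])                       # field 22
    with open("/proc/uptime") as fh:
        uptime = float(fh.read().split()[0])
    return utime, stime, starttime, uptime


if __name__ == "__main__":
    clk_tck = os.sysconf("SC_CLK_TCK")
    utime, stime, start, uptime = read_process(os.getpid())
    print(f"{lifetime_cpu_percent(utime, stime, start, uptime, clk_tck):.1f}%")
```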
I reinstalled the OS on my Palm Pre to fix the problem there, so I will look into the server tomorrow. Right now it is showing high iowait, so even though the two have similar symptoms they seem to be different issues. When this started on the server, though, I don't recall seeing high iowait.
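With high iowait, the processes waiting on I/O are usually in uninterruptible sleep, which ps shows as D in the STAT column. A minimal Python sketch that lists them by scanning /proc, assuming the /proc/[pid]/stat layout from proc(5); the function names are hypothetical. Running it a few times in a row makes it easier to spot which programs are repeatedly blocked.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from one /proc/[pid]/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state


def blocked_tasks():
    """List processes currently in uninterruptible sleep (state 'D')."""
    found = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/stat") as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited between listdir() and open()
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```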
|
Is DMA working properly on the I/O controller?
|
Code:
vmstat 5
Code:
iostat -x 1 |
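The %util column of iostat -x is, roughly, the share of each interval a device spent with I/O in flight. A minimal Python sketch of that idea, assuming the /proc/diskstats layout from the kernel's iostats documentation (after major, minor and device name, the 10th counter is milliseconds spent doing I/Os); the function names are hypothetical, and iostat's own computation may differ in detail.

```python
import time


def parse_diskstats(text):
    """Map device name -> milliseconds spent doing I/O (the io_ticks field)."""
    busy = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14:
            # parts[0:3] are major, minor, name; counter n is parts[n + 2],
            # so the 10th counter (time spent doing I/Os) is parts[12].
            busy[parts[2]] = int(parts[12])
    return busy


def utilization(before, after, interval_ms):
    """Approximate %util: share of the interval each device was busy."""
    return {
        dev: min(100.0, 100.0 * (after[dev] - before[dev]) / interval_ms)
        for dev in after
        if dev in before
    }


def read_diskstats():
    with open("/proc/diskstats") as fh:
        return parse_diskstats(fh.read())


if __name__ == "__main__":
    interval = 1.0  # seconds, same as "iostat -x 1"
    first = read_diskstats()
    time.sleep(interval)
    second = read_diskstats()
    for dev, pct in sorted(utilization(first, second, interval * 1000).items()):
        print(f"{dev:>10}: {pct:5.1f}% busy")
```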