LinuxQuestions.org


gimpy530 10-11-2009 02:13 PM

Unknown CPU usage - top and ps don't show the cause
 
So a few weeks ago a Linux server was having high CPU usage, but we weren't able to determine what program was causing it. The top and ps commands (and others) didn't show any programs using a huge amount of CPU resources, but %idle was at 0 and our monitoring software showed 100% usage. Yet no process was found by ps and top to explain it. The usage was caused by a DBA doing some work, but we never found out why we could not see the application causing the usage.

Now here I am several weeks later and I have the same problem, but this time it is on my Palm Pre. I upgraded to 1.2.1, and now it runs very slowly. A quick top in an SSH session shows:
Code:

top - 00:42:34 up  1:06,  2 users,  load average: 7.13, 7.42, 7.64
Tasks:  94 total,  8 running,  86 sleeping,  0 stopped,  0 zombie
Cpu(s): 77.9%us, 21.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:    245036k total,  231908k used,    13128k free,    15500k buffers
Swap:  131064k total,      84k used,  130980k free,    46760k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                   
 1019 root      20  0  2932 1492  656 S 11.7  0.6  6:54.13 dbus-daemon                               
    1 root      20  0  1960 1012  696 R  6.5  0.4  3:50.56 upstart                                   
  922 root      20  0  2896 1276  964 S  5.2  0.5  3:28.35 pmsyslogd                                 
14879 root      20  0  4688 1672 1356 R  1.6  0.7  0:00.05 luna-helper                               
14881 root      20  0  4692 1612 1348 R  1.3  0.7  0:00.04 luna-helper                               
11161 root      20  0  2476 1160  900 R  1.0  0.5  0:01.73 top                                       
14883 root      20  0  4556  824  656 R  0.6  0.3  0:00.02 luna-helper                               
 1270 root      19  -1  217m  87m  15m S  0.3 36.4  3:23.93 LunaSysMgr                                 
14885 root      20  0  4584  684  556 R  0.3  0.3  0:00.01 luna-helper                               
14887 root      20  0  4584  684  556 R  0.3  0.3  0:00.01 luna-helper                               
14893 root      20  0  2452 1024  812 S  0.3  0.4  0:01.04 dropbear                                   
    2 root      15  -5    0    0    0 S  0.0  0.0  0:00.00 kthreadd                                   
    3 root      15  -5    0    0    0 S  0.0  0.0  0:00.18 ksoftirqd/0                               
    4 root      RT  -5    0    0    0 S  0.0  0.0  0:00.02 watchdog/0                                 
    5 root      15  -5    0    0    0 S  0.0  0.0  0:01.78 events/0                                   
    6 root      15  -5    0    0    0 S  0.0  0.0  0:00.03 khelper                                   
  98 root      15  -5    0    0    0 S  0.0  0.0  0:00.12 kblockd/0

Now, assuming I'm not just being stupid (which is always a possibility) the first two values on the CPU line show a total of 100% usage, yet no process shows that. The ps command shows a similar result.

So, what is causing the high CPU usage? Why don't top and ps show it?
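To put the mismatch in numbers, here is a minimal sketch that adds up the %CPU column from the top listing above and compares it with the us + sy figures on the Cpu(s) line. The values are copied by hand from the post; the variable names are only illustrative. One possible reading, and it is only a guess, is that short-lived processes (note the luna-helper entries with consecutive PIDs and almost no TIME+) start and exit between refreshes, so their CPU time lands in the summary line but rarely in the per-process list.

```python
# Per-process %CPU values copied from the top listing in the post.
# top sorts by %CPU, so everything below these rows showed 0.0.
per_process_cpu = [11.7, 6.5, 5.2, 1.6, 1.3, 1.0, 0.6, 0.3, 0.3, 0.3, 0.3]

# System-wide figures from the "Cpu(s):" summary line.
user_pct = 77.9
system_pct = 21.8

visible = round(sum(per_process_cpu), 1)   # CPU attributed to listed processes
busy = round(user_pct + system_pct, 1)     # CPU the kernel reports as busy
unaccounted = round(busy - visible, 1)     # the gap the post is asking about

print(f"visible in process list: {visible}%")
print(f"busy per summary line:   {busy}%")
print(f"unaccounted for:         {unaccounted}%")
```

With the posted figures, about 29% of the CPU is attributed to listed processes while the summary line reports nearly 100% busy.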

paulsm4 10-11-2009 10:33 PM

I agree that the numbers don't "add up" ...

... but I get the strong impression:

a) the high load average (you've got "7"; any chronic wait queue values over "2" are suspect) is definitely indicative of performance problems

b) the high cpu utilization...
... coupled with the fact that no single program seems to be hogging the CPU...

c) and the high memory usage ...

that maybe there's a memory issue: that maybe the LunaSysMgr GUI is hogging most of your 256MB RAM...

... even though "idle", "I/O wait" and swap usage are (essentially) zero

... I suspect more memory might improve things.

I also suspect that whatever your DBA was doing was also memory (vs CPU) intensive, and perhaps there might have been a memory issue there, too.

IMHO .. PSM

gimpy530 10-12-2009 05:42 PM

Memory causing high CPU? I don't get it, why would that happen?

lutusp 10-12-2009 07:18 PM

Quote:

Originally Posted by gimpy530 (Post 3717008)
Memory causing high CPU? I don't get it, why would that happen?

The reason low RAM causes high CPU loading is that the system spends a lot of time in swapping operations. This also makes processes take longer in terms of wall time, even when the CPU time statistics look good, because the system has to wait on drive storage rather than RAM storage.

It is very common on a low-RAM system for the system to be very slow but with no serious CPU loading taking place, all because the system is constantly swapping to the drive instead of reading and writing RAM.

Most of that time is spent in hardware wait states, not CPU time, so it doesn't show up in the CPU statistics, but the machine is very slow.

AlucardZero 10-12-2009 07:51 PM

84k swap used is trivial.

Still, watch the si and so columns of:
Code:

vmstat 5

GazL 10-13-2009 06:45 AM

Those %cpu figures for upstart, dbus, and pmsyslogd look high to me. I'd expect them to be close to or around 0 on a healthy system.

I'd be inclined to have a look through your logfiles for any repeated messages, check if some daemon or other keeps failing and being constantly restarted by upstart.
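One way to spot repeated messages is to strip the timestamp and PID from each log line and count what is left. The following is a rough sketch only: the sample lines, the daemon names (exampled, otherd) and the assumption of a classic syslog line format are all made up for illustration, and the real log location and format on the device may differ.

```python
import re
from collections import Counter

# Matches a classic syslog prefix such as "Oct 11 00:42:34 hostname ".
SYSLOG_PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s+\S+\s+")
# Matches a bracketed PID such as "[14879]".
PID_SUFFIX = re.compile(r"\[\d+\]")


def normalize(line):
    """Drop the timestamp/host prefix and mask PIDs so repeats compare equal."""
    line = SYSLOG_PREFIX.sub("", line.strip())
    return PID_SUFFIX.sub("[PID]", line)


def top_repeats(lines, limit=5):
    """Return the most frequent normalized messages with their counts."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return counts.most_common(limit)


# Hypothetical sample input; in practice, pass the lines of a real logfile.
SAMPLE = [
    "Oct 11 00:42:31 device exampled[14879]: starting",
    "Oct 11 00:42:32 device exampled[14881]: starting",
    "Oct 11 00:42:33 device exampled[14883]: starting",
    "Oct 11 00:42:33 device otherd[922]: heartbeat",
]

for message, count in top_repeats(SAMPLE):
    print(count, message)
```

A message that appears many times with a different PID each time would fit the "keeps failing and being restarted" pattern.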


P.S. Read the manpage for ps and look what %cpu actually means. It's not what you're expecting it to be.
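For reference, the procps ps manpage describes %cpu as CPU time used divided by the time the process has been running, i.e. a lifetime average rather than a current reading. A small sketch of that arithmetic using the TIME+ values from the top output earlier in the thread follows. It assumes, without proof, that these three daemons have been running since boot, so their elapsed time is taken to be the 1:06 uptime; the function names are illustrative.

```python
def to_seconds(stamp):
    """Convert a top-style TIME+ value such as '6:54.13' (minutes:seconds) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def lifetime_cpu_percent(cpu_seconds, elapsed_seconds):
    """CPU time divided by elapsed run time, as a percentage (the ps-style figure)."""
    return 100.0 * cpu_seconds / elapsed_seconds


# "up 1:06" from the top header: 1 hour 6 minutes.
# Assumption: each daemon below started at boot, so elapsed time == uptime.
uptime_seconds = (1 * 60 + 6) * 60

# TIME+ values copied from the top listing.
cpu_times = {
    "dbus-daemon": "6:54.13",
    "upstart": "3:50.56",
    "pmsyslogd": "3:28.35",
}

for name, stamp in cpu_times.items():
    pct = lifetime_cpu_percent(to_seconds(stamp), uptime_seconds)
    print(f"{name:12s} {pct:.1f}% averaged over its lifetime")
```

Under that assumption the lifetime averages come out near the instantaneous top figures, which would suggest these daemons have been steadily busy since boot rather than spiking briefly.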

gimpy530 10-14-2009 06:16 PM

I re-installed the OS on my Palm Pre to fix the problem on that, so I will look into that server tomorrow. Right now it is showing high iowait, so even though these have similar symptoms they seem to be different issues. When this started on the server, I don't recall seeing that high iowait though.

sundialsvcs 10-14-2009 10:40 PM

Is DMA working properly on the I/O controller?

gimpy530 10-15-2009 01:32 PM

Code:

vmstat 5

procs                      memory      swap          io    system        cpu
 r  b  swpd  free  buff  cache  si  so    bi    bo  in    cs us sy wa id
 0  2  13680  17508  3856 7231880    0    0    2    22  14    10 13  2 20  5
 0  2  13680  17520  3864 7231852    0    0  5859  868  987  2429 16  2 82  0
 2  1  13680  17368  3272 7232596    0    0  5538  288  946  2443  5  2 93  0
 0  2  13680  17388  3284 7232564    0    0  5088  271  869  2329  4  2 94  0
 2  2  13680  17548  3176 7232508    0    0  5158  251  882  2365  5  2 93  0
 0  2  13680  17476  3124 7232644    0    0  5261  162  897  2357  5  3 93  0
 0  2  13680  17468  3012 7232756    0    0  4920  349  849  2276  5  2 94  0
 1  2  13680  17492  3040 7232704    0    0  4513  263  778  2155  4  1 95  0
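As a quick check on the vmstat rows above, this sketch parses them and looks at the si/so (swap in/out), wa (I/O wait) and bi (blocks read in) columns. The rows are copied from the output above; the function and variable names are illustrative. The first row is skipped in the averages because vmstat's first line reports averages since boot rather than the last interval.

```python
# Column names from the vmstat header line in the post.
VMSTAT_HEADER = "r b swpd free buff cache si so bi bo in cs us sy wa id".split()

# Data rows copied from the vmstat output in the post.
SAMPLES = """\
0 2 13680 17508 3856 7231880 0 0 2 22 14 10 13 2 20 5
0 2 13680 17520 3864 7231852 0 0 5859 868 987 2429 16 2 82 0
2 1 13680 17368 3272 7232596 0 0 5538 288 946 2443 5 2 93 0
0 2 13680 17388 3284 7232564 0 0 5088 271 869 2329 4 2 94 0
2 2 13680 17548 3176 7232508 0 0 5158 251 882 2365 5 2 93 0
0 2 13680 17476 3124 7232644 0 0 5261 162 897 2357 5 3 93 0
0 2 13680 17468 3012 7232756 0 0 4920 349 849 2276 5 2 94 0
1 2 13680 17492 3040 7232704 0 0 4513 263 778 2155 4 1 95 0
"""


def parse_rows(text):
    """Turn whitespace-separated vmstat rows into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(VMSTAT_HEADER):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(VMSTAT_HEADER, map(int, fields))))
    return rows


rows = parse_rows(SAMPLES)
interval_rows = rows[1:]  # drop the since-boot averages in the first row

swapping = any(r["si"] or r["so"] for r in interval_rows)
avg_wa = sum(r["wa"] for r in interval_rows) / len(interval_rows)
avg_bi = sum(r["bi"] for r in interval_rows) / len(interval_rows)

print("swap activity seen:", swapping)
print(f"average I/O wait:   {avg_wa:.1f}%")
print(f"average blocks in:  {avg_bi:.1f}")
```

On these samples si and so stay at zero while wa sits above 90%, so the server looks bound by reads from disk rather than by swapping.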

Code:

iostat -x 1


avg-cpu:  %user  %nice    %sys  %idle
          78.57    0.00  21.43    0.00

Device:    rrqm/s wrqm/s  r/s  w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
/dev/sda    0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda1    0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda2    0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda3    0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda5    0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda6    0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda7    0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda8    0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda9    0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda10  0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda11  0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda12  0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda13  0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sda14  0.00  0.00  0.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
/dev/sdb  6514.29  85.71 10442.86 85.71    0.00 1371.43    0.00  685.71    0.13  395.71    3.74  1.34 1414.29
/dev/sdb1  6514.29  85.71 10442.86 85.71    0.00 1371.43    0.00  685.71    0.13  395.71    3.74  1.34 1414.29

How do I check if DMA is working?


All times are GMT -5.