Contradictory Output In 'top' And 'ps'

forbin · 04-23-2010, 05:29 PM

Look at processor #7 in the following output from 'top'. It shows 78% idle. Fine, but then look at the first process in the list, which shows 100% CPU while running on processor #7.

top - 15:16:59 up 55 days, 11:25, 4 users, load average: 1.43, 1.56, 1.52
Tasks: 361 total, 1 running, 358 sleeping, 0 stopped, 2 zombie
Cpu0 : 11.6%us, 2.3%sy, 0.0%ni, 84.1%id, 0.0%wa, 0.3%hi, 1.7%si, 0.0%st
Cpu1 : 1.0%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 59.3%us, 0.3%sy, 0.0%ni, 40.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 2.0%us, 0.3%sy, 0.0%ni, 97.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.7%us, 1.7%sy, 0.0%ni, 97.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 18.5%us, 0.7%sy, 0.0%ni, 80.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 21.3%us, 0.3%sy, 0.0%ni, 78.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24942760k total, 23152296k used, 1790464k free, 317332k buffers
Swap: 2031608k total, 300k used, 2031308k free, 3572272k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
2986 site024 17 0 256m 92m 31m S 100 0.4 38:13.87 7 java
5897 site014 18 0 254m 143m 30m S 2 0.6 10:28.13 0 java
9793 site039 18 0 287m 180m 33m S 1 0.7 131:15.04 5 java
17426 site057 25 0 454m 336m 31m S 1 1.4 10:48.94 6 java
19514 site040 17 0 270m 135m 31m S 1 0.6 6:53.53 1 java
21029 root 34 19 0 0 0 S 1 0.0 485:31.55 5 kipmi0
4221 site029 23 0 298m 188m 31m S 1 0.8 122:45.26 3 java
14627 site046 17 0 282m 159m 30m S 1 0.7 77:49.32 1 java
23496 site081 18 0 276m 166m 31m S 1 0.7 84:01.37 5 java

Then look at the following output from 'ps' taken at exactly the same time, which shows the same process at 5.0% CPU (and 0.3% memory)...

[root@app03 site145]# ps aux|grep 2986
site024 2986 5.0 0.3 263140 95204 ? Sl 06:27 26:22 /usr/java/j2sdk1.4.2_09/bin/java

I repeated this test over and over and got similar results each time.

What gives?

--Eric

syg00 · 04-23-2010, 10:09 PM

Both use sampled data - there is no possibility of "...at exactly the same time". Especially where 8 (apparent) processors are involved.
The "j" option includes "last used CPU" - on an SMP system there is no presumption that it is the only CPU a process has been dispatched on in the interval. The %CPU (in top) is for the process, not the (last used) CPU - and it's not normalised.
As for the usage discrepancy, you're comparing apples to oranges - see the respective manpages for what the numbers are actually representing in each case.

forbin · 04-24-2010, 01:18 AM

Hi syg00, your reply is not unexpected. I've been around the block a few years (25) so I understand that there is no possibility of "exactly the same time," but this situation is still way too weird. 'top' showed the above results continuously for at least an hour. In 'top,' pid 2986 never went below 99% CPU and never showed another 'last CPU' except 7, while at the same time it showed CPU 7 utilization rarely above 20%. Both of these numbers are in 'top,' so there is no apples-to-oranges problem there. One would certainly not expect top to disagree with itself. Thoughts?

(Thanks for the trigger to check the 'ps' man page, though. I see that in 'ps,' CPU% is expressed as time spent running over the lifetime of the process, which is indeed an apples-to-oranges issue.)
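
As a rough illustration of that difference, the sketch below computes both styles of figure in Python. The function names are made up for this example. The CPU time (26:22) and start time (06:27) are read off the 'ps' line above, the sample time is assumed to be about 15:17 (from the 'top' header), and the 3-second interval is top's default refresh. With those assumptions the lifetime ratio comes out near the 5.0 that 'ps' printed in its %CPU column, while a process that is busy for a whole interval shows 100 in 'top'.

```python
def lifetime_cpu_percent(cpu_seconds, elapsed_seconds):
    """ps-style figure: CPU time used divided by time since the process started."""
    return 100.0 * cpu_seconds / elapsed_seconds


def interval_cpu_percent(cpu_before, cpu_after, interval_seconds):
    """top-style figure: CPU time used during one refresh interval."""
    return 100.0 * (cpu_after - cpu_before) / interval_seconds


# Values read off the ps line: TIME 26:22, START 06:27, sampled around 15:17.
cpu_seconds = 26 * 60 + 22                               # 1582 s of CPU time
elapsed_seconds = (15 * 60 + 17 - (6 * 60 + 27)) * 60    # about 31800 s of wall time
ps_style = lifetime_cpu_percent(cpu_seconds, elapsed_seconds)

# Assumption: the process burns a full 3 s of CPU during one 3 s top interval.
top_style = interval_cpu_percent(cpu_seconds, cpu_seconds + 3.0, 3.0)

print(round(ps_style, 1), round(top_style, 1))
```

Because START is only shown to the minute, the elapsed time here is approximate.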

syg00 · 04-24-2010, 08:24 AM

A quick trace of top over one interval shows it scanning several /proc files for each process - twice. Presumably to determine usage over the interval - 3 seconds by default.
The summary area numbers appear to be obtained from /proc/stat resolved over /proc/uptime.

So a similar discrepancy arises - it looks like the process numbers are over the interval (as expected), but the summary data (the CPU data) is average since boot.
Supposition only on my behalf, I haven't looked at the code.

forbin · 04-24-2010, 01:24 PM

I don't see how either the top part of 'top' (which I think you referred to as the summary data) or the bottom part of 'top' could be a representation of average usage since boot. The CPU usage stats in the top part clearly change radically every few seconds, and so do the process numbers on the bottom. If either of those were an average since boot, then on a server that has been up for a few months the numbers would hardly change at all. I am seeing the numbers on both top and bottom change radically every few seconds, except for pid 2986, which at this writing is STILL pegged at 100% on CPU 7, while CPU 7 is at 97% idle.
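
The two readings being argued over can be sketched in Python. This is not top's actual code, only an illustration of the arithmetic, and the two sample lines are synthetic. It assumes the usual layout of a per-CPU line in /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, all in clock ticks). A single sample gives an average since boot, which barely moves on a machine that has been up for weeks; the difference between two samples gives the figure for just that interval, which can swing widely.

```python
def parse_cpu_line(line):
    """Split a 'cpuN ...' line from /proc/stat into (name, first eight tick counters)."""
    parts = line.split()
    return parts[0], [int(x) for x in parts[1:9]]


def idle_percent(ticks):
    """Idle share of a tick vector; the fourth counter (index 3) is idle time."""
    total = sum(ticks)
    return 100.0 * ticks[3] / total if total else 0.0


def interval_idle_percent(before, after):
    """Idle share over the interval between two samples of the same CPU line."""
    _, a = parse_cpu_line(before)
    _, b = parse_cpu_line(after)
    return idle_percent([y - x for x, y in zip(a, b)])


# Synthetic samples a few seconds apart (not taken from the server in this thread).
sample_1 = "cpu7 1000000 0 20000 90000000 0 0 0 0"
sample_2 = "cpu7 1000240 0 20000 90000060 0 0 0 0"

since_boot = idle_percent(parse_cpu_line(sample_2)[1])
over_interval = interval_idle_percent(sample_1, sample_2)

print(round(since_boot, 1), round(over_interval, 1))
```

In this made-up case the CPU was only 20% idle during the interval, yet the since-boot figure stays close to 99% idle.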

johnsfine · 04-28-2010, 09:05 AM

The detail that looks wrong is which core that active process is running on. IIUC, you have one single threaded process continuously using 100% of a core and nothing much else happening. Linux is frequently moving that one process to a different core, so no specific core is getting near 100% use. But top is always reporting that process as being on core 7.

I don't know enough about top to be sure of any of that, nor to have any clue why it happens.
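
One way to test that guess would be to sample the "last used CPU" directly rather than rely on top's P column. The Python sketch below assumes the field layout documented in proc(5) for /proc/<pid>/stat, where utime and stime are fields 14 and 15 and the last CPU is field 39. The helper fake_stat is hypothetical and only builds synthetic lines so the example runs anywhere; on a live system the lines would come from repeated reads of /proc/2986/stat, or from /proc/2986/task/<tid>/stat for each thread.

```python
def parse_stat(line):
    """Return (utime + stime in clock ticks, last CPU) from one /proc/<pid>/stat line."""
    # The command name is in parentheses and may contain spaces,
    # so take everything after the last ')' and the space that follows it.
    rest = line[line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N of proc(5) is rest[N - 3].
    utime, stime, processor = int(rest[11]), int(rest[12]), int(rest[36])
    return utime + stime, processor


def fake_stat(ticks, cpu):
    """Hypothetical helper: build a synthetic stat line with the given utime and CPU."""
    fields = ["0"] * 37
    fields[0] = "S"
    fields[11] = str(ticks)
    fields[36] = str(cpu)
    return "2986 (java) " + " ".join(fields)


# Three synthetic samples of a busy task that is moved between cores.
samples = [fake_stat(1000, 7), fake_stat(1100, 2), fake_stat(1200, 6)]
parsed = [parse_stat(s) for s in samples]

cpus_seen = sorted({cpu for _, cpu in parsed})
ticks_used = parsed[-1][0] - parsed[0][0]

print(cpus_seen, ticks_used)
```

If the set of CPUs seen for the busy task keeps growing while top still reports 7, that would support the migration explanation; if it stays at 7, it would not.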

forbin · 04-28-2010, 10:56 AM

The kernel is "moving" the process to a different core? Why would it do that?

johnsfine · 04-28-2010, 11:21 AM

I don't know why. I have just observed that behavior whenever I run a single CPU bound thread on a lightly loaded system. Windows (at least XP) moves the thread to a different core more often and distributes the load across cores more uniformly. But both Windows and Linux move the thread.

Maybe it protects a multi-core processor from the thermal stress of having one core hot while the others are cold. I'm just guessing. I really don't know.

forbin · 04-29-2010, 08:35 AM

Okay, I'll reserve judgement, because I've done an enormous amount of monitoring on Windows and somewhat less on Linux and I don't recall ever seeing that behavior. As far as I know, when a thread requests CPU time, the Windows kernel thread dispatcher identifies a free CPU and dispatches the thread to it. I've never read about it deciding to pull the thread off of the CPU and dispatch it to another one. I guess that behavior could be beneficial in terms of distributing the CPU heat, but otherwise I can't think of a good reason to do that. FYI, the system is not very lightly loaded. There are 140 java processes consuming 23GB of RAM serving a few hundred clients. It's just that at the moment the snapshot above was taken, not much else was happening.