LinuxQuestions.org
Old 04-23-2010, 06:29 PM   #1
forbin
Member
 
Registered: Apr 2010
Posts: 43

Rep: Reputation: 0
Contradictory Output In 'top' And 'ps'


Look at processor #7 in the following output from 'top'. It shows 78% idle. Fine, but then look at the first process in the list, which shows it at 100% CPU running on processor #7.
top - 15:16:59 up 55 days, 11:25, 4 users, load average: 1.43, 1.56, 1.52
Tasks: 361 total, 1 running, 358 sleeping, 0 stopped, 2 zombie
Cpu0 : 11.6%us, 2.3%sy, 0.0%ni, 84.1%id, 0.0%wa, 0.3%hi, 1.7%si, 0.0%st
Cpu1 : 1.0%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 59.3%us, 0.3%sy, 0.0%ni, 40.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 2.0%us, 0.3%sy, 0.0%ni, 97.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.7%us, 1.7%sy, 0.0%ni, 97.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 18.5%us, 0.7%sy, 0.0%ni, 80.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 21.3%us, 0.3%sy, 0.0%ni, 78.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24942760k total, 23152296k used, 1790464k free, 317332k buffers
Swap: 2031608k total, 300k used, 2031308k free, 3572272k cached

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ P COMMAND
 2986 site024  17   0  256m  92m  31m S  100  0.4  38:13.87 7 java
 5897 site014  18   0  254m 143m  30m S    2  0.6  10:28.13 0 java
 9793 site039  18   0  287m 180m  33m S    1  0.7 131:15.04 5 java
17426 site057  25   0  454m 336m  31m S    1  1.4  10:48.94 6 java
19514 site040  17   0  270m 135m  31m S    1  0.6   6:53.53 1 java
21029 root     34  19     0    0    0 S    1  0.0 485:31.55 5 kipmi0
 4221 site029  23   0  298m 188m  31m S    1  0.8 122:45.26 3 java
14627 site046  17   0  282m 159m  30m S    1  0.7  77:49.32 1 java
23496 site081  18   0  276m 166m  31m S    1  0.7  84:01.37 5 java

Then look at the following output from 'ps' taken at exactly the same time, which shows the same process at 5.0% CPU (the 0.3 next to it is %MEM)...
[root@app03 site145]# ps aux|grep 2986
site024 2986 5.0 0.3 263140 95204 ? Sl 06:27 26:22 /usr/java/j2sdk1.4.2_09/bin/java
I repeated this test over and over and got similar results each time.

What gives?

--Eric
 
Old 04-23-2010, 11:09 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,253

Rep: Reputation: 3395
Both use sampled data - there is no possibility of "...at exactly the same time". Especially where 8 (apparent) processors are involved.
The "j" option includes "last used CPU" - on a SMP system there is no presumption that is the only CPU a process has been dispatched on in the interval. The %CPU (in top) is for the process, not the (last used) CPU - and it's not normalised.
As for the usage discrepancy, you're comparing apples to oranges - see the respective manpages for what the numbers are actually representing in each case.
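To make the "per process, not per CPU, and not normalised" point concrete, here is a minimal Python sketch. It assumes the field layout documented in proc(5) (utime is field 14, stime is field 15, processor is field 39 of /proc/<pid>/stat) and the common value of 100 clock ticks per second. The function names are made up for illustration, and this is not top's actual code.

```python
def parse_stat(line):
    """Return (utime_ticks, stime_ticks, last_cpu) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces, so split on the LAST ')' before tokenising the rest.
    rest = line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N lives at index N - 3.
    return int(rest[14 - 3]), int(rest[15 - 3]), int(rest[39 - 3])


def process_cpu_percent(before, after, interval_s, ticks_per_s=100):
    """CPU time used between two samples, as a percentage of ONE CPU.

    The result is not divided by the number of CPUs, so a busy
    multi-threaded process can exceed 100 on an SMP machine.
    """
    used_ticks = (after[0] + after[1]) - (before[0] + before[1])
    return 100.0 * used_ticks / ticks_per_s / interval_s
```

To try it, read /proc/2986/stat twice a few seconds apart, pass both lines through parse_stat, and feed the results to process_cpu_percent. The third value returned by parse_stat is only the CPU the process last ran on at the moment of the read; it says nothing about where the ticks in between were spent. The real tick rate can be checked with os.sysconf("SC_CLK_TCK").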
 
Old 04-24-2010, 02:18 AM   #3
forbin
Member
 
Registered: Apr 2010
Posts: 43

Original Poster
Rep: Reputation: 0
Hi syg00, your reply is not unexpected. I've been around the block a few years (25) so I understand that there is no possibility of "exactly the same time," but this situation is still way too weird. 'top' showed the above results continuously for at least an hour. In 'top,' pid 2986 never went below 99% CPU and never showed another 'last CPU' except 7, while at the same time it showed CPU 7 utilization rarely above 20%. Both of these numbers are in 'top,' so there is no apples-to-oranges problem there. One would certainly not expect top to disagree with itself. Thoughts?

(Thanks for the trigger to check the 'ps' man page, though. I see that in 'ps,' CPU% is expressed as time spent running over the lifetime of the process, which is indeed an apples-to-oranges issue.)

Last edited by forbin; 04-24-2010 at 02:33 AM.
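The lifetime-average reading of the 'ps' figure can be checked against the numbers in the first post. This is a small Python sketch of the arithmetic only, not of how 'ps' is implemented. It assumes that 'ps' ran at about the same moment as the 'top' snapshot (15:16:59) and that the process started at 06:27:00, since 'ps' only shows the start time to the minute. The helper names are invented for illustration.

```python
def hms_to_seconds(text):
    """Convert 'MM:SS' or 'HH:MM:SS' to a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def ps_style_cpu_percent(cpu_seconds, elapsed_seconds):
    """Lifetime average: total CPU time divided by time since the process started."""
    return 100.0 * cpu_seconds / elapsed_seconds


# Figures from the first post (see the assumptions above).
cpu_used = hms_to_seconds("26:22")                                 # TIME column of ps
elapsed = hms_to_seconds("15:16:59") - hms_to_seconds("06:27:00")  # START to snapshot

print(round(ps_style_cpu_percent(cpu_used, elapsed), 1))  # prints 5.0
```

That agrees with the 5.0 in the %CPU column of the 'ps' line (the 0.3 next to it is %MEM). A process that idled for most of nine hours and then spun for the last hour would show a small lifetime average in 'ps' and 100 in 'top' at the same moment.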
 
Old 04-24-2010, 09:24 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,253

Rep: Reputation: 3395
A quick trace of top over one interval shows it scanning several /proc files for each process - twice. Presumably to determine usage over the interval - 3 seconds by default.
The summary area numbers appear to be obtained from /proc/stat resolved over /proc/uptime.

So a similar discrepancy arises - it looks like the process numbers are over the interval (as expected), but the summary data (the CPU data) is average since boot.
Supposition only on my behalf, I haven't looked at the code.
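One way to test that supposition without reading top's source: the per-CPU counters in /proc/stat are cumulative since boot, so a single read can only give a since-boot average, while the difference between two reads gives the figure for the interval between them. The Python sketch below shows both uses. It assumes the first eight counters are user, nice, system, idle, iowait, irq, softirq and steal, as documented in proc(5), and ignores any later fields. The function names are hypothetical, and this is not a claim about what top itself does.

```python
def parse_cpu_lines(stat_text):
    """Map 'cpuN' to its first eight cumulative tick counters from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines look like 'cpu7 user nice system idle iowait ...'.
        # The bare 'cpu' line is the all-CPU total and is skipped here.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(x) for x in fields[1:9]]
    return counters


def idle_percent(before, after):
    """Idle share of each CPU between two snapshots (index 3 is the idle counter)."""
    result = {}
    for cpu, new in after.items():
        deltas = [n - o for n, o in zip(new, before[cpu])]
        total = sum(deltas)
        result[cpu] = 100.0 * deltas[3] / total if total else 0.0
    return result
```

Passing a snapshot of all zeros as "before" gives the since-boot average; passing two real snapshots taken a few seconds apart gives the interval figure. If the numbers in top's summary area track the second calculation rather than the first, the summary area is not a since-boot average.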
 
Old 04-24-2010, 02:24 PM   #5
forbin
Member
 
Registered: Apr 2010
Posts: 43

Original Poster
Rep: Reputation: 0
I don't see how either the top part of 'top' (which I think you referred to as the summary data) or the bottom part of 'top' could be a representation of average usage since boot. The CPU usage stats in the top part clearly change radically every few seconds, and so do the process numbers on the bottom. If either of those were an average since boot, then on a server that has been up for a few months the numbers would hardly change at all. I am seeing the numbers on both top and bottom change radically every few seconds, except for pid 2986, which at this writing is STILL pegged at 100% on CPU 7, while CPU 7 is at 97% idle.
 
Old 04-28-2010, 10:05 AM   #6
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1194
The detail that looks wrong is which core that active process is running on. IIUC, you have one single-threaded process continuously using 100% of a core and nothing much else happening. Linux is frequently moving that one process to a different core, so no specific core is getting near 100% use. But top is always reporting that process as being on core 7.

I don't know enough about top to be sure of any of that, nor to have any clue why it happens.
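A rough consistency check on that reading, using only the numbers in the first post: if one process really is burning a full core but being moved around, the busy time summed over all eight CPUs should come to a bit more than 100% of one core, even though no single CPU is near 100%. This Python sketch just does that sum. It is arithmetic on the posted snapshot, not evidence of what the scheduler actually did.

```python
# Idle percentages copied from the Cpu0..Cpu7 lines in the first post.
idle = [84.1, 98.7, 40.4, 97.7, 99.3, 97.7, 80.8, 78.4]


def total_busy(idle_percents):
    """Sum of (100 - idle) over all CPUs, in units of 'percent of one core'."""
    return sum(100.0 - i for i in idle_percents)


# %CPU column of the processes listed in the same snapshot.
# The list is truncated, so this sum is only a lower bound.
listed_process_cpu = [100, 2, 1, 1, 1, 1, 1, 1, 1]

print(round(total_busy(idle), 1))  # prints 122.9
print(sum(listed_process_cpu))     # prints 109
```

So the machine as a whole was about 1.2 cores busy, which is in line with one process at 100 plus a handful of small ones, and with the load average of 1.43. Most of that busy time sits on Cpu2, Cpu6, Cpu7 and Cpu0 rather than on Cpu7 alone.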
 
Old 04-28-2010, 11:56 AM   #7
forbin
Member
 
Registered: Apr 2010
Posts: 43

Original Poster
Rep: Reputation: 0
The kernel is "moving" the process to a different core? Why would it do that?
 
Old 04-28-2010, 12:21 PM   #8
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1194
I don't know why. I have just observed that behavior whenever I run a single CPU bound thread on a lightly loaded system. Windows (at least XP) moves the thread to a different core more often and distributes the load across cores more uniformly. But both Windows and Linux move the thread.

Maybe it protects a multi core processor from thermal stress from having one core hot while the others are cold. I'm just guessing. I really don't know.

Last edited by johnsfine; 04-28-2010 at 12:22 PM.
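One possible experiment to confirm or rule out the migration theory is to pin the process to a single core and watch whether that core's idle figure then drops toward zero. The Python sketch below does this with os.sched_setaffinity, which is available on Linux only. It assumes root (or the owner of the process) runs it, and the function name is invented for illustration. Because the affinity call acts on one thread at a time and a JVM has many threads, the sketch walks /proc/<pid>/task to reach all of them; a thread that exits between the listing and the call would raise an error that this sketch does not handle.

```python
import os


def pin_to_cpu(pid, cpu):
    """Pin every thread of pid to one CPU (Linux only); return the thread count."""
    # sched_setaffinity acts on a single thread, and a JVM has many,
    # so walk /proc/<pid>/task to reach all of them.
    tids = [int(t) for t in os.listdir(f"/proc/{pid}/task")]
    for tid in tids:
        os.sched_setaffinity(tid, {cpu})
    return len(tids)
```

For the case in this thread that would be pin_to_cpu(2986, 7), followed by a look at the Cpu7 line in top. The command-line counterpart is taskset from util-linux. Pinning changes how the process is scheduled, so it is worth noting the original affinity first (os.sched_getaffinity) and restoring it afterwards.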
 
Old 04-29-2010, 09:35 AM   #9
forbin
Member
 
Registered: Apr 2010
Posts: 43

Original Poster
Rep: Reputation: 0
Okay, I'll reserve judgement, because I've done an enormous amount of monitoring on Windows and somewhat less on Linux and I don't recall ever seeing that behavior. As far as I know, when a thread requests CPU time, the Windows kernel thread dispatcher identifies a free CPU and dispatches the thread to it. I've never read about it deciding to pull the thread off of the CPU and dispatch it to another one. I guess that behavior could be beneficial in terms of distributing the CPU heat, but otherwise I can't think of a good reason to do that. FYI, the system is not very lightly loaded. There are 140 java processes consuming 23GB of RAM serving a few hundred clients. It's just that at the moment the snapshot above was taken, not much else was happening.
 
  


All times are GMT -5. The time now is 08:25 AM.
