Red Hat: This forum is for the discussion of Red Hat Linux.
Look at processor #7 in the following output from 'top'. It shows 78% idle. Fine, but then look at the first process in the list, which shows 100% CPU while running on processor #7.
Both use sampled data - there is no possibility of "...at exactly the same time", especially where 8 (apparent) processors are involved.
The "j" option includes "last used CPU" - on an SMP system there is no presumption that it is the only CPU the process has been dispatched on in the interval. The %CPU (in top) is for the process, not the (last used) CPU - and it's not normalised.
As for the usage discrepancy, you're comparing apples to oranges - see the respective manpages for what the numbers are actually representing in each case.
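To make "not normalised" concrete, here is a small arithmetic sketch. It assumes the term means that a process's %CPU is scaled to a single CPU rather than to the whole machine; the function name is made up for illustration.

```python
def share_of_machine(process_cpu_percent, n_cpus):
    """Convert a %CPU figure scaled to one CPU into a share of the whole machine."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return process_cpu_percent / n_cpus


# One single-threaded process pegged at 100% on an 8-CPU box
# accounts for only an eighth of the machine's total capacity.
print(share_of_machine(100.0, 8))  # 12.5
```

So a process line reading 100% and a machine that is mostly idle overall are not in conflict by themselves.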
Hi syg00, your reply is not unexpected. I've been around the block a few years (25) so I understand that there is no possibility of "exactly the same time," but this situation is still way too weird. 'top' showed the above results continuously for at least an hour. In 'top,' pid 2986 never went below 99% CPU and never showed another 'last CPU' except 7, while at the same time it showed CPU 7 utilization rarely above 20%. Both of these numbers are in 'top,' so there is no apples-to-oranges problem there. One would certainly not expect top to disagree with itself. Thoughts?
(Thanks for the trigger to check the 'ps' man page, though. I see that in 'ps,' CPU% is expressed as time spent running over the lifetime of the process, which is indeed an apples-to-oranges issue.)
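The difference between the two kinds of percentage can be sketched with plain arithmetic. The numbers below are hypothetical, not taken from the system in this thread, and the function names are invented for the example.

```python
def lifetime_cpu_percent(cpu_seconds, elapsed_seconds):
    """ps-style figure: total CPU time divided by time since the process started."""
    if elapsed_seconds <= 0:
        return 0.0
    return 100.0 * cpu_seconds / elapsed_seconds


def interval_cpu_percent(cpu_before, cpu_after, interval_seconds):
    """Interval-style figure: CPU time used during one sampling interval."""
    if interval_seconds <= 0:
        return 0.0
    return 100.0 * (cpu_after - cpu_before) / interval_seconds


# Hypothetical process: alive for 10 hours, 1800 s of CPU used so far,
# and now spinning flat out for the latest 3-second interval.
life = lifetime_cpu_percent(1800.0 + 3.0, 36000.0 + 3.0)
recent = interval_cpu_percent(1800.0, 1803.0, 3.0)
print(f"lifetime (ps-like):   {life:.1f}%")
print(f"interval (top-like):  {recent:.1f}%")
```

The same process reports about 5% by the lifetime measure and 100% by the interval measure, which is the apples-to-oranges gap described above.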
A quick trace of top over one interval shows it scanning several /proc files for each process - twice. Presumably to determine usage over the interval - 3 seconds by default.
The summary area numbers appear to be obtained from /proc/stat resolved over /proc/uptime.
So a similar discrepancy arises - it looks like the process numbers are over the interval (as expected), but the summary data (the CPU data) is average since boot.
Supposition only on my part, I haven't looked at the code.
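One way to test that supposition without reading top's source is to take two snapshots of /proc/stat and compare the idle percentage over the interval with the idle percentage since boot. The sketch below is not top's actual code. It assumes the per-CPU line layout documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, ...), and the two snapshots are invented sample data.

```python
def parse_cpu_lines(stat_text):
    """Return {cpu_name: (idle_ticks, total_ticks)} for each per-CPU line."""
    result = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu7", ... and skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(x) for x in fields[1:]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks[:8])  # user..steal; guest time is already inside user
        result[fields[0]] = (idle, total)
    return result


def idle_percent_over_interval(before, after):
    """Idle percentage per CPU between two parsed snapshots."""
    out = {}
    for cpu, (idle1, total1) in after.items():
        idle0, total0 = before[cpu]
        delta = total1 - total0
        out[cpu] = 100.0 * (idle1 - idle0) / delta if delta > 0 else 0.0
    return out


# Invented snapshots, taken a few seconds apart.
SAMPLE_A = "cpu  8000 0 4000 780000 4000 0 0 0 0 0\ncpu7 1000 0 500 98000 500 0 0 0 0 0\n"
SAMPLE_B = "cpu  8060 0 4010 780225 4005 0 0 0 0 0\ncpu7 1060 0 510 98225 505 0 0 0 0 0\n"

first = parse_cpu_lines(SAMPLE_A)
second = parse_cpu_lines(SAMPLE_B)
interval = idle_percent_over_interval(first, second)
since_boot = 100.0 * second["cpu7"][0] / second["cpu7"][1]
print(f"cpu7 idle over interval: {interval['cpu7']:.1f}%")
print(f"cpu7 idle since boot:    {since_boot:.1f}%")
```

On a live system the two snapshots would come from reading /proc/stat, sleeping for the refresh interval, and reading it again. If the summary figures on screen track the interval value rather than the since-boot value, the summary area is not a since-boot average.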
I don't see how either the top part of 'top' (which I think you referred to as the summary data) or the bottom part of 'top' could be a representation of average usage since boot. The CPU usage stats in the top part clearly change radically every few seconds, and so do the process numbers on the bottom. If either of those were an average since boot, then on a server that has been up for a few months the numbers would hardly change at all. I am seeing the numbers on both top and bottom change radically every few seconds, except for pid 2986, which at this writing is STILL pegged at 100% on CPU 7, while CPU 7 is at 97% idle.
The detail that looks wrong is which core that active process is running on. IIUC, you have one single threaded process continuously using 100% of a core and nothing much else happening. Linux is frequently moving that one process to a different core, so no specific core is getting near 100% use. But top is always reporting that process as being on core 7.
I don't know enough about top to be sure of any of that, nor to have any clue why it happens.
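A rough way to check whether the process really stays on one core is to sample the "processor" field (field 39 in proc(5)) of /proc/[pid]/stat many times and tally the values. The sketch below only shows the parsing and tallying. The sample lines are fabricated by a hypothetical helper, `make_sample`, so that it runs anywhere.

```python
from collections import Counter


def last_cpu(stat_line):
    """Extract the last-run-on CPU number (field 39) from a /proc/[pid]/stat line."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split on the final ")" before splitting on whitespace.
    _, _, tail = stat_line.rpartition(")")
    rest = tail.split()
    # rest[0] is field 3 (state), so field n lives at rest[n - 3].
    return int(rest[39 - 3])


def tally_cpus(stat_lines):
    """Count how often each CPU number appears across a series of samples."""
    return Counter(last_cpu(line) for line in stat_lines)


def make_sample(cpu):
    """Hypothetical helper: build a fake stat line whose field 39 is `cpu`."""
    return (
        "2986 (java worker) R 1 2986 2986 0 -1 4202496 100 0 0 0 123456 789 "
        "0 0 20 0 1 0 5000 1000000 200 18446744073709551615 "
        + "0 " * 12
        + f"17 {cpu} 0 0"
    )


samples = [make_sample(c) for c in (7, 7, 3, 7, 5)]
tally = tally_cpus(samples)
print(dict(tally))
```

Sampling only sees where the process was at each sample instant, so migrations between samples go unnoticed. For a multi-threaded process, the per-thread files under /proc/[pid]/task/ may be more informative than the process-level file.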
I don't know why. I have just observed that behavior whenever I run a single CPU bound thread on a lightly loaded system. Windows (at least XP) moves the thread to a different core more often and distributes the load across cores more uniformly. But both Windows and Linux move the thread.
Maybe it protects a multi-core processor from the thermal stress of having one core hot while the others are cold. I'm just guessing. I really don't know.
Okay, I'll reserve judgement, because I've done an enormous amount of monitoring on Windows and somewhat less on Linux and I don't recall ever seeing that behavior. As far as I know, when a thread requests CPU time, the Windows kernel thread dispatcher identifies a free CPU and dispatches the thread to it. I've never read about it deciding to pull the thread off of the CPU and dispatch it to another one. I guess that behavior could be beneficial in terms of distributing the CPU heat, but otherwise I can't think of a good reason to do that. FYI, the system is not very lightly loaded. There are 140 java processes consuming 23GB of RAM serving a few hundred clients. It's just that at the moment the snapshot above was taken, not much else was happening.