[SOLVED] top command version 3.2.6 invalid results compared to collectD
All,
I am trying to get accurate CPU usage figures from top, but when I ran collectD alongside it I got different results. This happens especially when CPU usage is close to 100%.
Has anyone else seen top report different results from other CPU usage utilities?
All the tools I know of use sampling; by definition, their results are only as accurate as the data collected.
It also helps to make sure both tools are measuring the same thing. Have a look at the collectD FAQ for a brief discussion.
Last edited by syg00; 03-11-2011 at 07:53 AM.
Reason: removed duplicate post
I agree that the sampled data must be good. Both collectD and top read /proc/stat to calculate their results.
In my test, top and collectD were running against the same workload on the same system, and both used the same ten-second interval, yet the results were different.
The only explanation I can think of is that collectD has a daemon that collects the data from /proc/stat and sends it to a separate client program, whereas top is a single process that does all of the calculation and display by itself, so it is doing much more work on the fly.
I feel that top has its limitations. It looks like top has a few bugs in it.
The only other difference I can think of is the scheduling priority of top compared with the daemons and other tasks running on the system.
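Since both tools derive their summary from /proc/stat, a minimal sketch of that calculation may help when comparing them. It assumes the usual layout of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal, ...), counts idle + iowait as idle time, and takes busy time as the change in non-idle jiffies divided by the change in total jiffies between two snapshots. Whether top and collectD make exactly the same choices (for example, how they treat iowait or steal) is an assumption worth checking in their sources; differences there are one plausible reason for diverging numbers. The function names are hypothetical.

```python
from typing import List


def parse_cpu_line(stat_text: str) -> List[int]:
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled exactly "cpu"; per-CPU lines are "cpu0", "cpu1", ...
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def busy_percent(before: List[int], after: List[int]) -> float:
    """CPU busy percentage between two snapshots of the aggregate 'cpu' counters."""
    # Use only user..steal (first 8 fields); guest/guest_nice are already
    # included in user/nice, so adding them again would double-count.
    delta = [y - x for x, y in zip(before[:8], after[:8])]
    total = sum(delta)
    if total <= 0:
        return 0.0
    # Index 3 is idle and index 4 is iowait; treat both as idle time.
    idle = delta[3] + (delta[4] if len(delta) > 4 else 0)
    return 100.0 * (total - idle) / total


before = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0\n")
after = parse_cpu_line("cpu  400 0 150 1200 50 0 0 0 0 0\n")
print(busy_percent(before, after))  # 50.0
```

On a live system the two strings would come from reading /proc/stat twice, ten seconds apart, to match the interval used in the test described above.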
How were you running top: with "-d 10", or with "-n 1" in a timed loop? It makes a big difference; the man page warns about relying on a single invocation.
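The reason a single invocation can mislead is that the counters in /proc/stat are cumulative since boot. With only one snapshot there is nothing to subtract, so the best a tool can report is the average since boot; an interval figure needs two snapshots taken some seconds apart. In many top versions the first refresh has no earlier snapshot to compare against, which is the usual reason people take the second iteration of `top -b -n 2`. The sketch below contrasts the two calculations; the function name and the counter values are hypothetical, and it assumes at least the first five fields (user, nice, system, idle, iowait) of the aggregate "cpu" line are present.

```python
from typing import Optional, Sequence


def busy_fraction(cur: Sequence[int], prev: Optional[Sequence[int]] = None) -> float:
    """Fraction of non-idle time from aggregate /proc/stat 'cpu' counters.

    With prev=None the counters are used as-is, so the result is the average
    since boot. With a previous snapshot, only the change is used, giving the
    average over the interval between the two reads.
    """
    cur8 = list(cur[:8])  # user..steal; guest fields are already in user/nice
    base = list(prev[:8]) if prev is not None else [0] * len(cur8)
    delta = [c - b for c, b in zip(cur8, base)]
    total = sum(delta)
    if total <= 0:
        return 0.0
    idle = delta[3] + delta[4]  # idle + iowait
    return (total - idle) / total


# Hypothetical machine: mostly idle since boot, but fully busy in the last interval.
prev = [900, 0, 480, 98000, 500, 0, 0, 0]
cur = [1000, 0, 500, 98000, 500, 0, 0, 0]
print(busy_fraction(cur))        # since boot: about 0.015
print(busy_fraction(cur, prev))  # last interval: 1.0
```

The same counters give roughly 1.5% busy since boot but 100% over the last interval, which is exactly the near-100% situation where the two styles of measurement disagree most.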
I looked into this a while back too, for similar reasons I suspect. As it happens, I have an old strace of top lying around. It reads /proc/stat at the beginning, processes all the PIDs (regardless of whether you limit it to a specific PID or user), then reads /proc/stat again, presumably to work out the summary numbers (I haven't looked at the code).
This time lag skews all the numbers somewhat.
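To see how a lag like this can skew the figures, consider a tool that converts a per-process tick delta into a percentage by dividing by the nominal interval. If the counters were actually read further apart than that (for example because walking every PID took time), the same arithmetic overstates usage, and a fully busy process can appear above 100%. This is an illustration of the arithmetic only, not a claim about how top's code computes its numbers. The function name is hypothetical, and it assumes USER_HZ is 100 ticks per second, which is common on Linux but should be checked with `getconf CLK_TCK`.

```python
def cpu_percent(delta_ticks: int, elapsed_s: float, hz: int = 100) -> float:
    """Percent of one CPU used, given clock ticks accumulated over elapsed_s seconds.

    hz is USER_HZ (ticks per second); 100 is assumed here, and on Linux the
    real value can be read with os.sysconf("SC_CLK_TCK").
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return 100.0 * delta_ticks / (hz * elapsed_s)


# A process busy for the whole 10.5 s that actually separated the two reads
# accumulates 1050 ticks at 100 Hz.
ticks = 1050
print(cpu_percent(ticks, 10.5))  # 100.0 (divided by the real gap between reads)
print(cpu_percent(ticks, 10.0))  # 105.0 (divided by the nominal 10 s interval)
```

A half-second mismatch on a ten-second interval is a 5% error, and it is most visible when usage is already near 100%.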
You call it a bug; maybe it's just working as designed. All code embodies design decisions; you just have to figure out what they were and how they might affect you.
I haven't tried collectD; thanks for that, I'll try it out. You might also want to have a look at collectl.
syg00: interesting comment about the way top calculates CPU time. In other words, it's actually measuring its own contribution to the load. That's something I hadn't thought too much about with collectl, but when you're doing multiple things there's no getting around it: if you report both CPU and process metrics, you'd get the same result. On the other hand, if you ONLY measure CPU with collectl, that's all you'll get, as it reads /proc/stat and nothing else. Reading through all the /proc/<pid> structures to get the top processes IS very heavyweight, which is one reason collectl defaults to reporting those stats only once a minute.
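The cost difference comes from the number of files involved: the CPU summary is a single read of /proc/stat, while per-process figures need one /proc/<pid>/stat read per process on every interval. A minimal sketch of that walk follows; the function names are hypothetical. It assumes the field order documented in proc(5), where utime and stime are fields 14 and 15 (in clock ticks), and it splits after the last ')' because the command name in field 2 can itself contain spaces or parentheses.

```python
import os
from typing import Dict


def parse_pid_stat(line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # comm (field 2) is wrapped in parentheses and may contain spaces or ')',
    # so skip past the LAST ')' and the space after it.
    rest = line[line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    utime, stime = int(rest[14 - 3]), int(rest[15 - 3])
    return utime + stime


def all_process_ticks() -> Dict[int, int]:
    """Read every /proc/<pid>/stat once: one file open per process directory."""
    ticks: Dict[int, int] = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/stat") as f:
                ticks[int(name)] = parse_pid_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            # The process may exit between listdir() and the read; skip it.
            continue
    return ticks


sample = "1234 (my (odd) proc) S 1 1234 1234 0 -1 4194560 100 0 0 0 25 7 0 0 20 0 1 0 100 1000 50"
print(parse_pid_stat(sample))  # 32
```

On a box with a few thousand processes, all_process_ticks() opens a few thousand files per sample, versus one for the CPU summary, which is why sampling process data less often than CPU data is a reasonable default.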
The other thing I'd comment on is if you really want to see what the CPU is doing, I'd run collectl with a sampling interval of 1 second or even 0.1 seconds. The system will barely notice the load. And to get even more data use --verbose. In other words: