'ps' command - true meaning of 'pcpu' & 'time' values in a multi-CPU environment
I'm trying to calculate the real '%cpu' values for a number of processes for which I have 'ps -auwx' output. I've taken snapshots at the start and end of the period I want to measure. I know the reported 'pcpu' values are no use for this, since they are averaged over the whole lifetime of the process (CPU time divided by elapsed time since the process started) and mostly show '99.9%' in my case.
After searching the net, I believe I can calculate the values I need from the 'CPU TIME' values. Correct me if I'm wrong, but if I know the length of the sample period, the actual '%cpu' of each process over that period can be calculated as follows:
'CPU TIME' accumulated during the sample period
----------------------------------------------- x 100
length of sample period
The problem is that my measured 'CPU TIME' during the sample period is much larger than the sample period itself, which is giving me a calculated '%cpu' value of about 350%!
My machine has 8 CPU cores, so my question is: is the 'CPU TIME' reported by 'ps' actually the total accumulated across all 8 CPUs? If I divide the 'CPU TIME' by 8 first (which presumably gives me the average CPU time per core), I get a result more like what I'm expecting, i.e. about 44%.
By the way, I'm running RHEL 4 Update 2, 32-bit, on an x86 platform.
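For illustration, the calculation described above can be sketched in Python. This is only a rough sketch: the function names are made up for this example, the TIME strings in the usage lines are invented values (not taken from the real snapshots), and the parser assumes the TIME field is either 'MMM:SS' (as 'ps aux' typically prints it) or '[DD-]HH:MM:SS'. The result is in units of one core, so it can legitimately exceed 100% for a multi-threaded process.

```python
def parse_ps_time(value: str) -> int:
    """Convert a ps TIME field to whole seconds.

    Assumed formats: 'MMM:SS' or '[DD-]HH:MM:SS'.
    """
    days = 0
    if "-" in value:
        # Optional leading day count, e.g. '1-02:03:04'
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    if len(parts) == 2:
        hours = 0
        minutes, seconds = parts
    elif len(parts) == 3:
        hours, minutes, seconds = parts
    else:
        raise ValueError(f"unrecognised TIME field: {value!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_percent(time_start: str, time_end: str, period_seconds: float) -> float:
    """CPU time accumulated between two snapshots, as a percentage of the
    sample period. 100% means one fully busy core."""
    used = parse_ps_time(time_end) - parse_ps_time(time_start)
    return 100.0 * used / period_seconds


# Hypothetical snapshots taken 2 hours (7200 s) apart:
# 420 minutes of CPU time accumulated in 120 minutes of wall-clock time.
print(cpu_percent("10:00", "430:00", 7200))  # 350.0
```

A value above 100% here does not mean the arithmetic is wrong; it means the process's threads together used more than one core's worth of time during the period.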
This sounds horribly rubbery.
- is this multi-threaded?
- is it the same number of threads for the entire period?
- are all of them always runnable during your period?
Think about it this way: if you have 8 (or 12, or ...) threads, but only 3 are runnable (and hence accumulating CPU time), how does that affect the numbers?
Your maths sounds potentially dodgy to me.
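To put numbers on that, here is a small Python sketch of the upper bound. It assumes that each runnable thread can occupy at most one core at a time; the function name is hypothetical, and the figures are the ones from the example above (3 runnable threads, 8 cores).

```python
def cpu_ceiling(runnable_threads: int, n_cores: int) -> tuple[float, float]:
    """Upper bound on CPU use for a process.

    Returns (percent of one core, percent of the whole machine),
    assuming a thread can keep at most one core busy at a time.
    """
    if runnable_threads < 0 or n_cores <= 0:
        raise ValueError("need runnable_threads >= 0 and n_cores > 0")
    busy_cores = min(runnable_threads, n_cores)
    per_core_pct = 100.0 * busy_cores
    machine_pct = 100.0 * busy_cores / n_cores
    return per_core_pct, machine_pct


# Only 3 threads runnable on an 8-core box:
print(cpu_ceiling(3, 8))  # (300.0, 37.5)
```

So with only 3 runnable threads the process could never show more than 300% of one core, however many threads it owns in total, and dividing by the number of cores only describes the machine-wide share.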
I managed to speak with one of my colleagues at work about this as well. I think 'threads' is the key to explaining the figures.
Yes, the process I'm monitoring is massively multi-threaded. It actually maintains about 1,115 threads throughout the sample period.
I now see that the ideal solution would be to capture data for all the threads of each process, but it's not practical to do so in the environments in which I work. I run benchmarks on applications that are always massively multi-threaded, where sessions are created and destroyed regularly throughout the sample periods. Sessions generally multiplex across threads.
I think I can make sense of the data if I make some broad assumptions. I'll start by assuming that my system is balanced and that all the process threads are evenly distributed across all the CPU Cores. This way, in my example, the 'CPU TIME' can be regarded as the total thread processing time spread evenly across all CPU Cores. I have 8 Cores, so dividing by 8 gives me the average 'CPU TIME' of the process over the sample period.
It's not ideal, but I'm open to better suggestions. My data collection sample period is 2 hours, so there's a decent chance that the system settled into a balanced run over that period. Unfortunately, I can't go back and re-run my benchmark to capture new data. I have to work with the data I've already captured.
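As a rough Python sketch of that normalisation: the function name is invented for this example, and it relies entirely on the assumption stated above, namely that the threads are spread evenly across all cores for the whole sample period. The inputs are the figures from this thread (about 350% on an 8-core machine).

```python
def machine_cpu_percent(per_core_percent: float, n_cores: int) -> float:
    """Convert a percentage measured against one core into a percentage
    of the whole machine.

    Assumes the load is evenly distributed across all n_cores; if it is
    not, this is only an average and hides per-core hot spots.
    """
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return per_core_percent / n_cores


# About 350% of one core on an 8-core machine:
print(machine_cpu_percent(350.0, 8))  # 43.75
```

The result is best read as "the process used about 44% of the machine's total CPU capacity over the period", not as the utilisation of any individual core.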
No, that seems fine.
If you can be confident that you have (at least) 8 threads in the runqueue you'll be fine.
I was presuming a number somewhat less than what you seem to be achieving.
Thanks. I appreciate your opinion. I haven't been working with Linux systems for as long as I've been working with AIX. I'm discovering that it can be painful interpreting Linux stats, especially in this area of CPU usage and memory usage. The same tools on AIX work differently and actually yield reliable stats without further manipulation.