'ps' command - true meaning of 'pcpu' & 'time' values in a multi-CPU environment

MarkTurbo · 02-01-2008, 04:55 AM

I'm trying to calculate the real '%cpu' values for a number of processes for which I have 'ps -auwx' output. I've taken snapshots at the start and end of the period I want to measure. I know the reported 'pcpu' values are useless for this, since they are averaged over the whole life of the process, and in my case they mostly show '99.9%'.
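For context, the procps 'ps' man page describes '%cpu' as the CPU time used divided by the time the process has been running, expressed as a percentage. A minimal sketch of that lifetime ratio shows why it can't be turned into a figure for a particular window. The function name and the numbers are hypothetical.

```python
def lifetime_pcpu(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Lifetime %cpu as the ps man page describes it: CPU time used
    divided by the time the process has been running, times 100."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_seconds / elapsed_seconds * 100.0


# Hypothetical process: idle for 10 hours, then busy on one core for 1 hour.
busy_cpu = 1 * 3600            # CPU seconds consumed during the busy hour
elapsed = (10 + 1) * 3600      # seconds since the process started
print(f"{lifetime_pcpu(busy_cpu, elapsed):.1f}%")  # prints 9.1%, although the last hour ran at 100%
```

The busy hour is diluted by everything that came before it, which is why a start/end snapshot of CPU time is needed for a specific period.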

After searching the net, I believe I can calculate the values I need from the 'CPU TIME' values. Correct me if I'm wrong, but if I know the length of time my processes have been running (the sample period), can't their actual '%cpu' values be calculated as follows?

$$\text{'\%cpu'} = \frac{\text{'CPU TIME' accumulated during the sample period}}{\text{length of sample period}} \times 100$$
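A minimal Python sketch of that calculation from two snapshots of one process's TIME column. It assumes the TIME column is in the MMM:SS (minutes:seconds) form used by BSD-style 'u' output, and also accepts the [DD-]HH:MM:SS form used by the 'time' format specifier. The helper names and the sample values are hypothetical.

```python
def parse_ps_time(value: str) -> int:
    """Convert a ps TIME field to seconds.

    Accepts 'MMM:SS' (BSD-style 'u' output) and '[DD-]HH:MM:SS'
    (the 'time'/'cputime' format specifier)."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    if len(parts) == 2:        # MMM:SS
        minutes, seconds = parts
        return days * 86400 + minutes * 60 + seconds
    if len(parts) == 3:        # HH:MM:SS
        hours, minutes, seconds = parts
        return days * 86400 + hours * 3600 + minutes * 60 + seconds
    raise ValueError(f"unrecognised TIME value: {value!r}")


def interval_pcpu(time_start: str, time_end: str, period_seconds: float) -> float:
    """%cpu over the sample period from two TIME snapshots of one process."""
    delta = parse_ps_time(time_end) - parse_ps_time(time_start)
    return delta / period_seconds * 100.0


# Hypothetical snapshots of one process taken 2 hours (7200 s) apart.
print(f"{interval_pcpu('100:00', '520:00', 7200):.0f}%")  # prints 350%
```

The snapshot values were picked to reproduce the kind of result described below: 25,200 CPU seconds accumulated in 7,200 wall-clock seconds gives 350%.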

The problem is that my measured 'CPU TIME' during the sample period is much larger than the sample period itself, which gives me a calculated '%cpu' value of about 350%!

My machine has 8 CPU cores, so my question is: is the 'CPU TIME' reported by 'ps' actually the total accumulated across all 8 CPUs? If I divide the 'CPU TIME' by 8 first (which presumably gives me the average CPU time per core), I get a result much closer to what I'm expecting, i.e. about 44%.
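Dividing by the core count turns the figure into a share of the whole machine's capacity, since 8 cores can deliver at most 8 CPU-seconds per wall-clock second. A sketch of that normalization; the function name is hypothetical and the numbers assume a 2-hour period, picked to match the 350% and 44% figures above.

```python
def machine_share_pcpu(cpu_delta_seconds: float, period_seconds: float, ncores: int) -> float:
    """Process CPU time over the period as a percentage of total capacity
    (ncores CPU-seconds are available per wall-clock second)."""
    return cpu_delta_seconds / (period_seconds * ncores) * 100.0


period = 7200        # hypothetical 2-hour sample period, in seconds
cpu_delta = 25200    # hypothetical CPU seconds accumulated (350% of one core)
print(f"{machine_share_pcpu(cpu_delta, period, 8):.2f}%")  # prints 43.75%
```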

By the way, I'm running RHEL4, U2, 32-bit on an x86 platform.

Mark

syg00 · 02-01-2008, 10:24 PM

This sounds horribly rubbery.
- Is this multi-threaded?
- Is it the same number of threads for the entire period?
- Are all of them always runnable during your period?

Think about it this way: if you have 8 (or 12, or ...) threads, but only 3 are runnable (and hence accumulating CPU time), how does that affect the numbers?
Your maths sounds potentially dodgy to me.
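The runnable-thread point can be made concrete: a process's CPU time is the sum over its threads, so it can accumulate at most min(runnable threads, cores) CPU-seconds per wall-clock second. A simplified upper-bound sketch, assuming a constant number of runnable threads and ignoring other processes competing for the cores; the function name is hypothetical.

```python
def max_pcpu(runnable_threads: int, ncores: int) -> int:
    """Upper bound on %cpu (relative to one core) for a process whose
    runnable-thread count stays constant; other load is ignored."""
    return min(runnable_threads, ncores) * 100


for runnable in (1, 3, 8, 12):
    print(f"{runnable:>2} runnable threads on 8 cores: at most {max_pcpu(runnable, 8)}%")
```

With only 3 runnable threads the process can't exceed 300%, so dividing by 8 would understate how busy those threads were.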

MarkTurbo · 02-02-2008, 12:20 AM

I managed to speak with one of my colleagues at work about this as well. I think 'threads' is the key to explaining the figures.

Yes, the process I'm monitoring is massively multi-threaded. It actually maintains about 1,115 threads throughout the sample period.

I now see that the ideal solution would be to capture data for all the threads of each process, but it's not practical to do so in the environments in which I work. I run benchmarks on applications that are always massively multi-threaded, where sessions are created and destroyed regularly throughout the sample periods. Sessions generally multiplex across threads.
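For reference, on Linux kernels that provide /proc/<pid>/task (2.6 and later), per-thread CPU time can be read from each thread's stat file, where fields 14 and 15 are utime and stime in clock ticks. A minimal sketch under that assumption; 'thread_cpu_seconds' is a hypothetical helper, and summing its values should roughly match the process's TIME.

```python
import os


def thread_cpu_seconds(pid: int) -> dict[int, float]:
    """Map each thread id of `pid` to its utime + stime in seconds,
    read from /proc/<pid>/task/<tid>/stat (Linux only)."""
    ticks = os.sysconf("SC_CLK_TCK")
    result: dict[int, float] = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listdir() and open()/read()
        # comm (field 2) is in parentheses and may contain spaces,
        # so split only the text after the last ')'; fields[0] is field 3.
        fields = stat[stat.rindex(")") + 2:].split()
        utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
        result[int(tid)] = (utime + stime) / ticks
    return result


if __name__ == "__main__":
    per_thread = thread_cpu_seconds(os.getpid())
    print(len(per_thread), "threads,", round(sum(per_thread.values()), 2), "CPU seconds in total")
```

With sessions multiplexed across roughly a thousand short-lived threads, snapshots like this would still miss threads that start and exit between samples, which is part of why it's impractical here.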

I think I can make sense of the data if I make some broad assumptions. I'll start by assuming that my system is balanced and that all the process threads are evenly distributed across all the CPU cores. That way, in my example, the 'CPU TIME' can be regarded as the total thread processing time spread evenly across all the cores. I have 8 cores, so dividing by 8 gives me the average per-core 'CPU TIME' of the process over the sample period.

It's not ideal, but I'm open to better suggestions. My data collection sample period is 2 hours, so there's a decent chance that the system settled into a balanced run over that period. Unfortunately, I can't go back and re-run my benchmark to capture new data. I have to work with the data I've already captured.

Mark

syg00 · 02-02-2008, 02:15 AM

No, that seems fine.
If you can be confident that you have (at least) 8 threads in the runqueue, you'll be fine.
I was presuming a number somewhat less than what you seem to be achieving.
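One way to sanity-check the "(at least) 8 threads in the runqueue" condition on a future run is to sample the procs_running counter in /proc/stat. A sketch under that assumption: procs_running counts runnable tasks system-wide, not just the monitored process, and it includes the sampler itself. The helper names are hypothetical.

```python
import time


def procs_running(stat_text: str) -> int:
    """Extract the procs_running value from the contents of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("procs_running"):
            return int(line.split()[1])
    raise ValueError("procs_running not found")


def fraction_saturated(ncores: int, samples: int = 10, interval: float = 0.5) -> float:
    """Fraction of samples in which at least `ncores` tasks were runnable
    (system-wide count, which includes this sampling process)."""
    hits = 0
    for _ in range(samples):
        with open("/proc/stat") as f:
            if procs_running(f.read()) >= ncores:
                hits += 1
        time.sleep(interval)
    return hits / samples


if __name__ == "__main__":
    share = fraction_saturated(8, samples=5, interval=0.2)
    print(f"{share:.0%} of samples had at least 8 runnable tasks")
```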

MarkTurbo · 02-02-2008, 04:30 AM

Thanks. I appreciate your opinion. I haven't been working with Linux systems for as long as I've been working with AIX. I'm discovering that it can be painful interpreting Linux stats, especially in this area of CPU usage and memory usage. The same tools on AIX work differently and actually yield reliable stats without further manipulation.

Mark

syg00 · 02-02-2008, 04:48 AM

Quote:

Originally Posted by MarkTurbo

I'm discovering that it can be painful interpreting Linux stats, especially in this area of CPU usage and memory usage.

The performance/tuning metrics in Linux are abysmal.
Simple as that. Work with what you have - it's all you can do.

What I'd give for RMF ...