/proc/stat CPU utilization seems incorrect
Hi all,
I'm using /proc/stat to accurately determine CPU utilization during an Ethernet benchmark. The logging step looks roughly as follows: Code:
grep "cpu" /proc/stat >> $cpu_log
My system contains 8 CPUs (only 4 are used; taskset pins the benchmarks to CPU4, CPU5, CPU6, and CPU7) and runs Linux kernel 3.2.8. To analyze the CPU utilization of the different CPUs in the different modes, I parsed the /proc/stat output. The raw log looks, for example, like this: Code:
cpu 10384 0 136128 14417335 15115 6 21414 0 0 0
Code:
user nice system idle iowait irq softirq sum
One note on this: CPU5 receives all the interrupts from the benchmarked Ethernet connection (this is also visible in the softirq column above). When these interrupts are mapped to another processor, that processor shows the same behavior, so I suspect this issue is related to interrupt handling. I looked into the kernel code that accounts softirq time, but it does take the interrupts into account (as it should). Another note: this "lazy CPU syndrome" is not visible in all of my benchmarks. Does anyone know exactly how the kernel scheduler handles this time accounting, and what could go wrong? Has anyone ever seen similar results? |
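For context on the methodology: the counters in /proc/stat are cumulative since boot, so utilization only makes sense as a delta between two snapshots. A minimal sketch of that calculation (the temp-file names, the 1-second interval, and the busy/idle split are my own choices; field layout as documented in proc(5)):

```shell
#!/bin/sh
# Compute per-CPU busy% from two /proc/stat snapshots taken 1 second apart.
# Fields per proc(5): user nice system idle iowait irq softirq steal ...
# "busy" = user+nice+system+irq+softirq; "idle" = idle+iowait.
AWK_BUSY='{
    n = NF / 2                                  # columns per snapshot (label + counters)
    b1 = $2+$3+$4+$7+$8;                         i1 = $5+$6
    b2 = $(n+2)+$(n+3)+$(n+4)+$(n+7)+$(n+8);     i2 = $(n+5)+$(n+6)
    db = b2 - b1; di = i2 - i1
    if (db + di > 0) printf "%s %.1f%% busy\n", $1, 100 * db / (db + di)
}'

grep '^cpu' /proc/stat > /tmp/stat.1
sleep 1
grep '^cpu' /proc/stat > /tmp/stat.2
# paste joins matching lines, so each output line holds both snapshots side by side.
paste /tmp/stat.1 /tmp/stat.2 | awk "$AWK_BUSY"
```

If the per-CPU deltas of all eight columns don't add up to (roughly) the same total per interval, that in itself points at an accounting problem rather than a measurement error.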
When a particular unit of work is ready to be processed and a particular CPU happens to be the one to notice this, then, all other things being equal, "that particular CPU will just happen to win the toss." It doesn't particularly mean anything at all if the distribution of CPU time among various candidates is unequal, assuming that all of the CPUs have equal physical capabilities and access to physical resources (e.g. they are actually cores).
If you want to measure things, don't measure what CPUs are doing: measure how long processes are waiting and for what reasons. |
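One concrete way to do that on Linux is /proc/&lt;pid&gt;/schedstat, which reports time spent on-CPU and time spent waiting on a runqueue. A minimal sketch, assuming the kernel was built with scheduler statistics (CONFIG_SCHEDSTATS / CONFIG_SCHED_INFO):

```shell
#!/bin/sh
# /proc/<pid>/schedstat holds three numbers:
#   time spent on the CPU (ns), time spent waiting on a runqueue (ns),
#   and the number of timeslices run on this CPU.
if [ -r /proc/self/schedstat ]; then
    read oncpu waiting slices < /proc/self/schedstat
    echo "on-CPU: ${oncpu} ns, runqueue wait: ${waiting} ns, timeslices: ${slices}"
else
    echo "schedstat not available (kernel built without scheduler statistics)"
fi
```

Sampling this for the benchmark process before and after a run tells you directly how long it waited, independent of how the per-CPU time happens to be attributed.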
Thanks for your reply sundialsvcs.
I think I do not understand your answer, or you did not understand my question. You say that: Quote:
My question is not about idle or processing time; it is about the total time accounted to the processor, including idle, user space, kernel space, interrupts, and so on. Could you please elaborate on your answer, or ask for specific additional information if my question was not clear? |
I also discussed this issue on the linux kernel mailing list in this thread.
One proposed solution is reverting git commit a25cac5198d4ff28 ("proc: Consider NO_HZ when printing idle and iowait times"). This did reduce the differences; however, now the IRQ-handling CPU accumulates more time than the other CPUs: Code:
user nice system idle iowait irq softirq sum
This configuration also reduced the sum differences and seems to log more IRQs: Code:
user nice system idle iowait irq softirq sum |
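For what it's worth, the "sum" column from the tables above can be recomputed directly from the raw log; a sketch, assuming "sum" is the total of the seven counters named in the header:

```shell
# Append user+nice+system+idle+iowait+irq+softirq as a trailing "sum" column.
grep '^cpu' /proc/stat | awk '{ print $0, $2+$3+$4+$5+$6+$7+$8 }'
```

Comparing this sum across CPUs over the same interval is a quick way to quantify how much time the IRQ-handling CPU gains or loses relative to the others.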
With all due respect to all here at LQ, if the good folks at lkml can't give you a lead, I think you're on your own.
I saw this when you first posted it, but have no answer for you. |