Hi all,
I'm using /proc/stat to accurately determine CPU utilization during an Ethernet benchmark. The benchmark looks roughly as follow:
Code:
cat /proc/stat | grep "cpu" >> $cpu_log
taskset 10 chrt -f 99 $execute
taskset 20 chrt -f 99 $execute
taskset 40 chrt -f 99 $execute
taskset 80 chrt -f 99 $execute
cat /proc/stat | grep "cpu" >> $cpu_log
The benchmark executed tests Ethernet throughput to a remote endpoint.
My system contains 8 CPUs (only 4 are used, taskset maps the benchmarks to CPU4, CPU5, CPU6, and CPU7) and is running Linux kernel 3.2.8.
To analyze the CPU utilization of the different CPUs in different modes I analyzed the /proc/stat information. The raw log looks for example like this :
Code:
cpu 10384 0 136128 14417335 15115 6 21414 0 0 0
cpu0 371 0 1136 1808846 14606 3 398 0 0 0
cpu1 238 0 451 1827606 126 0 50 0 0 0
cpu2 89 0 345 1828021 39 0 15 0 0 0
cpu3 314 0 524 1827629 26 0 10 0 0 0
cpu4 268 0 3008 1804530 106 2 1827 0 0 0
cpu5 2320 0 33947 1767454 99 0 18941 0 0 0
cpu6 3429 0 48610 1776287 70 0 76 0 0 0
cpu7 3352 0 48105 1776957 41 0 95 0 0 0
cpu 10389 0 136562 14425011 15115 6 21467 0 0 0
cpu0 371 0 1136 1809913 14606 3 398 0 0 0
cpu1 238 0 452 1828676 126 0 50 0 0 0
cpu2 89 0 345 1829092 39 0 15 0 0 0
cpu3 315 0 524 1828699 26 0 10 0 0 0
cpu4 269 0 3102 1805504 106 2 1827 0 0 0
cpu5 2322 0 34039 1767989 99 0 18993 0 0 0
cpu6 3429 0 48692 1777274 70 0 76 0 0 0
cpu7 3353 0 48270 1777862 41 0 95 0 0 0
To see the time spent in each mode for each CPU I take the differences and also add a SUM column, the formatted log looks like:
Code:
user nice system idle iowait irq softirq sum
cpu 5 0 434 7676 0 0 53 8168
cpu0 0 0 0 1067 0 0 0 1067
cpu1 0 0 1 1070 0 0 0 1071
cpu2 0 0 0 1071 0 0 0 1071
cpu3 1 0 0 1070 0 0 0 1071
cpu4 1 0 94 974 0 0 0 1069
cpu5 2 0 92 535 0 0 52 681
cpu6 0 0 82 987 0 0 0 1069
cpu7 1 0 165 905 0 0 0 1071
Now my problem is clearly visible from the sum column. All CPUs spent around 10.7 seconds (which sounds reasonably since my benchmark takes around 10 seconds), however CPU4 has only spent 6.81 seconds.
How is this possible?
One note to this is that CPU5 receives all the interrupts from the benchmarked Ethernet connection (this is also visible from the softirq number above). When mapping these interrupts to another processor the other processor has the same behavior.
Thus I suspect this issue is related to handling the interrupts. I looked into the kernel code handling the softirq time but this takes the interrupts into account (as it should).
Another note is that this "lazy cpu syndrome" is not visible for all my benchmarks.
Does anyone know how the kernel scheduler handles this timing exactly and what could go wrong?
Has anyone ever seen similar results?