LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Software > Linux - Kernel
User Name
Password
Linux - Kernel This forum is for all discussion relating to the Linux kernel.

Notices


Reply
  Search this Thread
Old 04-03-2012, 06:57 AM   #1
bassvdo
LQ Newbie
 
Registered: Apr 2012
Posts: 3

Rep: Reputation: Disabled
/proc/stat CPU utilization seems incorrect


Hi all,

I'm using /proc/stat to accurately determine CPU utilization during an Ethernet benchmark. The benchmark looks roughly as follow:
Code:
cat /proc/stat | grep "cpu" >> $cpu_log
taskset 10 chrt -f 99 $execute
taskset 20 chrt -f 99 $execute
taskset 40 chrt -f 99 $execute
taskset 80 chrt -f 99 $execute
cat /proc/stat | grep "cpu" >> $cpu_log
The benchmark executed tests Ethernet throughput to a remote endpoint.
My system contains 8 CPUs (only 4 are used, taskset maps the benchmarks to CPU4, CPU5, CPU6, and CPU7) and is running Linux kernel 3.2.8.
To analyze the CPU utilization of the different CPUs in different modes I analyzed the /proc/stat information. The raw log looks for example like this :
Code:
cpu	10384	0	136128	14417335	15115	6	21414	0	0	0
cpu0	371	0	1136	1808846	14606	3	398	0	0	0
cpu1	238	0	451	1827606	126	0	50	0	0	0
cpu2	89	0	345	1828021	39	0	15	0	0	0
cpu3	314	0	524	1827629	26	0	10	0	0	0
cpu4	268	0	3008	1804530	106	2	1827	0	0	0
cpu5	2320	0	33947	1767454	99	0	18941	0	0	0
cpu6	3429	0	48610	1776287	70	0	76	0	0	0
cpu7	3352	0	48105	1776957	41	0	95	0	0	0
cpu	10389	0	136562	14425011	15115	6	21467	0	0	0
cpu0	371	0	1136	1809913	14606	3	398	0	0	0
cpu1	238	0	452	1828676	126	0	50	0	0	0
cpu2	89	0	345	1829092	39	0	15	0	0	0
cpu3	315	0	524	1828699	26	0	10	0	0	0
cpu4	269	0	3102	1805504	106	2	1827	0	0	0
cpu5	2322	0	34039	1767989	99	0	18993	0	0	0
cpu6	3429	0	48692	1777274	70	0	76	0	0	0
cpu7	3353	0	48270	1777862	41	0	95	0	0	0
To see the time spent in each mode for each CPU I take the differences and also add a SUM column, the formatted log looks like:
Code:
	user	nice	system	idle	iowait	irq	softirq	sum
cpu	5	0	434	7676	0	0	53	8168
cpu0	0	0	0	1067	0	0	0	1067
cpu1	0	0	1	1070	0	0	0	1071
cpu2	0	0	0	1071	0	0	0	1071
cpu3	1	0	0	1070	0	0	0	1071
cpu4	1	0	94	974	0	0	0	1069
cpu5	2	0	92	535	0	0	52	681
cpu6	0	0	82	987	0	0	0	1069
cpu7	1	0	165	905	0	0	0	1071
Now my problem is clearly visible from the sum column. All CPUs spent around 10.7 seconds (which sounds reasonably since my benchmark takes around 10 seconds), however CPU4 has only spent 6.81 seconds. How is this possible?

One note to this is that CPU5 receives all the interrupts from the benchmarked Ethernet connection (this is also visible from the softirq number above). When mapping these interrupts to another processor the other processor has the same behavior.
Thus I suspect this issue is related to handling the interrupts. I looked into the kernel code handling the softirq time but this takes the interrupts into account (as it should).

Another note is that this "lazy cpu syndrome" is not visible for all my benchmarks.

Does anyone know how the kernel scheduler handles this timing exactly and what could go wrong?
Has anyone ever seen similar results?

Last edited by bassvdo; 04-03-2012 at 09:27 AM. Reason: Further clarification
 
Old 04-05-2012, 02:08 PM   #2
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,691
Blog Entries: 4

Rep: Reputation: 3947Reputation: 3947Reputation: 3947Reputation: 3947Reputation: 3947Reputation: 3947Reputation: 3947Reputation: 3947Reputation: 3947Reputation: 3947Reputation: 3947
When a particular unit of work is ready to be processed and a particular CPU happens to be the one to notice this, then, all other things being equal, "that particular CPU will just happen to win the toss." It doesn't particularly mean anything at all if the distribution of CPU time among various candidates is unequal, assuming that all of the CPUs have equal physical capabilities and access to physical resources (e.g. they are actually cores).

If you want to measure things, don't measure what CPUs are doing: measure how long processes are waiting and for what reasons.
 
Old 04-10-2012, 03:06 AM   #3
bassvdo
LQ Newbie
 
Registered: Apr 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thanks for your reply sundialsvcs.

I think I do not understand your answer, or you did not understand my question.
You say that:
Quote:
the distribution of CPU time among various candidates is unequal
How is this possible? Each CPU should have the same notion of time, at least very close, right?
My question is not about idle or processing time, it is about the total time spent by the processor; including idle, user space, kernel space, interrupts...

Could you please elaborate on your answer or ask specific additional information if my question was not clear.
 
Old 05-09-2012, 04:10 AM   #4
bassvdo
LQ Newbie
 
Registered: Apr 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Question

I also discussed this issue on the linux kernel mailing list in this thread.

One proposed solution is revoking patch: git commit a25cac5198d4ff28 "proc: Consider NO_HZ when printing idle and iowait times"
This did reduce the differences, however now the IRQ handling cpu is doing more time than the other cpus:
Code:
       user    nice    system  idle    iowait  irq   softirq sum
cpu     4       0       412     8186    0       0       79      8681
cpu0    0       0       1       1074    0       0       1       1076
cpu1    0       0       2       1075    0       0       0       1077
cpu2    0       0       1       1075    0       0       0       1076
cpu3    0       0       1       1079    0       0       0       1080
cpu4    1       0       115     941     0       0       78      1135
cpu5    2       0       99      984     0       0       0       1085
cpu6    0       0       98      977     0       0       0       1075
cpu7    0       0       95      980     0       0       0       1075
Another proposed solution is the CONFIG_IRQ_TIME_ACCOUNTING kernel configuration.
This configuration also reduced the sum-differences and seems to log more irqs:
Code:
       user    nice    system  idle    iowait  irq     softirq sum
cpu     5       0       373     7610    0       55      382     8425
cpu0    0       0       1       1042    0       18      1       1062
cpu1    0       0       1       1058    0       1       0       1060
cpu2    0       0       1       1060    0       0       0       1061
cpu3    0       0       0       1061    0       1       0       1062
cpu4    2       0       75      505     0       36      380     998
cpu5    1       0       82      983     0       0       0       1066
cpu6    1       0       139     911     0       0       0       1051
cpu7    1       0       74      988     0       0       0       1063
Both solutions still seem to have problems with recording the "exact" utilization, any other suggestions are welcome.
 
Old 05-09-2012, 04:20 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125Reputation: 4125Reputation: 4125Reputation: 4125Reputation: 4125Reputation: 4125Reputation: 4125Reputation: 4125Reputation: 4125Reputation: 4125Reputation: 4125
With all due respect to all here at LQ, if the good folks at lkml can't give you a lead, I think you're on your own.
I saw this when you first posted it, but have no answer for you.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
top utility: initial display shows incorrect CPU utilization data texas_danrob Linux - Software 3 09-01-2009 08:34 AM
CPU usage from /proc/stat Rohan Nakar Linux - Newbie 3 01-25-2007 12:49 AM
New cpu values in /proc/stat? eldonz Linux - General 1 10-19-2005 12:16 PM
/proc/stat cpu info GDarkBladE Linux - Software 2 10-11-2005 03:16 PM
/proc/stat cpu ahz Programming 1 04-24-2005 07:24 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Software > Linux - Kernel

All times are GMT -5. The time now is 01:20 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration