LinuxQuestions.org
Old 01-10-2016, 12:13 PM   #1
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 993

Rep: Reputation: 126
High CPU usage by kworker


Hi!
Since I installed Linux (Gentoo) on a new Asus G771J a few months ago, I've been observing that, every few seconds, something uses 100% of one vCPU core.

All my efforts to find a pattern failed.
Sometimes everything is fine for quite a long time (e.g. 10 or 20 minutes), but then all of a sudden I see 100% system CPU usage on some random core (i.e. something issuing system calls) for 1-2 seconds, then nothing for a few seconds, then 100% CPU usage again, then nothing for some more seconds, and so on.

When this happens and I look at which process is using the CPU, I always see that it's some "kworker" process, which I suppose is a kernel thread.
For example right now the process that is making me nervous is:
Code:
# ps -Af | grep -i 20989 | grep -v grep
root     20989     2 11 18:18 ?        00:05:05 kworker/u16:1
Apparently it happens even when the desktop is idle with nothing running (apart from wireless + X + the Enlightenment WM + xmodmap), nothing being read from or written to the HDD, and almost no wifi activity (just a few bytes now and then).
All my other notebooks and PCs run the same OS, distro, software and config, and don't have this problem.


Any clue how I could find out what's going on?
Is it possible to check to what these "kworker"-threads are linked / are doing?


Many thanks

EDIT:
Upgraded the kernel from 3.17.8 to 4.1.12, which did not change anything.

Last edited by Pearlseattle; 01-10-2016 at 12:14 PM.
 
Old 01-10-2016, 06:13 PM   #2
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792

Rep: Reputation: 329
Quote:
Any clue how I could manage to find out what's going on?
Maybe use perf. A plethora of info here.
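For instance, a minimal sketch (assuming perf is installed and you can run it as root) would be to sample system-wide while a spike is happening, then see which kernel symbols the kworker threads spend their time in:
Code:
```shell
# Sample all CPUs with call graphs for 10 seconds during a spike
# (run as root), then break the report down by command and symbol.
perf record -g -a sleep 10
perf report --sort comm,dso,symbol
```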
 
1 member found this post helpful.
Old 01-10-2016, 08:05 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,776

Rep: Reputation: 2818
See this in the kernel doco; last section should help if you have ftrace enabled in your kernel.
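Concretely, the approach from that last section might look something like this (a sketch, assuming root, a kernel with ftrace and the workqueue trace events enabled, and debugfs mounted at /sys/kernel/debug):
Code:
```shell
cd /sys/kernel/debug/tracing
# Log every work item that gets queued; the function= field names
# the actual worker routine (e.g. writeback-related functions).
echo workqueue:workqueue_queue_work > set_event
timeout 30 cat trace_pipe > /tmp/wq.log
echo > set_event   # disable the event again
# Rank the most frequently queued work functions.
grep -o 'function=[^ ]*' /tmp/wq.log | sort | uniq -c | sort -rn | head
```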
 
1 member found this post helpful.
Old 01-10-2016, 11:25 PM   #4
berndbausch
Senior Member
 
Registered: Nov 2013
Location: Tokyo
Distribution: Redhat/Centos, Ubuntu, Raspbian, Fedora, Alpine, Cirros, OpenSuse/SLES
Posts: 2,980

Rep: Reputation: 775
Some information about kworker, and even suggestions for investigating/solving the problem: http://askubuntu.com/questions/33640...ng-so-much-cpu. The discussion is almost five years old but still looks relevant.

There is even a conversation on the kernel mailing list (also five years old): https://lkml.org/lkml/2011/3/31/68.

And more on the internet; invest some time googling.

Last edited by berndbausch; 01-10-2016 at 11:27 PM.
 
1 member found this post helpful.
Old 08-18-2016, 03:17 PM   #5
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 993

Original Poster
Rep: Reputation: 126
So, I'm back with this old issue.

First of all, thank you guys for your hints (they were useful as general information, but I wasn't able to pinpoint the real issue - I don't understand enough about the kernel & filesystem).
Anyway, I did not find the root cause, but as of today I HOPE I have found some kind of workaround/half-fix.
I'm therefore posting the information I have here and marking the thread as solved (I will immediately "un-solve" it if the issue starts occurring again).

If you're not interested in the details, my current fix/workaround is...
Code:
echo 10000 > /proc/sys/vm/dirty_writeback_centisecs
...which overrides the default value of 500.
Why does this (hopefully) fix my issue? No clue.
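Note: the echo above only lasts until reboot; if the workaround holds, the same setting can be made persistent through sysctl (vm.dirty_writeback_centisecs is the sysctl name corresponding to that /proc path):
Code:
```shell
# Apply now via sysctl (equivalent to the echo above)...
sysctl -w vm.dirty_writeback_centisecs=10000
# ...and persist it across reboots.
echo 'vm.dirty_writeback_centisecs = 10000' >> /etc/sysctl.conf
```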

Details:
1)
Had this issue (high "system"-type CPU usage by "kworker"-processes) for several months, on both my primary Asus notebook (SSD) and my new secondary Dell XPS 13 notebook (NVMe).
2)
Became aware of the issue because I have gkrellm running all the time, showing CPU usage (and disk/wlan activity, temperature, etc.) => saw frequent spikes (one every 10-300 or so seconds) lasting between ~0.2 and ~5 seconds.
3)
Reproducing the problem involves a certain degree of "luck" (with the exact same - mostly idle - workload, sometimes/rarely everything worked perfectly for a while, but most of the time the problem persisted). To increase the chances of the problem occurring I usually start Firefox + Vivaldi (or Chrome) + kdevelop. All these apps write a few bytes to disk at regular intervals even when there is no activity (which is in my opinion stupid, as it impacts the notebooks' battery life - I tried but haven't been able to disable it).
Important: I couldn't reproduce the problem by just writing stuff to disk (e.g. with "dd" or by copying files around - actually, when I did that the problem vanished for a while) => maybe the problem has more to do with appending to files and/or overwriting them, or something similar?
4)
Whenever the high CPU usage occurs, "iotop -o -b | grep -i kworker" shows something like this...
Code:
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
...
 3739 be/4 root        0.00 B/s    0.00 B/s  0.00 %  4.08 % [kworker/1:0]
  925 be/4 root        0.00 B/s    0.00 B/s  0.00 %  3.50 % [kworker/3:1]
 2572 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.54 % [kworker/0:2]
 2881 be/4 root        0.00 B/s    0.00 B/s  0.00 %  4.50 % [kworker/2:0]
 2572 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.87 % [kworker/0:2]
  925 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.54 % [kworker/3:1]
 3165 be/4 root        0.00 B/s    9.18 G/s  0.00 %  0.00 % [kworker/u8:2]
 2881 be/4 root        0.00 B/s    0.00 B/s  0.00 %  4.03 % [kworker/2:0]
 3739 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.46 % [kworker/1:0]
 2572 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.44 % [kworker/0:2]
 3165 be/4 root        0.00 B/s   10.37 G/s  0.00 %  0.00 % [kworker/u8:2]
 2881 be/4 root        0.00 B/s    0.00 B/s  0.00 %  2.88 % [kworker/2:0]
  925 be/4 root        0.00 B/s    0.00 B/s  0.00 %  2.18 % [kworker/3:1]
 3739 be/4 root        0.00 B/s    0.00 B/s  0.00 %  1.67 % [kworker/1:0]
 2572 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.53 % [kworker/0:2]
 3165 be/4 root        0.00 B/s    3.62 G/s  0.00 %  0.00 % [kworker/u8:2]
  925 be/4 root        0.00 B/s    0.00 B/s  0.00 %  3.88 % [kworker/3:1]
 3739 be/4 root        0.00 B/s    0.00 B/s  0.00 %  2.22 % [kworker/1:0]
 2881 be/4 root        0.00 B/s    0.00 B/s  0.00 %  1.20 % [kworker/2:0]
...
(in the above case I had 3 CPU-usage spikes)
(the "real" number of bytes written to disk isn't the GBs reported above but 0 or just a few KBs - kind of random, but in any case always very little)
...and "perf record -g -a sleep 3" + "perf report" showed this (THIS IS NOT A "PERF" OF THE ABOVE "IOTOP" OUTPUT - it's from another spike, but I basically always got the same information about writeback etc.):
Code:
# Children      Self  Command         Shared Object                        Symbol                                      
# ........  ........  ..............  ...................................  ............................................
#
     8.06%     8.06%  kworker/u16:0   [kernel.vmlinux]                     [k] radix_tree_next_chunk                   
             |
             ---radix_tree_next_chunk

     6.96%     6.96%  kworker/u16:0   [kernel.vmlinux]                     [k] writeback_sb_inodes                     
             |
             ---writeback_sb_inodes

     6.81%     6.81%  kworker/u16:0   [kernel.vmlinux]                     [k] _raw_spin_lock                          
             |
             ---_raw_spin_lock

     5.86%     5.86%  kworker/u16:0   [kernel.vmlinux]                     [k] write_cache_pages                       
             |
             ---write_cache_pages

     5.72%     5.72%  kworker/u16:0   [kernel.vmlinux]                     [k] dec_zone_page_state                     
             |
             ---dec_zone_page_state

     5.25%     5.25%  kworker/u16:0   [kernel.vmlinux]                     [k] clear_page_dirty_for_io                 
             |
             ---clear_page_dirty_for_io

     4.72%     4.72%  kworker/u16:0   [kernel.vmlinux]                     [k] __mark_inode_dirty                      
             |
             ---__mark_inode_dirty
5)
Because of the above output & test behaviour I ended up focusing on the filesystem.
As I am using "nilfs2" as rootfs (I don't want to stop using it because I like its data-checksumming and continuous snapshots too much - they let me recover stuff I delete by mistake), I tried different versions of its userland tools (between 2.1.5-r1 and 2.2.2, which I'm using now), different garbage-collection settings (even though the problem occurred when the GC was not running nor active), and mounting both with and without the "discard" option.
I additionally tried other things: multiple kernel versions (4.1, 4.3 and 4.6, if I remember correctly), both the "CONFIG_NO_HZ_IDLE" and "CONFIG_NO_HZ_FULL" timer subsystems, fully upgrading my Gentoo OS twice, powersave options fully on and fully off, etc.
6)
Today, after executing "echo 10000 > /proc/sys/vm/dirty_writeback_centisecs", everything became quiet - I had one small CPU spike ~45 minutes ago and that's it.
No particular reason why I chose "10000" - it was impulsive.
Likewise, no particular reason why I decided to fiddle with "dirty_writeback_centisecs" - again impulsive, indirectly prompted by the "perf" output above listing stuff related to dirty pages & writeback.

Cheers

Last edited by Pearlseattle; 08-18-2016 at 03:30 PM.
 
Old 08-18-2016, 07:14 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,776

Rep: Reputation: 2818
What does this show?
Code:
cat /proc/vmstat | grep -E "dirty|writeback"
 
Old 08-22-2016, 03:01 PM   #7
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 993

Original Poster
Rep: Reputation: 126
So, sorry, it took me a while: for mysterious reasons, even after undoing the workaround the issue didn't show up immediately => I had to work a bit with the notebook (surf and program a bit, which is what I did just now). Once it started (and I then left the notebook idle), the issue kept occurring and still is while I'm writing this post (but for sure much less frequently than in the past => maybe I have to use kdevelop a bit more, I don't know...).

So, here are the two outputs ("iotop" + your command) from a few minutes ago, when I had a high CPU spike (100% "system" CPU usage on 1 CPU, shown by both "gkrellm" and "nmon", lasting ~3 seconds) caused by "kworker" (the lines around 21:23:50-21:23:55 are the ones linked to the time of the CPU spike):

"iotop -b -o -t | grep -i kworker"
Code:
...21:23:50  5501 be/4 root        0.00 B/s    0.00 B/s  0.00 %  4.25 % [kworker/1:2]
21:23:50  3947 be/4 root        0.00 B/s    0.00 B/s  0.00 %  2.07 % [kworker/2:1]
21:23:50  6457 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.48 % [kworker/0:3]
21:23:51  5501 be/4 root        0.00 B/s    0.00 B/s  0.00 %  5.19 % [kworker/1:2]
21:23:51  6457 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.48 % [kworker/0:3]
21:23:51  3947 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.43 % [kworker/2:1]
21:23:51 12894 be/4 root        0.00 B/s  599.77 M/s  0.00 %  0.00 % [kworker/u8:0]
21:23:52  3947 be/4 root        0.00 B/s    0.00 B/s  0.00 %  4.78 % [kworker/2:1]
21:23:52  6457 be/4 root        0.00 B/s    0.00 B/s  0.00 %  1.33 % [kworker/0:3]
21:23:52  5501 be/4 root        0.00 B/s    0.00 B/s  0.00 %  1.02 % [kworker/1:2]
21:23:52 12894 be/4 root        0.00 B/s    5.95 G/s  0.00 %  0.00 % [kworker/u8:0]
21:23:53  6457 be/4 root        0.00 B/s    0.00 B/s  0.00 %  9.78 % [kworker/0:3]
21:23:53  5501 be/4 root        0.00 B/s    0.00 B/s  0.00 %  4.61 % [kworker/1:2]
21:23:53  1065 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.51 % [kworker/3:2]
21:23:54  6457 be/4 root        0.00 B/s    0.00 B/s  0.00 %  8.37 % [kworker/0:3]
21:23:54  5501 be/4 root        0.00 B/s    0.00 B/s  0.00 %  3.23 % [kworker/1:2]
21:23:55  6457 be/4 root        0.00 B/s    0.00 B/s  0.00 %  4.58 % [kworker/0:3]
21:23:55  5501 be/4 root        0.00 B/s    0.00 B/s  0.00 %  1.04 % [kworker/1:2]
21:23:55  3947 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.51 % [kworker/2:1]
...
1-second loops of "date && cat /proc/vmstat | grep -E "dirty|writeback""
Code:
...
Mon Aug 22 21:23:50 CEST 2016
nr_dirty 19
nr_writeback 0
nr_writeback_temp 0
nr_dirty_threshold 147278
nr_dirty_background_threshold 73639
Mon Aug 22 21:23:51 CEST 2016
nr_dirty 19
nr_writeback 0
nr_writeback_temp 0
nr_dirty_threshold 147272
nr_dirty_background_threshold 73636
Mon Aug 22 21:23:52 CEST 2016
nr_dirty 19
nr_writeback 0
nr_writeback_temp 0
nr_dirty_threshold 147265
nr_dirty_background_threshold 73632
Mon Aug 22 21:23:53 CEST 2016
nr_dirty 19
nr_writeback 0
nr_writeback_temp 0
nr_dirty_threshold 147272
nr_dirty_background_threshold 73636
Mon Aug 22 21:23:54 CEST 2016
nr_dirty 19
nr_writeback 0
nr_writeback_temp 0
nr_dirty_threshold 147268
nr_dirty_background_threshold 73634
Mon Aug 22 21:23:55 CEST 2016
nr_dirty 21
nr_writeback 0
nr_writeback_temp 0
nr_dirty_threshold 147269
nr_dirty_background_threshold 73634
Mon Aug 22 21:23:56 CEST 2016
nr_dirty 34
nr_writeback 0
nr_writeback_temp 0
nr_dirty_threshold 148099
nr_dirty_background_threshold 74049
Mon Aug 22 21:23:57 CEST 2016
...
Just FYI:
- I can confirm that "echo 10000 > /proc/sys/vm/dirty_writeback_centisecs" really works as a workaround: since applying it on my primary notebook as well, the issue has never occurred again (with the same workload I mentioned in this thread).
- Confirming what I wrote in my previous post: A) during the whole test, neither I nor any utility/program running in the background ever wrote GBs or even MBs to the HDD (just a few KBs from time to time), and B) "watch "cat /proc/meminfo | grep -i "dirty\|write""" confirmed this, as the values were 0 or just a few KBs the whole time.

Thanks
 
Old 08-22-2016, 09:40 PM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,776

Rep: Reputation: 2818
Those numbers look just like mine (ext4). I also use btrfs (elsewhere) and have never seen anything like this - although it has been a while since I used btrfs as rootfs. Given the following quote, I might be inclined to raise a bug against NILFS and see what their devs have to say.
Quote:
Originally Posted by Pearlseattle View Post
As I am using "nilfs2" as rootfs (I don't want to stop using it because I like its data-checksumming and continuous snapshots too much - they let me recover stuff I delete by mistake), I tried different versions of its userland tools (between 2.1.5-r1 and 2.2.2, which I'm using now), different garbage-collection settings (even though the problem occurred when the GC was not running nor active), and mounting both with and without the "discard" option.
I have no idea why the change to dirty_writeback_centisecs has any effect given the dirty counts you see.
 
Old 08-23-2016, 07:45 AM   #9
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 993

Original Poster
Rep: Reputation: 126
Well, thanks anyway
Yes, in the future I'll probably have a deeper look into nilfs2, whenever I feel like it & have some time.
 
  

