[SOLVED] Does the Linux kernel try to recover from a CPU stall?
Does the Linux kernel have any way to recover from the problems that a stalled CPU causes? By a stalled CPU I mean a kernel thread that never yields the CPU in a non-preemptible kernel (possibly because of a buggy kernel module).
One problem it may cause is lost CPU time, if the thread is buggy and not doing any useful work. So does Linux forcefully preempt or kill such a kernel thread if it detects that the thread has been stalling for a very long time?
Another problem I ran into when a kernel module stalls a CPU is that some functionality stops working; e.g., sudo and echo didn't work for me.
I am running a 2-core Ubuntu machine with 2 GB of RAM, and I loaded a kernel module that runs a while(1) loop.
The watchdog helps detect a soft lockup on a CPU, but I want to know what Linux does once it detects a soft lockup caused by some kthread.
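For reference, on a typical kernel the soft lockup detector's reaction is governed by sysctls rather than by any attempt to kill the thread. With kernel.softlockup_panic=0 (the default on many kernels) it only logs a warning with a backtrace and the stuck thread keeps running; with softlockup_panic=1 it panics, and kernel.panic=N then reboots the machine N seconds later, which is about as close to automatic "recovery" as it gets. A soft lockup is reported once a CPU has been hogged for roughly 2 × kernel.watchdog_thresh seconds. The minimal C sketch below reads those settings, assuming the standard /proc/sys/kernel paths; the helper names are hypothetical.

```c
#include <stdio.h>

/* Read one integer from a file such as a sysctl under /proc/sys.
 * Returns 0 and stores the value on success, -1 if the file is missing
 * or does not start with a number. */
static int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print how this kernel is configured to react to a soft lockup. */
static void report_softlockup_policy(void)
{
    long thresh, lockup_panic, panic_timeout;

    if (read_sysctl_long("/proc/sys/kernel/watchdog_thresh", &thresh) == 0 && thresh > 0)
        /* A soft lockup is reported after about 2 * watchdog_thresh seconds. */
        printf("soft lockup reported after about %ld s\n", 2 * thresh);
    else
        printf("watchdog_thresh unreadable, or 0 (lockup detection off)\n");

    if (read_sysctl_long("/proc/sys/kernel/softlockup_panic", &lockup_panic) != 0) {
        printf("softlockup_panic not available on this kernel\n");
        return;
    }
    if (lockup_panic == 0) {
        /* Warn and dump a backtrace only; nothing is killed or preempted. */
        printf("soft lockup: warning only, the stuck thread keeps running\n");
        return;
    }
    printf("soft lockup: kernel panics\n");
    if (read_sysctl_long("/proc/sys/kernel/panic", &panic_timeout) != 0)
        printf("panic timeout unknown\n");
    else if (panic_timeout == 0)
        printf("after the panic the machine stays down (no automatic reboot)\n");
    else if (panic_timeout < 0)
        printf("after the panic the machine reboots immediately\n");
    else
        printf("after the panic the machine reboots in %ld s\n", panic_timeout);
}
```

Calling report_softlockup_policy() from a small main() prints the current policy. As root, `sysctl -w kernel.softlockup_panic=1` together with `sysctl -w kernel.panic=10` would switch the machine to panic-and-reboot on a soft lockup.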
The traditional watchdog simply resets the CPU (that is how it worked in the early days). In early industrial controllers that was fine. I presume the kernel watchdog is a bit more sophisticated, but you'd want to read up on the options you have. You may also be able to do something by regularly checking top and grepping for that particular process, then killing it if it gets too busy.
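The "check top, then kill it" idea can be sketched in C without parsing top's output: sample the process's CPU time from /proc/<pid>/stat (utime and stime, fields 14 and 15, in clock ticks) twice, and send SIGTERM if it used more than a chosen fraction of one CPU in between. The function names and the threshold policy are hypothetical. One caveat: kernel threads ignore signals by default, so this helps against a runaway user process but will not stop a kthread spinning in while(1); that case has to be fixed in the module itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Total CPU time (utime + stime, in clock ticks) consumed so far by pid,
 * read from /proc/<pid>/stat. Returns -1 if the process cannot be read. */
static long long process_cpu_ticks(pid_t pid)
{
    char path[64];
    char buf[4096];
    unsigned long utime, stime;
    size_t n;
    FILE *f;
    char *p;

    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (the command name) is in parentheses and may contain spaces,
     * so parsing starts after the last ')'. */
    p = strrchr(buf, ')');
    if (p == NULL)
        return -1;
    /* Skip fields 3..13, then read utime (14) and stime (15). */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &utime, &stime) != 2)
        return -1;
    return (long long)utime + (long long)stime;
}

/* Hypothetical policy: sample pid's CPU usage over interval_sec seconds and
 * send SIGTERM if it used more than max_fraction of one CPU (1.0 = a full
 * core). Returns 1 if signalled, 0 if not, -1 on error. */
static int terminate_if_busy(pid_t pid, double max_fraction, unsigned interval_sec)
{
    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    struct timespec t0, t1;
    long long before, after;
    double elapsed, used;

    if (ticks_per_sec <= 0 || interval_sec == 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    before = process_cpu_ticks(pid);
    sleep(interval_sec);
    after = process_cpu_ticks(pid);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (before < 0 || after < 0)
        return -1;

    /* Use the measured wall time, since sleep() can return early. */
    elapsed = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (elapsed <= 0.0)
        return -1;
    used = (double)(after - before) / (double)ticks_per_sec / elapsed;
    if (used <= max_fraction)
        return 0;
    return kill(pid, SIGTERM) == 0 ? 1 : -1;
}
```

A monitor would call something like terminate_if_busy(target_pid, 0.9, 5) in a loop; the pid, the 90% threshold, and the 5-second interval are placeholders to tune.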
Can't you restrain it with nice? From the problems you mention, I gather you're writing software, not running a server. If you're having lockups, you may need a watchdog card so that you have some monitoring intelligence at your disposal. There are also some kernel hot keys (Magic SysRq) that you should look up.
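For a user-space CPU hog, the programmatic equivalent of nice/renice is setpriority(); a minimal sketch follows, with hypothetical wrapper names. Keep in mind that niceness only changes how the scheduler weights a task when it gets a chance to choose; a kernel thread that loops without ever yielding in a non-preemptible kernel never gives it that chance, so nice cannot rein in a while(1) module like the one in the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Set the nice value (-20..19) of pid; pid 0 means the caller.
 * Without CAP_SYS_NICE (or a suitable RLIMIT_NICE), a user can only
 * raise the value of their own processes. Returns 0 on success. */
static int set_niceness(pid_t pid, int niceness)
{
    return setpriority(PRIO_PROCESS, (id_t)pid, niceness);
}

/* Return the nice value of pid, or -100 on error. getpriority() may
 * legitimately return -1, so errno is cleared first and checked after. */
static int get_niceness(pid_t pid)
{
    int value;

    errno = 0;
    value = getpriority(PRIO_PROCESS, (id_t)pid);
    if (value == -1 && errno != 0)
        return -100;
    return value;
}
```

From the shell, `renice -n 19 -p PID` does the same job as set_niceness(PID, 19).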
I have been enjoying Fedora 29 recently on my Toshiba laptop, which has plenty of memory and disk space. I ran sudo dnf update as I often do, but this time it does not finish. It seems to be doing the update but hangs at "Watchdog did not finish". I get this message when I am forced to press Ctrl+Alt+Del to restart. At that point I briefly see the message "Starting Hold until boot process finishes up..." Previous updates have worked fine.
Not sure how I can get past this hold.
Suggestions or ideas would be appreciated.
~ Ron