[SOLVED] system lockups after upgrade to kernel 4.1.6
I have a ThinkPad W520 and I'm tracking -current (a pure 64-bit Slackware install). Ever since the last big -current update back in August, when kernel 4.1.6 was installed, Linux has been crashing on this machine occasionally (sometimes several times a day). At first I thought it might be related to the NVIDIA binary driver, but in the meantime the driver has been updated several times, as have several other suspect packages (Firefox, Flash player, etc.), and the machine is still crashing. Most of the time __alloc_pipe_info() appears in the call trace on the console. For the last crash I managed to save the dmesg output to a file (sometimes X crashes first and the system locks up shortly afterwards), and the call trace is somewhat different. I'm attaching it to this post.
So: any ideas on how to approach debugging this issue? Also, does anyone know when we may get a kernel update in -current? I have held off doing anything about this because I kept hoping for a kernel update to try first, but three months have passed and -current is still on the same kernel version...
There is no guarantee that the kernel will get updated before the next final release of Slackware. I can't look at the log here (and I'm not even sure I could help by looking at the log), but you should be able to build a newer 4.1.x kernel using the .config from your system with ease. This might help you determine if it is the kernel causing the problem. Many Slackware users run kernels other than the official (me included, since I'm running 3.18.8 on my 14.1 install), and Slackware will work fine with newer kernels.
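A rough sketch of that rebuild, written as a shell function that only prints the commands so they can be reviewed before anything is run as root. The version 4.1.13, the source path under /usr/src, the "-custom-" file names and the use of /proc/config.gz (which assumes the running kernel exposes its configuration there) are illustrative assumptions, not details from this thread.

```shell
#!/bin/sh
# Print the steps for building a newer kernel from the running kernel's config.
# Nothing is executed here; review the output, then run the steps by hand.
# Usage: kernel_build_steps VERSION [JOBS]
#   VERSION (e.g. 4.1.13) and the paths below are hypothetical examples.
kernel_build_steps() {
    kver=$1
    njobs=${2:-2}
    cat <<EOF
cd /usr/src/linux-$kver
zcat /proc/config.gz > .config
make olddefconfig
make -j$njobs bzImage modules
make modules_install
cp arch/x86/boot/bzImage /boot/vmlinuz-custom-$kver
cp System.map /boot/System.map-custom-$kver
cp .config /boot/config-custom-$kver
EOF
}
```

Calling `kernel_build_steps 4.1.13 4` prints the eight steps for that version. `make olddefconfig` keeps the existing answers from .config and takes the defaults for any options that are new in the newer tree.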
You're running the most current BIOS for your model too, and I see "CTO" on your machine which means a customized Lenovo.
Did you do a --install-new before --upgrade-all?
Also this cannot be good
Code:
[ 14.708383] CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 14.708384] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 3039.363925] CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 3039.363926] CPU2: Core temperature above threshold, cpu clock throttled (total events = 1)
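To see how often each core logs this, something like the following could tally the messages. It is a minimal sketch that assumes the message format shown above; count_throttle_events is just an illustrative name.

```shell
#!/bin/sh
# Count "Core temperature above threshold" messages per CPU in kernel log
# text read from standard input, e.g.: dmesg | count_throttle_events
count_throttle_events() {
    awk '/Core temperature above threshold/ {
             # The CPU name is the field ending in a colon, e.g. "CPU0:".
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^CPU[0-9]+:$/) { sub(/:$/, "", $i); n[$i]++ }
         }
         END { for (c in n) print c, n[c] }' | sort
}
```

This counts log lines, not the "total events" figure the kernel prints, so it only reflects what is still in the log buffer.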
@bassmadrigal: Using a kernel built from scratch is no problem. I have used Slackware for a number of years, and for a rather long period I built all my kernels from scratch, with as few options enabled as possible to keep them lean. However, at some point you just get lazy and switch to pre-built kernels. The main obstacle to trying this route was actually how to get back to the kernels provided in -current once I'm done testing... I guess I'll have to investigate how to build Slackware kernel packages.
@Aizenmyou: Yes, I always run "slackpkg install-new" before "slackpkg upgrade-all" (and then "slackpkg clean-system" afterwards).
In the meantime, I've uninstalled the NVIDIA driver and enabled the Nouveau driver (which was initially blacklisted) to check whether the system keeps locking up.
Also, regarding the "mce: [Hardware Error]: Machine check events logged" errors that sometimes appear in the dmesg output: I was able to run mcelog immediately after spotting one of these, and here is its output:
--------
mcelog: Unsupported new Family 6 Model 2a CPU: only decoding architectural errors
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 15
CPU 5 THERMAL EVENT TSC 556b5758f98
TIME 1445539128 Thu Oct 22 20:38:48 2015
Processor 5 below trip temperature. Throttling disabled
STATUS 88040002 MCGSTATUS 0
MCGCAP c09 APICID 5 SOCKETID 0
CPUID Vendor Intel Family 6 Model 42
--------
So I guess this particular issue is not that serious after all.
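If more of these show up, a quick way to check whether they are all thermal events is to summarize the mcelog text. This is only a sketch: it assumes each record starts with a line of the form "MCE <n>" and that thermal records contain the words "THERMAL EVENT", as in the output above; mce_summary is a made-up name.

```shell
#!/bin/sh
# Summarize mcelog text from standard input: how many records were logged
# and how many of them are thermal events.
mce_summary() {
    awk '/^MCE [0-9]+/   { records++ }
         /THERMAL EVENT/ { thermal++ }
         END { printf "records=%d thermal=%d\n", records, thermal }'
}
```

If the two numbers differ, at least one record is something other than a thermal event and deserves a closer look.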
Quote:
I guess I'll have to investigate how to build Slackware kernel packages.
There's really no need to worry about packaging the kernel if you don't want to. Kernels are pretty standard to install (and remove, if desired). And if you keep your lilo entries for the current kernel and just add entries for the newer kernel, it's super easy to go back. If you want to completely remove the newer kernel, just remove the source directory, the bzImage file (usually in /boot), your kernel's module directory under /lib/modules/, and finally the entry from your lilo.conf, then rerun lilo.
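As a sketch of what that looks like in practice, the two functions below print a lilo.conf stanza for the extra kernel and the clean-up steps for removing it later. Nothing is executed. The version, the root device, the "custom" label and the "-custom-" file name are hypothetical examples, and the module directory is assumed to be named after the bare kernel version.

```shell
#!/bin/sh
# Print a lilo.conf stanza for an extra kernel; add it below the existing
# entries so the stock kernel stays bootable.
# Usage: lilo_stanza VERSION ROOTDEV   (both values are hypothetical examples)
lilo_stanza() {
    cat <<EOF
image = /boot/vmlinuz-custom-$1
  root = $2
  label = custom
  read-only
EOF
}

# Print the clean-up steps for removing that kernel again (nothing is executed).
# Usage: kernel_removal_steps VERSION
kernel_removal_steps() {
    cat <<EOF
rm -r /usr/src/linux-$1
rm /boot/vmlinuz-custom-$1
rm -r /lib/modules/$1
EOF
    echo "# then delete the matching stanza from /etc/lilo.conf and run: lilo"
}
```

For example, `lilo_stanza 4.1.13 /dev/sda1` prints a stanza to paste into /etc/lilo.conf; running lilo afterwards is what actually installs the updated boot menu.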
Forum member ryanpcmcquen even created a script to easily build a kernel and update lilo, if you don't feel like doing it by hand.