[SOLVED] CPU overheating with recent 14.2 kernel updates
I was only concerned about your observation related to the microcode, which has now proved to be transient and incorrect. I provided some details and an alternative way to load it (huge kernel and early loading method), fearing that something could have gone south with the method you initially used (generic kernel + prepared initrd).
It is also my understanding that the microcode is only loaded if there is an actual microcode update available in the bundle matching the exact CPU sig= and pf_mask=.
You can easily check if a microcode is loaded by comparing the microcode field from /proc/cpuinfo when booted without a microcode available and then with one available.
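The comparison described above can be sketched as a small shell function. This is only a minimal sketch, not the exact procedure used in this thread: the name microcode_rev is made up for illustration, and it assumes an x86 kernel that exposes a "microcode" line in /proc/cpuinfo.

```shell
# microcode_rev is a hypothetical helper name, not a standard tool.
# It prints the microcode revision of the first CPU listed in a
# cpuinfo-style file (default: /proc/cpuinfo).
microcode_rev() {
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Show the revision for the running system, if the file is readable.
if [ -r /proc/cpuinfo ]; then
    microcode_rev
fi
```

Run it once after booting without the microcode bundle and once with it. If the two values differ, the update was applied.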
As another, concrete example: on a very old Atom system I load the huge kernel with initrd=/boot/intel-ucode.cpio and the microcode gets successfully loaded/updated to 0x0218.
Sorry, I cannot provide more help with your issue since I'm not sure (and neither are you) whether it's SW or HW related. On the HW side, here is an interesting read (just for fun, or not?): https://spectrum.ieee.org/semiconduc...ansistor-aging
@baumei
I didn't check when astrogeek's CPU was manufactured but only focused on the microcode loading method, for reasons expressed above.
The only source I could find about the release date for the T2050 is Wiki: https://en.wikipedia.org/wiki/List_o...h%22_(65_nm)_2
It's indeed May 2006, and I presume Intel delivered this CPU to OEM manufacturers & HW architects before the official "commercial" release, which would explain the 2005-11-15 microcode update date.
My oldest machine with the same CPU die as yours exhibits a related problem. On the coldest days (ambient temperature <=50F) the fan will not turn on even after the CPU has overheated and throttled. I think this is some kind of BIOS/ACPI problem in which the BIOS tries to get the CPU into a normal operating temperature range, but somehow gets stuck. Rebooting the machine after it has warmed up restores normal fan control.
The machine has always had this problem. I watch for it on winter days.
Ed
Interesting!
I have been using this laptop daily since fall of 2017 and have not observed this problem. However, I am now using it in a different physical location since late summer this year, so if it has some sensitivity to ambient temperature that may explain why I have not seen it before now. The other Toshiba I mentioned was similar in many ways but had a different, single core cpu and never exhibited this problem.
I had an unused partition on the drive, so I installed a fresh Slackware 14.2 earlier as the quickest means of deciding whether the apparent correlation with recent kernel updates was real or not - and the verdict is, not! On a fresh 14.2 with the 4.4.14 huge kernel I see similar thermal behavior on first boot, with it settling in at about 58C at idle.
So with that data and your post above I must now say that it may be BIOS/ACPI or hardware related after all, and that the apparent correlation with kernel version or microcode simply resulted from bad timing and few data points!
I am still curious enough about the cause to be inclined to look a little further, but for the purposes of this thread I think I am satisfied for now and will mark it solved to avoid wasting others' time. If I find anything new and relevant I will post it here.
If this is a laptop that's been used for 2 years, it may have sucked up dust, hair, and other debris and it may be clogging the heatsink. I found I typically need to get into my laptops about every 1-2 years to clean them up...
The computer I am currently using is of similar vintage to yours, and also has an Intel processor.
I expect running the following command on your computer will give a result similar to that from mine.
Code:
astrogeek@darkstar:~# lsmod | less
If so, then I expect there will be three modules listed which may be interesting: coretemp, thermal, and hwmon.
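A minimal sketch of picking those three modules out of the lsmod listing follows. The function name thermal_modules is made up for illustration. On some kernels "thermal" or "hwmon" may be built in rather than loaded as modules, in which case they will not appear here at all.

```shell
# thermal_modules is a hypothetical helper name.
# It reads lsmod-style output on stdin and keeps only the lines for the
# three modules mentioned above (name, size, "Used by" columns).
thermal_modules() {
    awk '$1 == "coretemp" || $1 == "thermal" || $1 == "hwmon" { print }'
}

# Filter the live module list, if lsmod is available on this system.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | thermal_modules
fi
```

The third column is the use count, and any names after it are the modules that depend on that one.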
The following command will give about 15 lines of information regarding the "thermal" module.
Code:
astrogeek@darkstar:~# /sbin/modinfo thermal
The module "thermal" appears to be interface code between the kernel and the ACPI system.
From the output of "lsmod" I see that "thermal" has no other modules which depend upon it.
If you are interested in a stab-in-the-dark, it may be interesting to see what removing the module, followed by inserting the module, does for the temperature behavior of your computer. :-)
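The remove-and-reinsert experiment could be sketched roughly as below. Everything here is an assumption for illustration: the helper names read_temp_c and reload_thermal are made up, the sysfs path /sys/class/thermal/thermal_zone0/temp may not be the right zone (or may not exist) on a given machine, and modprobe is used where rmmod/insmod would work as well.

```shell
# read_temp_c is a hypothetical helper name.
# It converts a sysfs thermal-zone reading (millidegrees Celsius) to whole
# degrees. The default path is an assumption; zone numbering varies.
read_temp_c() {
    zone_file=${1:-/sys/class/thermal/thermal_zone0/temp}
    read -r milli < "$zone_file" || return 1
    echo $(( milli / 1000 ))
}

# reload_thermal (also hypothetical) removes and re-inserts the module and
# prints the temperature before and after. It needs root, so it is only
# defined here and never called automatically.
reload_thermal() {
    echo "before: $(read_temp_c)C"
    /sbin/modprobe -r thermal && /sbin/modprobe thermal
    sleep "${1:-60}"    # give the temperature time to settle
    echo "after:  $(read_temp_c)C"
}
```

As root, "reload_thermal 120" would wait two minutes between the reload and the second reading.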
Thank you for the information about ucode_tool and the Intel Microcode. I found this very interesting, and am going to get copies from Slackbuilds. :-)
Quote:
If this is a laptop that's been used for 2 years, it may have sucked up dust, hair, and other debris and it may be clogging the heatsink. I found I typically need to get into my laptops about every 1-2 years to clean them up...
Thanks, but as mentioned I had done those things, including new heatsink compound before posting the question.
I usually do that about once a year too, but had not done so until now because this one had been used very little until the past couple of years, and requires substantial disassembly.
Quote:
If you are interested in a stab-in-the-dark, it may be interesting to see what removing the module, followed by inserting the module, does for the temperature behavior of your computer. :-)
I had stabbed into that darkness, actually - with the result that nothing changed notably, and the machine continued to operate much the same with or without the thermal module, and with removal and reloading.
In fact, that raised a question not yet answered: in the absence of kernel fan control, is there a fallback to BIOS or hardware control? It seems to me that such a critical subsystem should really be hardware based anyway, and not dependent on the software installed. Imagine if your car's cooling system depended on regular, periodic intervention by the driver to keep it from overheating...
Quote:
Thanks, but as mentioned I had done those things, including new heatsink compound before posting the question.
Sorry, there was a lot of text on this thread and I didn't notice that. I did search for dust on the pages, but I guess I should've looked for clean or heatsink...
I notice in the initial post you mostly say the trouble came along with the kernel upgrade to 4.4.199-smp. From this and other hints I gather you are running Slackware 14.2. :-)
Earlier this year I had a problem on a 32-bit laptop running Slackware 14.2. With generic kernel 4.4.157-smp everything was great. The next kernel package was in the 4.4.172 series. With generic kernel 4.4.172-smp the screen would go black about halfway through the bootup process. Skipping a long story, the work-around I found was to install from the then current set of 4.19.x kernel packages in '-current'. Not only did the screen resume working properly, but some of the interactions of the kernel and ACPI and the embedded controller worked better.
For my laptop the 4.19.x packages were a drop-in fit to 14.2; I did not have to change anything to make them work together. You may want to give this a whirl on your laptop. :-)
Does your laptop have the most recent BIOS update available?
Last edited by baumei; 11-17-2019 at 10:44 PM.
Reason: typos