Linux - Hardware This forum is for Hardware issues.
06-11-2022, 02:03 PM
|
#1
|
Member
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44
Rep:
|
CPU core temperatures rise and fall impossibly fast. Normal, or bad CPU?
Hi all. I built a new PC last month because my old one died. The new one has an Intel i9-11900K processor running Debian 64 bit version 11.3.0.
It runs great, not a single problem so far. And, it's not overclocked at all. However, I hear the case fans speed up and slow down constantly. I can do a simple Google search and the fans rev up to full speed, then go back down within seconds.
I thought maybe it was in response to the current the CPU was drawing, because the temperature simply could not rise and fall so fast.
But, today I tried a test. I watched the temperatures of the 8 cores using "watch" and "lm-sensors" and ran it at a 0.1 second rate (that is, a 10 times per second refresh rate). Idling, all the cores run around 30°C.
Then I started 8 instances of "cat /dev/urandom > /dev/null" and wanted to see how high the temperature would get.
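For anyone who wants to repeat this measurement without "watch", here is a minimal Python sketch that polls the kernel's hwmon sysfs files directly. It assumes the "coretemp" driver is loaded and exposes temp*_input files (in millidegrees C) under /sys/class/hwmon; the function names and default arguments are only illustrative, not anything from the original test.

```python
import glob
import os
import time


def read_core_temps(hwmon_root="/sys/class/hwmon", driver="coretemp"):
    """Return {label: degrees C} for every temp*_input of the given driver."""
    temps = {}
    for chip in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        try:
            with open(os.path.join(chip, "name")) as f:
                if f.read().strip() != driver:
                    continue  # some other sensor chip (nvme, acpitz, ...)
        except OSError:
            continue
        for path in sorted(glob.glob(os.path.join(chip, "temp*_input"))):
            label_path = path[: -len("_input")] + "_label"
            try:
                with open(label_path) as f:
                    label = f.read().strip()
            except OSError:
                label = os.path.basename(path)  # fall back to the file name
            with open(path) as f:
                # sysfs reports millidegrees Celsius
                temps[label] = int(f.read().strip()) / 1000.0
    return temps


def log_temps(duration_s=5.0, interval_s=0.1, **kwargs):
    """Sample every interval_s seconds; return a list of (elapsed, temps)."""
    rows = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= duration_s:
            break
        rows.append((round(elapsed, 2), read_core_temps(**kwargs)))
        time.sleep(interval_s)
    return rows
```

Calling log_temps() while the load is started in another terminal gives roughly ten samples per second that can be dumped to a file and graphed.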
Amazingly, all 8 cores shot up to 95...100°C in 3/4 of a second! I recorded and graphed the data so I didn't have to guess at the timeframe. The thermal mass of JUST THE CPU alone simply couldn't jump up by 70°C that fast.
I asked around some "overclocker" forums figuring they would know all the "thermal" stuff. But, the replies were all idiotic things like "you need water cooling" or "dude, that Intel cooler is your problem, they suck" or "do you use thermal paste?" or "is the heatsink clogged with dust?" (yeah, on a brand new build only a month old) or the best one yet "try liquid nitrogen". Totally clueless overclocker "gamerz".
Anyway, I have a good copper-core Intel (Foxconn) cooler and it only ever gets warm. I thought maybe it didn't get hotter because of poor heat transfer from the CPU, which is unlikely since I use home-made "EGaIn" (eutectic gallium/indium alloy) as a thermal interface material (kind of like "liquid metal" Galinstan but without the tin). If I disconnect the fan of the CPU cooler, the heatsink gets hot at what seems to be an appropriate rate (about 4 minutes from lukewarm to "OUCH!"). When the heatsink reaches 100°C I re-connect the fan, lots of hot air is generated, and it cools back down to normal in just a few minutes.
The reported CPU temperature and measured temperature agree closely when it heats at idle (without a constantly changing load). This tells me I have very good thermal conductivity between CPU and heatsink. Because the thermal mass of the heatsink is also in play, there is NO WAY the temperature could increase by almost 70°C in less than a second!
So, do any of you have any info as to what could be causing this? My only guess (and fear) is that the CPU die internally is not bonded properly (or at all) to the heat spreader (the square metal top of the CPU).
I am anxious to find out if this is normal or a bad CPU. I can't imagine how the darn thing can jump up by 70°C in less than a second. Oh, BTW, it also COOLS rapidly after the load goes away. It will drop back down to around 30°C within around 5 seconds. During all this, the CPU heatsink just sits there cozy warm. It doesn't change temperature at all (measured) while the CPU is having its tantrums.
Any ideas? I will greatly appreciate it. Thanks!
|
|
|
06-11-2022, 05:37 PM
|
#2
|
LQ Guru
Registered: Oct 2004
Distribution: Arch
Posts: 5,386
|
Quote:
Amazingly, all 8 cores shot up to 95...100°C in 3/4 of a second!
|
That does not sound right. You can get an 8 core cpu up to 212°F (100°C), such as when using ffmpeg, but it takes 30 seconds or so. And, if it really were that hot, I would stop and see what was wrong. Or use something like cpupower to limit the max frequency.
Either:
CPU heat sink, paste, fan is not working.
CPU heat sink is full of dust.
Sensors are not being read correctly.
I suppose that it is possible that the manufacturer left off the cpu heat pad/paste between the processor and the heat sink.
Check into how those sensors are being read, and if you need to change the config file.
I can give some general info
https://wiki.archlinux.org/title/Lm_sensors
|
|
|
06-12-2022, 10:45 AM
|
#3
|
Senior Member
Registered: Oct 2003
Posts: 3,016
|
First, I wouldn't trust lm-sensors to be accurate with any newish cpu. Record your idle temps with lm-sensors and then boot into your bios setup and let things settle down and see what the bios idle temps are. On my intel i7 12700K bios temps are about 8 to 10 degrees higher than the idle temps reported by lm-sensors.
Next, just to eliminate any potential kernel regression with regard to your hardware, you might want to get a livecd with a more current kernel and attempt to run your stress tests after booting with it and see if you get the same result. It's very unlikely to be an issue but I would want to eliminate this remote possibility before going further.
Those are the only things I have to add to what teckk posted above.
|
|
|
06-12-2022, 04:51 PM
|
#4
|
Member
Registered: Jan 2022
Location: Hanover, Germany
Distribution: Slackware
Posts: 309
Rep: 
|
Quote:
Originally Posted by Krupski
Because the thermal mass of the heatsink is also in play, there is NO WAY the temperature could increase by almost 70°C in less than a second!
|
Consider the thermal capacity of the heat sink. Thermal capacities cause time delays. Therefore it is possible that the temperature measured inside the die increases by almost 70°C in less than a second.
If an OEM heatsink is installed in your PC, that is very bad. These OEM heatsinks give poor cooling performance. They only have to ensure that the CPU survives the warranty period.
|
|
|
06-13-2022, 09:32 AM
|
#5
|
Member
Registered: Jun 2020
Posts: 610
Rep: 
|
What specific heatsink are you using here? The Intel OEM heatsinks are not actually rated for 11900K (the baseline TDP (125W) is higher than their rated capacity (which is usually 60-90W depending on specific configuration)), and aren't included with K SKUs for a reason. That kind of rapid thermal rise isn't surprising under extreme/sustained loads if that's the case (and your fingers do not a thermal probe make). This is not at all odd behavior for the higher wattage Intel (or AMD) chips IME. The 11th gen desktop chips also do have a reputation for running fairly warm even among newer chips. Anandtech actually did a review of the Intel heatsinks, which you can read here: https://www.anandtech.com/show/10500...d-vs-evo-212/3 ("how did they get old heatsinks on new CPUs?" they didn't - they used a dummy load), and none of them are particularly 'good' at handling heavier loads.
There is an easy/free fix to be had, however: go into the system BIOS and set lower long-term and short-term power limits (the non-K SKUs will have long-term set at 65W, as opposed to 125W (or 241W for 12th gen (!!)), and short-term between 125W and 241W with Tau reduced to 28 seconds (from ~60 on K, or (effectively) infinite on 12th gen)). The performance implications of this will be negligible-to-minor depending on your exact workload - this has been tested time and time again by sites like TechPowerUp over the last few generations of Intel chips ("so why do they set such a higher power limit if it doesn't make sense, surely it must have a reason!" -> to win at artificial benchmark competitions that are common in more gamer-oriented CPU reviews, in applications like Cinebench, Geekbench, Futuremark, etc where they can say "well it's running stock so it isn't cheating" but 'stock' doesn't always mean 'sane' these days).
The less easy/free fix would be to get a proper heatsink for a ~150-200W load - Thermalright, Noctua, DeepCool, etc all make big heatsinks that can handle that, or you can get a pre-filled liquid cooler (which is likely in-line with the suggestions you've gotten elsewhere) for around the same price, which may or may not be easier to mount in your system. I'd probably skip the workbench concoctions and use their included TIM as well.
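To see which long-term and short-term limits are currently in effect before touching the BIOS, something like the following Python sketch can read them from Linux. It assumes the intel_rapl powercap driver is loaded and exposes the package zone at /sys/class/powercap/intel-rapl:0 (the path and read permissions can differ between machines and kernels); the function names are made up for illustration.

```python
import os


def _read(zone, index, suffix):
    """Read one constraint attribute, e.g. constraint_0_power_limit_uw."""
    with open(os.path.join(zone, f"constraint_{index}_{suffix}")) as f:
        return f.read().strip()


def read_power_limits(zone="/sys/class/powercap/intel-rapl:0"):
    """Return {constraint name: {"watts": ..., "window_s": ...}}."""
    limits = {}
    index = 0
    # Constraints are numbered from 0; stop at the first missing one.
    while os.path.exists(os.path.join(zone, f"constraint_{index}_name")):
        limits[_read(zone, index, "name")] = {
            # sysfs stores microwatts and microseconds
            "watts": int(_read(zone, index, "power_limit_uw")) / 1e6,
            "window_s": int(_read(zone, index, "time_window_us")) / 1e6,
        }
        index += 1
    return limits
```

On a typical Intel desktop the returned keys are "long_term" and "short_term", which correspond to the two limits described above, with the long-term window being Tau.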
|
|
|
06-14-2022, 05:53 AM
|
#7
|
Member
Registered: Jun 2020
Posts: 610
Rep: 
|
Also note that Intel's 'generations' have started to diverge between mobile and desktop - the desktop '11th gen' chips are a different architecture/platform than on mobile (they have different code names too). You can see some of this difference if you compare a pair of chips from either side, such as here: https://www.cpu-world.com/Compare/30...i7-11700K.html
Note the different GPUs, and some differences in cache and instructions. Apart from the GPU, I'm not sure how noticeable this would be for most users, but it is worth bearing in mind that the 'generation' branding is becoming less concrete at demarcating feature sets. 
|
|
|
06-14-2022, 07:45 AM
|
#8
|
Member
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44
Original Poster
Rep:
|
Quote:
Originally Posted by teckk
That does not sound right. You can get an 8 core cpu up to 212°F (100°C), such as when using ffmpeg, but it takes 30 seconds or so. And, if it really were that hot, I would stop and see what was wrong. Or use something like cpupower to limit the max frequency.
Either:
CPU heat sink, paste, fan is not working.
CPU heat sink is full of dust.
Sensors are not being read correctly.
I suppose that it is possible that the manufacturer left off the cpu heat pad/paste between the processor and the heat sink.
Check into how those sensors are being read, and if you need to change the config file.
I can give some general info
https://wiki.archlinux.org/title/Lm_sensors
|
Did you read my post?
|
|
|
06-14-2022, 07:49 AM
|
#9
|
Member
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44
Original Poster
Rep:
|
Quote:
Originally Posted by kilgoretrout
First, I wouldn't trust lm-sensors to be accurate with any newish cpu. Record your idle temps with lm-sensors and then boot into your bios setup and let things settle down and see what the bios idle temps are. On my intel i7 12700K bios temps are about 8 to 10 degrees higher than the idle temps reported by lm-sensors.
Next, just to eliminate any potential kernel regression with regard to your hardware, you might want to get a livecd with a more current kernel and attempt to run your stress tests after booting with it and see if you get the same result. It's very unlikely to be an issue but I would want to eliminate this remote possibility before going further.
Those are the only things I have to add to what teckk posted above.
|
Even if lm-sensors is not accurate, the readings (although possibly wrong) should not CHANGE by such a large amount.
As far as a more current kernel, I'm running Debian 11.3.0 which is the latest one out. I don't think the kernel could be newer unless I compiled it myself...?
|
|
|
06-14-2022, 07:57 AM
|
#10
|
Member
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44
Original Poster
Rep:
|
Quote:
Originally Posted by Arnulf
Consider the thermal capacity of the heat sink. Thermal capacities cause time delays. Therefore it is possible that the temperature measured inside the die increases by almost 70°C in less than a second.
If an OEM heatsink is installed in your PC, that is very bad. These OEM heatsinks give poor cooling performance. They only have to ensure that the CPU survives the warranty period.
|
I built my own machine. There is no "warranty period".
Besides, I don't believe that ANY heatsink, no matter how well it is thermally bonded to the CPU package, could have any effect on a temperature that jumps by 70°C in less than a second. The heat simply could not flow that rapidly.
A guy I know thought of a possibility that actually does make sense to me. He said maybe the on-chip temperature sensors being part of the CPU silicon, respond to heat generated in the die because the sensor and core are literally only a few hundred nanometers apart, and show a change long before the heat can flow to the IHS, let alone the heatsink. Maybe?
|
|
|
06-14-2022, 09:12 AM
|
#11
|
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,424
|
If it's a Zen 3 AMD CPU, it might be ok. They are using semiconductor junctions on the die itself as heat sensors. Response is nearly instant. It's a known technique. The junction is nonlinear, but voltage falls as temperature rises. Voltage --> temperature conversion can be by lookup table, or equation.
I don't know how Intel are doing it, but I presume they're not far behind. If a temperature gradient is rising nearly vertically, it makes good sense to exaggerate this in software to call in thermal limiting before it's needed. Semiconductors can't take overheating like metal or plastic. Your cpu would die dead well within 100 µs of overheating. Heatsinking has only minimal effect with reactions of that speed.
So if the temperature gradient was steep, and the figure was enhanced to trigger protection in time, you could well see the crazy figures we have trouble believing.
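As a rough illustration of the lookup-table conversion mentioned above, here is a small Python sketch that interpolates between calibration points. The calibration values are invented for the example (they simply follow a slope of about -2 mV per °C, a commonly quoted ballpark for a silicon junction) and are not taken from any AMD or Intel documentation; a real table would be nonlinear and vendor specific.

```python
from bisect import bisect_left

# Hypothetical calibration points: (forward voltage in mV, temperature in °C),
# sorted by voltage. Voltage falls as temperature rises.
CALIBRATION = [(460.0, 120.0), (540.0, 80.0), (620.0, 40.0), (700.0, 0.0)]


def voltage_to_temp(mv, table=CALIBRATION):
    """Convert a junction voltage to °C by piecewise-linear interpolation."""
    volts = [v for v, _ in table]
    if mv <= volts[0]:
        return table[0][1]   # clamp to the hottest calibrated point
    if mv >= volts[-1]:
        return table[-1][1]  # clamp to the coldest calibrated point
    i = bisect_left(volts, mv)  # first table entry with voltage >= mv
    (v0, t0), (v1, t1) = table[i - 1], table[i]
    return t0 + (t1 - t0) * (mv - v0) / (v1 - v0)
```

With this made-up table, voltage_to_temp(660.0) lands halfway between the 40°C and 0°C points. Because the conversion is just arithmetic on an electrical reading, the reported value can follow the junction as fast as it is sampled.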
|
|
|
06-14-2022, 10:08 AM
|
#12
|
Member
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44
Original Poster
Rep:
|
Quote:
Originally Posted by business_kid
If it's a Zen 3 AMD CPU, it might be ok. They are using semiconductor junctions on the die itself as heat sensors. Response is nearly instant. It's a known technique. The junction is nonlinear, but voltage falls as temperature rises. Voltage --> temperature conversion can be by lookup table, or equation.
I don't know how Intel are doing it, but I presume they're not far behind. If a temperature gradient is rising nearly vertically, it makes good sense to exaggerate this in software to call in thermal limiting before it's needed. Semiconductors can't take overheating like metal or plastic. Your cpu would die dead well within 100 µs of overheating. Heatsinking has only minimal effect with reactions of that speed.
So if the temperature gradient was steep, and the figure was enhanced to trigger protection in time, you could well see the crazy figures we have trouble believing.
|
I stated in the original post that the CPU is an Intel i9-11900K.
|
|
|
06-14-2022, 11:28 AM
|
#13
|
Senior Member
Registered: Oct 2003
Posts: 3,016
|
Quote:
As far as a more current kernel, I'm running Debian 11.3.0 which is the latest one out. I don't think the kernel could be newer unless I compiled it myself...?
|
Debian 11 stable was released in August 2021 with kernel 5.10 which was released in December 2020. Debian generally will only backport security patches to its kernels in the stable branch. Your processor was released by Intel in Q1 of 2021. Basically, you are using a kernel which may predate the release of your cpu and may not contain all the necessary patches to give you optimal performance with your cpu. That's the problem with using newer hardware with Debian stable. I'm not suggesting that you compile your own current kernel. But you could get a livecd with just about any other current distro not based on Debian stable and have a much more current kernel than you presently have. You could then boot into your livecd and run your tests and see if you get the same results.
By way of example, on my new build which has an Intel core i7 12700K cpu I have several distros installed including Debian 11 and Arch. For benchmark purposes, I wrote a little program in C which calculates all the primes between zero and two billion using the sieve of Eratosthenes. On Debian 11 with kernel 5.10.0-15 the program takes 73 seconds to run. On Arch with kernel 5.18.2 the same program takes 46 seconds to run. This just illustrates that using an older kernel on newer hardware can have a significant performance impact and there may be other issues as well.
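For readers who want a comparable CPU-bound test to run under two kernels, here is a minimal sieve of Eratosthenes in Python. It is not the C program described above, just a sketch of the same algorithm; a limit of two billion would need about 2 GB of RAM with this byte-per-number layout, so a smaller limit is more practical.

```python
def count_primes(limit):
    """Count the primes p with 2 <= p <= limit using a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)  # one flag byte per integer
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            start = p * p  # smaller multiples were cleared by smaller primes
            # Clear every multiple of p from p*p upward in one slice assignment.
            is_prime[start::p] = bytes(len(range(start, limit + 1, p)))
        p += 1
    return sum(is_prime)
```

Wrapping a call such as count_primes(100_000_000) between two time.perf_counter() readings gives a wall-clock figure that can be compared across kernels on the same hardware.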
|
|
|
06-14-2022, 12:27 PM
|
#14
|
Senior Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 1,058
|
Quote:
Originally Posted by Krupski
A guy I know thought of a possibility that actually does make sense to me. He said maybe the on-chip temperature sensors being part of the CPU silicon, respond to heat generated in the die because the sensor and core are literally only a few hundred nanometers apart, and show a change long before the heat can flow to the IHS, let alone the heatsink. Maybe?
|
Thermal networks are modeled similarly to electrical resistor-capacitor networks.
Think of a thermal network as a network of series-connected resistors, with each node having a capacitor to ground. A thermal network is characterized by thermal resistance and thermal time constants, which are analogous to electrical resistance and electrical RC time constants.
The temperature can change quite rapidly at the CPU die, but only over a limited range. Eventually, the heat propagates to the slower heat spreader and even slower fansink.
On one of my machines, the CPU die temperature can change by 30C instantly. The next 30C delta takes minutes.
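The resistor-capacitor analogy can be sketched numerically. The Python below models just two nodes, the die and the heatsink, with a step in power. All component values are assumptions picked to give plausible time constants (a fraction of a second for the die, tens of seconds for the heatsink); they are not measurements of any real CPU, and thermal throttling is ignored.

```python
def simulate(power_w, seconds, dt=0.001, ambient_c=30.0,
             r_die_sink=0.4,   # K/W, die to heatsink (assumed)
             c_die=0.4,        # J/K, thermal capacity of the die (assumed)
             r_sink_amb=0.1,   # K/W, heatsink to ambient air (assumed)
             c_sink=400.0):    # J/K, thermal capacity of the heatsink (assumed)
    """Explicit-Euler integration of a two-node thermal RC ladder.

    Returns (die temperature, heatsink temperature) in °C after `seconds`
    of constant `power_w`, starting with both nodes at ambient.
    """
    t_die = t_sink = ambient_c
    for _ in range(int(round(seconds / dt))):
        q_die_to_sink = (t_die - t_sink) / r_die_sink    # W leaving the die
        q_sink_to_amb = (t_sink - ambient_c) / r_sink_amb  # W leaving the sink
        t_die += dt * (power_w - q_die_to_sink) / c_die
        t_sink += dt * (q_die_to_sink - q_sink_to_amb) / c_sink
    return t_die, t_sink
```

With these assumed values, simulate(150, 0.75) shows the die close to 60°C above where it started while the heatsink has moved by well under one degree, and simulate(150, 600) shows both nodes settled near their steady-state values. That matches the pattern described above: a fast jump at the die followed by a much slower drift as the heatsink warms up.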
Ed
|
|
|
06-23-2022, 07:47 PM
|
#15
|
Member
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44
Original Poster
Rep:
|
Well everyone, I found the problem. The die thermal sensors are a part of each core, so when a core gets busy (i.e. draws more power), the temperature as seen by the thermal sensor diode rises almost instantly. When the load goes down, the extra heat flows into the IHS and then out of the heatsink. So, this effect is inherent in the processor design and not a real "overheat". A better way to measure temperature would be to install a temperature sensor either on the IHS (via a machined groove to accommodate the sensor and wire), or installed on the heatsink, away from cooling fan airflow. But, not to worry... a simple touch of the heatsink with a finger is all you need... too hot to touch, too hot to use. Watch out for those fan blades. Blood does not make a good coolant!
|
|
|
All times are GMT -5.