LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
Old 06-11-2022, 02:03 PM   #1
Krupski
Member
 
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44

Rep: Reputation: 0
CPU core temperatures rise and fall impossibly fast. Normal, or bad CPU?


Hi all. I built a new PC last month because my old one died. The new one has an Intel i9-11900K processor and runs 64-bit Debian 11.3.0.

It runs great, not a single problem so far. And, it's not overclocked at all. However, I hear the case fans speed up and slow down constantly. I can do a simple Google search and the fans rev up to full speed, then go back down within seconds.

I thought maybe it was in response to the current the CPU was drawing, because the temperature simply could not rise and fall so fast.

But, today I tried a test. I watched the temperatures of the 8 cores using "watch" and "lm-sensors" and ran it at a 0.1 second rate (that is, a 10 times per second refresh rate). Idling, all the cores run around 30°C.

Then I started 8 instances of "cat /dev/urandom > /dev/null" and wanted to see how high the temperature would get.

Amazingly, all 8 cores shot up to 95...100°C in 3/4 of a second! I recorded and graphed the data so I didn't have to guess at the timeframe. The thermal mass of JUST THE CPU alone simply couldn't jump up by 70°C that fast.
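
For anyone who wants to reproduce this kind of recording, here is a minimal Python sketch. It is not the method used above (that was "watch" plus "lm-sensors"). It assumes the kernel's coretemp driver exposes the sensors under /sys/class/hwmon with values in millidegrees Celsius; the function names read_core_temps and log_temps are made up for illustration.

```python
import glob
import os
import time


def read_core_temps(hwmon_root="/sys/class/hwmon"):
    """Return {label: degrees C} for each temp*_input of a 'coretemp' hwmon device."""
    temps = {}
    for dev in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        try:
            with open(os.path.join(dev, "name")) as f:
                if f.read().strip() != "coretemp":
                    continue  # skip other sensor chips (NVMe, board sensors, ...)
        except OSError:
            continue
        for path in sorted(glob.glob(os.path.join(dev, "temp*_input"))):
            label_path = path[: -len("_input")] + "_label"
            try:
                with open(label_path) as f:
                    label = f.read().strip()
            except OSError:
                label = os.path.basename(path)  # no label file: fall back to file name
            with open(path) as f:
                temps[label] = int(f.read().strip()) / 1000.0  # millidegrees -> °C
    return temps


def log_temps(duration_s=5.0, interval_s=0.1, hwmon_root="/sys/class/hwmon"):
    """Sample temperatures at a fixed interval; returns a list of (elapsed_s, temps)."""
    samples = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= duration_s:
            break
        samples.append((elapsed, read_core_temps(hwmon_root)))
        time.sleep(interval_s)
    return samples
```

Calling log_temps() just before starting the load gives a timestamped series that can be graphed, so the rise time does not have to be estimated by eye.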

I asked around some "overclocker" forums figuring they would know all the "thermal" stuff. But, the replies were all idiotic things like "you need water cooling" or "dude, that Intel cooler is your problem, they suck" or "do you use thermal paste?" or "is the heatsink clogged with dust?" (yeah, on a brand new build only a month old) or the best one yet "try liquid nitrogen". Totally clueless overclocker "gamerz".

Anyway, I have a good copper-core Intel (Foxconn) cooler and it only ever gets warm. I thought maybe it didn't get hotter because of poor heat transfer from the CPU, which is unlikely since I use homemade "EGaIn" (eutectic gallium-indium alloy) as the thermal interface material (kind of like "liquid metal" Galinstan, but without the tin). If I disconnect the fan of the CPU cooler, the heatsink gets hot at what seems to be an appropriate rate (about 4 minutes from lukewarm to "OUCH!"). When the heatsink reaches 100°C I reconnect the fan, lots of hot air comes out, and it cools back down to normal in just a few minutes.

The reported CPU temperature and measured temperature agree closely when it heats at idle (without a constantly changing load). This tells me I have very good thermal conductivity between CPU and heatsink. Because the thermal mass of the heatsink is also in play, there is NO WAY the temperature could increase by almost 70°C in less than a second!

So, do any of you have any info as to what could be causing this? My only guess (and fear) is that the CPU die internally is not bonded properly (or at all) to the heat spreader (the square metal top of the CPU).

I am anxious to find out if this is normal or a bad CPU. I can't imagine how the darn thing can jump up by 70°C in less than a second. Oh, BTW, it also COOLS rapidly after the load goes away. It will drop back down to around 30°C within around 5 seconds. During all this, the CPU heatsink just sits there cozy warm. It doesn't change temperature at all (measured) while the CPU is having its tantrums.

Any ideas? I will greatly appreciate it. Thanks!
 
Old 06-11-2022, 05:37 PM   #2
teckk
LQ Guru
 
Registered: Oct 2004
Distribution: Arch
Posts: 5,386
Blog Entries: 7

Rep: Reputation: 1948
Quote:
Amazingly, all 8 cores shot up to 95...100°C in 3/4 of a second!
That does not sound right. You can get an 8-core CPU up to 212°F (100°C), such as when using ffmpeg, but it takes 30 seconds or so. And if it really were that hot, I would stop and see what was wrong, or use something like cpupower to limit the max frequency.

Either:

The CPU heat sink, paste, or fan is not working.

The CPU heat sink is full of dust.

The sensors are not being read correctly.

I suppose that it is possible that the manufacturer left off the CPU heat pad/paste between the processor and the heat sink.

Check how those sensors are being read, and whether you need to change the config file.

I can give some general info
https://wiki.archlinux.org/title/Lm_sensors
 
Old 06-12-2022, 10:45 AM   #3
kilgoretrout
Senior Member
 
Registered: Oct 2003
Posts: 3,016

Rep: Reputation: 399
First, I wouldn't trust lm-sensors to be accurate with any newish CPU. Record your idle temps with lm-sensors, then boot into your BIOS setup, let things settle down, and see what the BIOS idle temps are. On my Intel i7-12700K, BIOS temps are about 8 to 10 degrees higher than the idle temps reported by lm-sensors.

Next, just to eliminate any potential kernel regression with regard to your hardware, you might want to get a livecd with a more current kernel and attempt to run your stress tests after booting with it and see if you get the same result. It's very unlikely to be an issue but I would want to eliminate this remote possibility before going further.
Those are the only things I have to add to what teckk posted above.
 
Old 06-12-2022, 04:51 PM   #4
Arnulf
Member
 
Registered: Jan 2022
Location: Hanover, Germany
Distribution: Slackware
Posts: 309

Rep: Reputation: 112
Quote:
Originally Posted by Krupski View Post
Because the thermal mass of the heatsink is also in play, there is NO WAY the temperature could increase by almost 70°C in less than a second!
Consider the thermal capacity of the heat sink. Capacities cause time delays. It is therefore possible that the temperature measured inside the die increases by almost 70°C in less than a second.

If an OEM heatsink is installed in your PC, that is very bad. OEM heatsinks give poor cooling performance. They only have to ensure that the CPU survives the warranty period.
 
Old 06-13-2022, 09:32 AM   #5
obobskivich
Member
 
Registered: Jun 2020
Posts: 610

Rep: Reputation: Disabled
What specific heatsink are you using here? The Intel OEM heatsinks are not actually rated for the 11900K (its baseline TDP of 125W is higher than their rated capacity, which is usually 60-90W depending on the specific configuration), and there is a reason they aren't included with K SKUs. If that's what you have, that kind of rapid thermal rise isn't surprising under extreme/sustained loads (and your fingers do not a thermal probe make). This is not at all odd behavior for the higher-wattage Intel (or AMD) chips IME. The 11th gen desktop chips also have a reputation for running fairly warm, even among newer chips. Anandtech actually did a review of the Intel heatsinks, which you can read here: https://www.anandtech.com/show/10500...d-vs-evo-212/3 ("how did they get old heatsinks on new CPUs?" They didn't; they used a dummy load), and none of them are particularly 'good' at handling heavier loads.

There is an easy/free fix to be had, however: go into the system BIOS and set lower long-term and short-term power limits. The non-K SKUs have long-term set at 65W, as opposed to 125W on the K (or 241W for 12th gen (!!)), and short-term between 125W and 241W with Tau reduced to 28 seconds (from ~60 on K, or (effectively) infinite on 12th gen). The performance implications of this will be negligible-to-minor depending on your exact workload - this has been tested time and time again by sites like TechPowerUp over the last few generations of Intel chips ("so why do they set such a high power limit if it doesn't make sense, surely there must be a reason!" -> to win at the artificial benchmark competitions that are common in more gamer-oriented CPU reviews, in applications like Cinebench, Geekbench, Futuremark, etc., where they can say "well, it's running stock so it isn't cheating", but 'stock' doesn't always mean 'sane' these days). The less easy/free fix would be to get a proper heatsink for a ~150-200W load - Thermalright, Noctua, DeepCool, etc. all make big heatsinks that can handle that, or you can get a pre-filled liquid cooler (which is likely in line with the suggestions you've gotten elsewhere) for around the same price, which may or may not be easier to mount in your system. I'd probably skip the workbench concoctions and use their included TIM as well.
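
To get a feel for what the long-term limit, short-term limit and Tau mean together, here is a rough Python sketch. It assumes a commonly used simplified model in which the chip may draw the short-term limit until an exponentially weighted moving average of package power (time constant Tau) reaches the long-term limit. The real firmware behavior may differ, the 10W idle figure is a guess, and the function name seconds_at_pl2 is made up for illustration.

```python
import math


def seconds_at_pl2(pl1_w, pl2_w, tau_s, idle_w=10.0):
    """Seconds until the moving average of power, starting from idle_w,
    reaches pl1_w while the package draws a constant pl2_w.

    Simplified model: avg(t) = pl2 + (idle - pl2) * exp(-t / tau).
    Returns math.inf if the average can never reach pl1_w.
    """
    if pl2_w <= pl1_w:
        return math.inf  # average only approaches pl2, which is not above pl1
    if idle_w >= pl1_w:
        return 0.0       # already at or over the long-term limit
    # Solve avg(t) = pl1 for t.
    return tau_s * math.log((pl2_w - idle_w) / (pl2_w - pl1_w))


# Example with figures mentioned above: 125W long-term, 241W short-term, Tau 28 s.
k_sku = seconds_at_pl2(125.0, 241.0, 28.0)
# Same short-term limit, but a 65W long-term limit.
non_k = seconds_at_pl2(65.0, 241.0, 28.0)
print(f"125W long-term: ~{k_sku:.0f} s at full boost; 65W long-term: ~{non_k:.0f} s")
```

Under this model, lowering the long-term limit or Tau mainly shortens how long the chip stays at the high short-term power, which is why it affects sustained loads much more than short bursts.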
 
Old 06-13-2022, 10:11 AM   #6
teckk
LQ Guru
 
Registered: Oct 2004
Distribution: Arch
Posts: 5,386
Blog Entries: 7

Rep: Reputation: 1948
@obobskivich, I don't have a 11th gen yet. That's good info.
https://en.m.wikipedia.org/wiki/Intel_processors
https://en.m.wikipedia.org/wiki/Inte...eneration_Core

Good page to make a pdf out of.

Also here:
Laptop
https://www.intel.com/content/dam/su...omparsion.xlsx

Desktop
https://www.intel.com/content/dam/su...son-Chart.xlsx

Laptop/Desktop Intel-Core-Comparsion.xlsx (96k)/(576k)
Code:
# Send a mobile browser user agent string with each request.
agent="Mozilla/5.0 (iPhone; CPU iPhone OS 15_0_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/78.0.257670029 Mobile/19A348 Safari/604.1"

# Laptop chart: -L follows redirects, -A sets the user agent, -o names the output file.
curl -LA "$agent" https://www.intel.com/content/dam/support/us/en/documents/processors/core/Intel-Core-Comparsion.xlsx -o Intel-Laptop-Core-Comparsion.xlsx

# Desktop chart.
curl -LA "$agent" https://www.intel.com/content/dam/support/us/en/documents/processors/Intel-Core-Desktop-Boxed-Processors-Comparison-Chart.xlsx -o Intel-Desktop-Core-Comparsion.xlsx

Last edited by teckk; 06-13-2022 at 10:12 AM.
 
Old 06-14-2022, 05:53 AM   #7
obobskivich
Member
 
Registered: Jun 2020
Posts: 610

Rep: Reputation: Disabled
Also note that Intel's 'generations' have started to diverge between mobile and desktop - the desktop '11th gen' chips are a different architecture/platform than on mobile (they have different code names too). You can see some of this difference if you compare a pair of chips from either side, such as here: https://www.cpu-world.com/Compare/30...i7-11700K.html

Note the different GPUs, and some differences in cache and instructions. Apart from the GPU, I'm not sure how noticeable this would be for most users, but it is worth bearing in mind that the 'generation' branding is becoming less concrete at demarcating feature sets.
 
Old 06-14-2022, 07:45 AM   #8
Krupski
Member
 
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by teckk View Post
That does not sound right. You can get an 8-core CPU up to 212°F (100°C), such as when using ffmpeg, but it takes 30 seconds or so. And if it really were that hot, I would stop and see what was wrong, or use something like cpupower to limit the max frequency.

Either:

The CPU heat sink, paste, or fan is not working.

The CPU heat sink is full of dust.

The sensors are not being read correctly.

I suppose that it is possible that the manufacturer left off the CPU heat pad/paste between the processor and the heat sink.

Check how those sensors are being read, and whether you need to change the config file.

I can give some general info
https://wiki.archlinux.org/title/Lm_sensors
Did you read my post?
 
Old 06-14-2022, 07:49 AM   #9
Krupski
Member
 
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by kilgoretrout View Post
First, I wouldn't trust lm-sensors to be accurate with any newish CPU. Record your idle temps with lm-sensors, then boot into your BIOS setup, let things settle down, and see what the BIOS idle temps are. On my Intel i7-12700K, BIOS temps are about 8 to 10 degrees higher than the idle temps reported by lm-sensors.

Next, just to eliminate any potential kernel regression with regard to your hardware, you might want to get a livecd with a more current kernel and attempt to run your stress tests after booting with it and see if you get the same result. It's very unlikely to be an issue but I would want to eliminate this remote possibility before going further.
Those are the only things I have to add to what teckk posted above.
Even if lm-sensors is not accurate, the readings (although possibly wrong) should not CHANGE by such a large amount.

As far as a more current kernel, I'm running Debian 11.3.0 which is the latest one out. I don't think the kernel could be newer unless I compiled it myself...?
 
Old 06-14-2022, 07:57 AM   #10
Krupski
Member
 
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Arnulf View Post
Consider the thermal capacity of the heat sink. Capacities cause time delays. It is therefore possible that the temperature measured inside the die increases by almost 70°C in less than a second.

If an OEM heatsink is installed in your PC, that is very bad. OEM heatsinks give poor cooling performance. They only have to ensure that the CPU survives the warranty period.
I built my own machine. There is no "warranty period".

Besides, I don't believe that ANY heatsink, no matter how well it is thermally bonded to the CPU package, could have any effect on a temperature that jumps by 70°C in less than a second. The heat simply could not flow that rapidly.

A guy I know thought of a possibility that actually does make sense to me. He said that maybe the on-chip temperature sensors, being part of the CPU silicon, respond to heat generated in the die because the sensor and core are literally only a few hundred nanometers apart, and so show a change long before the heat can flow to the IHS, let alone the heatsink. Maybe?
 
Old 06-14-2022, 09:12 AM   #11
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,424

Rep: Reputation: 2591
If it's a Zen 3 AMD CPU, it might be OK. They are using semiconductor junctions on the die itself as heat sensors, so the response is nearly instant. It's a known technique. The junction is nonlinear, but voltage falls as temperature rises. Voltage --> temperature conversion can be by lookup table or equation.

I don't know how Intel is doing it, but I presume they're not far behind. If a temperature gradient is rising nearly vertically, it makes good sense to exaggerate this in software to call in thermal limiting before it's needed. Semiconductors can't take overheating like metal or plastic. Your CPU would die dead well within 100 µs of overheating. Heatsinking has only minimal effect with reactions of that speed.

So if the temperature gradient was steep, and the figure was enhanced to trigger protection in time, you could well see the crazy figures we have trouble believing.
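
As an illustration of the lookup-table approach mentioned above, here is a small Python sketch. The calibration points are invented for the example (roughly 2 mV lost per °C, slightly uneven to mimic a nonlinear junction) and do not come from any AMD or Intel part; the name diode_temp_c is also made up.

```python
from bisect import bisect_left

# Hypothetical calibration points: (forward voltage in mV, temperature in °C),
# sorted by voltage. Voltage falls as temperature rises.
CALIBRATION = [
    (500.0, 100.0),
    (562.0, 70.0),
    (612.0, 45.0),
    (650.0, 25.0),
    (696.0, 0.0),
]


def diode_temp_c(voltage_mv, table=CALIBRATION):
    """Convert a diode forward voltage to °C by linear interpolation between
    calibration points; readings outside the table are clamped to its ends."""
    volts = [v for v, _ in table]
    if voltage_mv <= volts[0]:
        return table[0][1]
    if voltage_mv >= volts[-1]:
        return table[-1][1]
    i = bisect_left(volts, voltage_mv)       # first point with voltage >= reading
    (v0, t0), (v1, t1) = table[i - 1], table[i]
    return t0 + (t1 - t0) * (voltage_mv - v0) / (v1 - v0)
```

Because the conversion is just a table lookup on an electrical reading, nothing in it slows the response down; the reported value follows the junction itself rather than the package or heatsink.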
 
Old 06-14-2022, 10:08 AM   #12
Krupski
Member
 
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by business_kid View Post
If it's a Zen 3 AMD CPU, it might be OK. They are using semiconductor junctions on the die itself as heat sensors, so the response is nearly instant. It's a known technique. The junction is nonlinear, but voltage falls as temperature rises. Voltage --> temperature conversion can be by lookup table or equation.

I don't know how Intel is doing it, but I presume they're not far behind. If a temperature gradient is rising nearly vertically, it makes good sense to exaggerate this in software to call in thermal limiting before it's needed. Semiconductors can't take overheating like metal or plastic. Your CPU would die dead well within 100 µs of overheating. Heatsinking has only minimal effect with reactions of that speed.

So if the temperature gradient was steep, and the figure was enhanced to trigger protection in time, you could well see the crazy figures we have trouble believing.

I stated in the original post that the CPU is an Intel i9-11900K.
 
Old 06-14-2022, 11:28 AM   #13
kilgoretrout
Senior Member
 
Registered: Oct 2003
Posts: 3,016

Rep: Reputation: 399
Quote:
As far as a more current kernel, I'm running Debian 11.3.0 which is the latest one out. I don't think the kernel could be newer unless I compiled it myself...?
Debian 11 stable was released in August 2021 with kernel 5.10, which was released in December 2020. Debian generally will only backport security patches to its kernels in the stable branch. Your processor was released by Intel in Q1 of 2021. Basically, you are using a kernel which may predate the release of your CPU and may not contain all the necessary patches to give you optimal performance with it. That's the problem with using newer hardware with Debian stable. I'm not suggesting that you compile your own current kernel. But you could get a livecd with just about any other current distro not based on Debian stable and have a much more current kernel than you presently have. You could then boot into your livecd, run your tests, and see if you get the same results.

By way of example, on my new build, which has an Intel Core i7-12700K CPU, I have several distros installed including Debian 11 and Arch. For benchmark purposes, I wrote a little program in C which calculates all the primes between zero and two billion using the sieve of Eratosthenes. On Debian 11 with kernel 5.10.0-15 the program takes 73 seconds to run. On Arch with kernel 5.18.2 the same program takes 46 seconds to run. This just illustrates that using an older kernel on newer hardware can have a significant performance impact, and there may be other issues as well.
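
For readers who want a comparable CPU-bound test, here is a minimal sketch of a sieve of Eratosthenes benchmark in Python. It is not the C program described above, and the default limit is far smaller than two billion so it runs in a reasonable time and memory footprint; the names count_primes and time_sieve are made up for illustration.

```python
import time


def count_primes(limit):
    """Count the primes in [0, limit] using the sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)   # one flag byte per number
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross off multiples of p starting at p*p; smaller multiples
            # were already crossed off by smaller primes.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
        p += 1
    return sum(is_prime)


def time_sieve(limit=10_000_000):
    """Run the sieve once and return (prime count, elapsed seconds)."""
    start = time.perf_counter()
    count = count_primes(limit)
    return count, time.perf_counter() - start
```

Running time_sieve() under each kernel and comparing the elapsed times gives the same kind of before/after comparison, although absolute numbers will not match a compiled C program.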
 
Old 06-14-2022, 12:27 PM   #14
EdGr
Senior Member
 
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 1,058

Rep: Reputation: 492
Quote:
Originally Posted by Krupski View Post
A guy I know thought of a possibility that actually does make sense to me. He said that maybe the on-chip temperature sensors, being part of the CPU silicon, respond to heat generated in the die because the sensor and core are literally only a few hundred nanometers apart, and so show a change long before the heat can flow to the IHS, let alone the heatsink. Maybe?
Thermal networks are modeled similarly to electrical resistor-capacitor networks.

Think of a thermal network as a network of series-connected resistors, with each node having a capacitor to ground. A thermal network is characterized by thermal resistance and thermal time constants, which are analogous to electrical resistance and electrical RC time constants.

The temperature can change quite rapidly at the CPU die, but only over a limited range. Eventually, the heat propagates to the slower heat spreader and even slower fansink.

On one of my machines, the CPU die temperature can change by 30C instantly. The next 30C delta takes minutes.
Ed
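
The resistor-capacitor picture above can be tried out numerically. The following is a rough Python sketch of a two-node ladder (die, then heatsink, then ambient air) stepped with forward Euler. All component values are hypothetical, chosen only so that the die has a time constant of about 0.1 s and the heatsink about a minute; they are not measurements of any real CPU or cooler.

```python
def simulate(power_w, seconds, dt=0.001, ambient_c=25.0,
             c_die=0.5, r_die_sink=0.25, c_sink=400.0, r_sink_amb=0.15):
    """Forward-Euler simulation of a two-node thermal RC ladder.

    c_* are heat capacities in J/K, r_* are thermal resistances in K/W
    (all hypothetical). Both nodes start at ambient. Returns
    (die_temp_c, sink_temp_c) after `seconds` of constant `power_w`.
    """
    t_die = t_sink = ambient_c
    for _ in range(int(round(seconds / dt))):
        q_die_to_sink = (t_die - t_sink) / r_die_sink       # W, die -> heatsink
        q_sink_to_amb = (t_sink - ambient_c) / r_sink_amb   # W, heatsink -> air
        t_die += dt * (power_w - q_die_to_sink) / c_die
        t_sink += dt * (q_die_to_sink - q_sink_to_amb) / c_sink
    return t_die, t_sink


# 200 W step load: compare the state after 0.75 s and after 10 minutes.
for label, secs, step in (("0.75 s", 0.75, 0.001), ("10 min", 600.0, 0.01)):
    die, sink = simulate(200.0, secs, dt=step)
    print(f"after {label}: die {die:.1f} °C, heatsink {sink:.1f} °C")
```

With these made-up values the die node jumps by tens of degrees in well under a second while the heatsink node has barely moved, and the remaining rise of both nodes takes minutes, which is the two-speed behavior described above.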
 
Old 06-23-2022, 07:47 PM   #15
Krupski
Member
 
Registered: Jan 2015
Location: The worst state in USA (can you guess?)
Distribution: Debian 11 (x86_64)
Posts: 44

Original Poster
Rep: Reputation: 0
Well everyone, I found the problem. The die thermal sensors are a part of each core, so when a core gets busy (i.e. draws more power), the temperature as seen by the thermal sensor diode rises almost instantly. When the load goes down, the extra heat flows into the IHS and then out of the heatsink. So, this effect is inherent in the processor design and not a real "overheat". A better way to measure temperature would be to install a temperature sensor either on the IHS (via a machined groove to accommodate the sensor and wire) or on the heatsink, away from cooling fan airflow. But, not to worry... a simple touch of the heatsink with a finger is all you need... too hot to touch, too hot to use. Watch out for those fan blades. Blood does not make a good coolant!
 
  

