I am having some strange issues with my Debian server box. I've been running Ubuntu (server) on this PC for a long time without any problems. Recently I decided to switch to Debian to try out the difference. Some time after installing the new distro, a strange problem began to trouble the box. Every once in a while the server would become unresponsive to my ssh login requests, and turning on the server monitor would show a nice little kernel panic message:
CPU0: Machine Check Exception: 0000000000000004
CPU0: Bank 0: 3200008000000800
CPU0: Bank 3: 3200000000080a01
Kernel panic - not syncing: CPU context corrupt
It was pretty much the same kernel panic each time. I searched the internet for this type of message and came across a number of websites claiming that this problem is due to CPU overheating. I leaned back in my chair for a moment, and said hmmmm... The CPU has never overheated before, so what's the chance of it doing that right now? I decided to check anyway. Upon removing the box cover it turned out that the CPU case was indeed extremely hot to the touch. So the kernel panic was indeed caused by overheating, but what could have caused it? My first thought was that the CPU was clogged with dust and needed cleaning, but that was not the case as it had recently been cleaned. Another option was that the CPU fan may have died, but that wasn't the case either since it was spinning nicely each time I powered on the comp.
Normally when I power on the box, it will not overheat immediately. The CPU case will remain cold to the touch for a long while, sometimes even up to a few days! But at some random point it will begin to heat up. Personally I can't see the temperature with my eyes. I normally open the case several times and feel the CPU case with my hands. It turns out to be cold most of the time and the CPU fan is spinning nicely. I do however notice when the caps-lock light starts flashing on the keyboard, which suggests that the kernel panic has taken place due to overheating.
The server box is an old 300 MHz Pentium with 256MB of RAM. It only runs an HTTP server with some other services such as mysql, samba, cups, webmin, and ssh for remote logins. I suspected that the overheating might be caused by a rogue process taking up 100% of the CPU all the time. So I left a process monitor running on the main terminal, listing currently active processes and their CPU usage. When the panic took place, it froze the screen, leaving the current process list available for me to review. The CPU usage turned out to be almost zero, with the "top" process highest on the list at 1.3% CPU usage.
Now here's my dilemma. I have no idea what causes this strange overheating. It has never happened before, it started happening a short time after I installed Debian, it doesn't seem to be caused by a rogue process, and the strangest part: it seems to happen at random intervals. Any ideas or suggestions on how to further diagnose the problem?
I have a twin core AMD on an Asus MB. Absolutely trouble free since installing this in September. I am running Debian Unstable, upgraded to KDE4 a couple of weeks ago.
My 500W EZ Cool PSU went snap-crackle-pop on Friday morning and I replaced it with a 700W Storm unit.
A short while ago I was playing Oolite when suddenly the machine rebooted. I got a hot CPU warning and, sure enough the CPU temp as indicated in the BIOS Hardware Monitor was 95 deg C. The heat sink felt cool.
I let the machine cool down for about 15 min and restarted it. Temperature was 77 deg and climbing at about 1 deg every 3 seconds.
I repeated this and observed the same behaviour.
So I tried changing the clock multiplier from 14 to 8. The PC would not boot at all.
I reset the CMOS and the machine came up Ok. The temperature is now dropping.
I don't know if the overheating is "real" but it seems to be happening outside of Debian. Could some software be zapping the CMOS settings?
You know, just cuz your heatsink is cool doesn't mean your CPU is too. I'd check your thermal paste; with older computers, that stuff can get hard and not pass heat well. In addition, your CPU isn't the only thing that generates heat. Your HDDs are a big one, and one that a lot of people overlook is the RAM: it is somewhat sensitive to heat, and it can get fairly hot. I'd make sure your chassis cooling is in order too.
It occurs when there is heavy disc activity in the Virtual Machine.
What I did was open Task Manager in Windows, System Monitor and KSensors in Linux. The CPU load in Linux tracked that in Windows but with a significant multiplier. At the same time the temperatures rose with CPU activity and rose most in the more active core.
Perfectly obvious I suppose once one twigs what's happening.
First I watched AVG Free scan and update. The temperature in one core reached 89 deg C. After the system cooled down I copied a large directory but had to abort when both cores reached 85 deg C with a long way yet to go.
The system cools off very quickly once the CPU load drops.
1. The stock AMD cooler cannot cope with a sustained high CPU load.
2. The CPU has to work extremely hard to cope with a high CPU utilisation within Vbox.
One would hope that VBox will improve matters but similar problems seem to have been around for more than 2 years. However I still think that a recent Debian upgrade (I don't know which) has exacerbated the situation.
I spent some time last night reading CPU cooler tests. I'm off to Scan to buy an Akasa 967 cooler this evening.
jim80net, thanks for your comment. However my PC is well cooled by normal standards. Large case, 2 chassis fans, PSU with 120mm fan, round IDE cables so as not to impede air flow, and Arctic Silver paste on the CPU. It has to run all day in a 35 degree plus ambient in Summer.
As much as I like seeing other people having their problems fixed, it does not solve my original dilemma. Having tested the CPU usage a second time, I am now fairly certain that a rogue process isn't causing the overheating. I will now try to catch the CPU red-handed. That means before the kernel panic shows up, giving me time to do some analysis.
From solving my problem I think that the cause is a Debian package that has been recently updated and developed this problem. In my case it was disc activity in VirtualBox causing 100% CPU utilisation.
Can you establish whether, in your case, you are getting a similar event chain: Disc Activity ----> CPU Utilisation ----> Overheating?
That would be a start. If it turns out to be the case then it could be that some normal, disc intensive, process such as an indexing run is triggering the overheating.
Please note that this system is a very basic debian installation. It does not use virtual box, and it doesn't even have X11 installed. The only way to interact with it is via command line. I normally ssh onto the box from another machine. Suppose there isn't a rogue process that causes 100% CPU use, what other factors could cause the CPU to overheat? I can't think of another. Could it be some rogue kernel module that isn't showing in the "top" process monitor?
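One way to look for work that the per-process list does not show is to watch the aggregate CPU counters in /proc/stat, which also count time spent in kernel and interrupt context. The following is only a rough sketch: the function name cpu_busy_percent is made up for illustration, and it assumes a kernel that exposes the usual "cpu" line (user, nice, system, idle, iowait, irq, softirq).

```shell
#!/bin/sh
# Sample the aggregate "cpu" line of /proc/stat twice and report how much of
# the interval was spent busy, and how much of it in kernel/interrupt context.
# cpu_busy_percent is a hypothetical helper name, not a standard tool.
cpu_busy_percent() {
    interval="${1:-5}"
    read -r label u1 n1 s1 i1 w1 q1 sq1 rest < /proc/stat || return 1
    sleep "$interval"
    read -r label u2 n2 s2 i2 w2 q2 sq2 rest < /proc/stat || return 1
    # Older kernels may omit iowait/irq/softirq; treat missing fields as 0.
    t1=$((u1 + n1 + s1 + i1 + ${w1:-0} + ${q1:-0} + ${sq1:-0}))
    t2=$((u2 + n2 + s2 + i2 + ${w2:-0} + ${q2:-0} + ${sq2:-0}))
    total=$((t2 - t1))
    [ "$total" -gt 0 ] || total=1
    idle=$(( (i2 + ${w2:-0}) - (i1 + ${w1:-0}) ))
    kern=$(( (s2 + ${q2:-0} + ${sq2:-0}) - (s1 + ${q1:-0} + ${sq1:-0}) ))
    echo "busy=$(( 100 * (total - idle) / total ))% kernel=$(( 100 * kern / total ))%"
}
```

Calling cpu_busy_percent 10 while the box is supposedly idle should print low figures for both values. A high kernel share with nothing notable in the process list would at least point towards a driver or interrupt problem rather than a user process.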
I suggest "stressing" the PC by copying a fairly large chunk of data. Watch the temperature. If it heats up then I would blame a recent update to Debian (or maybe Linux). In your case, it being a recent installation, you would have installed this as part of the distro and would be unlikely to have an update history you can consult.
No matter what, you will be able to either pin the problem on disc activity or eliminate it entirely.
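Something along these lines could do the copying and print the temperatures between passes. It is a sketch under assumptions: the names disk_stress, show_temps, SIZE_MB, PASSES, SCRATCH and RUN_STRESS are invented here, the scratch directory needs enough free space, and the temperature readout relies on lm-sensors being installed and configured.

```shell
#!/bin/sh
# Write and copy a scratch file several times, printing temperatures after
# each pass. All names and defaults below are illustrative assumptions.
SIZE_MB="${SIZE_MB:-256}"    # size of the scratch file in MB
PASSES="${PASSES:-5}"        # number of write+copy passes
SCRATCH="${SCRATCH:-/tmp}"   # directory on the disc to be exercised

show_temps() {
    if command -v sensors >/dev/null 2>&1; then
        sensors | grep -i -E 'temp|core' || echo "no temperature lines reported"
    else
        echo "sensors not found (is lm-sensors installed?)"
    fi
}

disk_stress() {
    file="$SCRATCH/stress.$$"
    i=1
    while [ "$i" -le "$PASSES" ]; do
        echo "pass $i of $PASSES at $(date '+%H:%M:%S')"
        dd if=/dev/zero of="$file" bs=1M count="$SIZE_MB" 2>/dev/null || break
        cp "$file" "$file.copy" || break
        show_temps
        i=$((i + 1))
    done
    # Clean up the scratch files whether or not every pass completed.
    rm -f "$file" "$file.copy"
}

# Only run when explicitly asked, e.g. RUN_STRESS=1 sh stress.sh
if [ "${RUN_STRESS:-0}" = 1 ]; then
    disk_stress
fi
```

If the temperature climbs steadily from pass to pass, disc activity is a suspect. If it stays flat, that chain can be ruled out.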
Your method of detecting which process, if any, has a high CPU usage may not be foolproof. It is possible that the CPU intensive process ended just as the CPU reached the critical temperature. Far fetched? Maybe but stranger things happen regularly :-)
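To cover that case, the load averages are worth logging as well, since they decay slowly and so still show recent activity after the process itself has gone. A small sketch, assuming a Linux /proc filesystem; log_load is a made-up helper name:

```shell
#!/bin/sh
# Print a timestamped line with the 1, 5 and 15 minute load averages.
# log_load is a hypothetical helper name.
log_load() {
    read -r one five fifteen rest < /proc/loadavg || return 1
    echo "$(date '+%Y-%m-%d %H:%M:%S') load1=$one load5=$five load15=$fifteen"
}
```

If the last logged line before a panic shows a 5 minute load well above zero while top shows nothing busy, something was working hard shortly before the snapshot.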
Something else: Could the Power Supply be causing the problem? Have you checked whether the air intakes are dust free and the fan is rotating at full speed? When the PC gets hot is the PSU hot too? If it is then I would suspect it.
Have you checked the voltages? Does your setup screen have a hardware monitor? You could install lm-sensors. Setup is a bit of a pain but, in my experience, well worth it. You could even set up a cron job to run sensors every few minutes. If the PC hangs the last run may give you useful information. Better still, set up a job which runs
sensors > sensors.txt
top -b -n 1 > top.txt
or even, if you have the disc space
sensors >> sensors.txt
top -b -n 1 >> top.txt
That should give you a record of what happened and let you correlate the tasks with temperatures, voltages, fan speeds....
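The appending variant could be wrapped in a small script so that every snapshot carries a timestamp and is flushed to disc before a possible hang. This is a sketch only: LOG_DIR and log_snapshot are names invented for illustration, and it assumes lm-sensors provides the sensors command.

```shell
#!/bin/sh
# Append timestamped sensors and top snapshots to two log files.
# LOG_DIR is a hypothetical location; override it as needed.
LOG_DIR="${LOG_DIR:-/tmp/heatlog}"

log_snapshot() {
    mkdir -p "$LOG_DIR" || return 1
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    {
        echo "=== $stamp ==="
        if command -v sensors >/dev/null 2>&1; then
            sensors
        else
            echo "sensors not found (is lm-sensors installed?)"
        fi
    } >> "$LOG_DIR/sensors.txt" 2>&1
    {
        echo "=== $stamp ==="
        # Batch mode, one iteration, header plus the top few processes.
        top -b -n 1 | head -n 20
    } >> "$LOG_DIR/top.txt" 2>&1
    # Flush to disc so the last snapshot has a chance of surviving a panic.
    sync
}

log_snapshot
```

Saved as, say, /usr/local/bin/heatlog.sh (a hypothetical path), it could be run from cron every five minutes with a crontab line such as */5 * * * * /usr/local/bin/heatlog.sh. A log directory that survives reboots would be a better choice than /tmp.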
Back to the hardware side, you could try swapping out the CPU cooler and the PSU. Incidentally Arctic Silver on the heat sink can drop the temperature by 2 or 3 degrees C.
Rkhunter was probably stressing the PC while looking for problems.
Since this thread started I have upgraded to an aftermarket CPU cooler and the difference this has made is tremendous. I think you may find that the problem is only solved until you next get a hyperactive program.