FC6 Won't Boot: "Critical Temperature Reached (128 C), shutting down"
This is in regards to my Compaq V2000 notebook PC (AMD Turion64 3800+, 1GB RAM) running Fedora Core 6 (x86_64). Shortly after the kernel is loaded--and before the X server starts for graphical startup--I will see a kernel message that says:
Code:
Critical temperature reached (128 C), shutting down.
Naturally, it follows through and does indeed shut down. The problem is that the CPU is nowhere near that temperature--in fact, I've booted into Windows XP and run some temp/fan monitoring software to check it.
In fact, this is a "sometimes but not always" problem. I have used this laptop on a daily basis for months now and noticed that this has gone from not happening at all; to happening sometimes and being annoying (i.e. if I'm persistent and try to start it 5 times, it finally works); to now it won't boot at all (i.e. 15+ reboots and it still won't go).
Another interesting trait: once the machine gets past init and into GDM, I will never have any problems with it until I shut it off. I can use it for 3+ hours until the battery dies and never have any such problems. So it's just a small window of time (before init?) when this occurs.
Last tidbit: This only happens in Fedora. This laptop triple boots Windows XP (i386), Fedora Core 6 (x86_64), and Ubuntu Edgy (x86_64)--and none of these other installs have had any problems with this whatsoever.
I have been able to locate a kernel bug (Bug 3584) which shows that other people are having the problem, too, and that kernel devs are aware of the problem. However, I'm wondering why this only happens in FC6.
Does anybody know of any workaround to this? Am I going to have to recompile my kernel and, if so, how can I go about doing that on a non-Fedora box (since all of my desktops are Ubuntu Edgy)?
Does the FC6 machine use lm_sensors or a similar tool for monitoring hardware sensors?
I don't know much about FC6 except what I read around here, and that seems to be a load of bugs. However, I am wondering whether either the sensor-monitoring program or the kernel is misconfigured, and is either monitoring the wrong hardware sensor or using calculations in the sensors.conf file that are out of whack, causing weird results.
If by some stroke of luck this happens to be along the lines of the problem, you could adjust or comment out the offending calculation or sensor in the config file as a temporary workaround.
In reality, I think it's safe to say that if a CPU reached 120 C it would be dead.
I'd suggest that you open up your tower and do a thorough vacuuming job, including removing the heat sink and fan assembly from your processor and blowing some compressed air through the fins on the heatsink, as they could be really dusty. You will need a tube of heat transfer gel, which is available at your local RadioShack or other electronics parts store; that is also where you will find the compressed air.
Funny you should mention that, GrapefruiTgirl: after skimming through the kernel bug report, that's almost exactly what is going on. Although I don't fully understand the issue (and certainly can't help to resolve it at the kernel level), lm_sensors and ACPI are conflicting on my system and causing erroneous results. One of the kernel devs doesn't seem to have good things to say about ACPI, but it also seems to be a "necessary evil" that cannot be eliminated as long as hardware is still using it.
As an interim solution, I've added the parameter "acpi=off" to the GRUB kernel line to disable ACPI. The machine boots, but I've got a feeling this isn't going to be good for battery life. Not only that, but I can't even monitor the charge any more. So I'm still looking for a better solution; but at least I can boot.
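For anyone trying the same workaround, here is a minimal sketch for confirming that the parameter actually reached the running kernel. The function name has_kernel_param is made up for this example, and the optional second argument exists only so the check can be tried on a sample string; by default it reads /proc/cmdline.

```shell
# Return success if PARAM appears as a whole word on the kernel command line.
# has_kernel_param is a hypothetical helper name, not a standard tool.
# The optional second argument lets you test against a sample string;
# without it, the running kernel's /proc/cmdline is used.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null || true)}
    for word in $cmdline; do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    return 1
}

if has_kernel_param acpi=off; then
    echo "ACPI is disabled for this boot"
else
    echo "acpi=off is not on the kernel command line"
fi
```

Matching whole words avoids a false hit on some other parameter that merely ends in "acpi=off".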
flanksteak: Thanks for the suggestion, but this is a laptop and the cooling hardware definitely is working fine. The other 2 OS installations on this machine don't have any problems at all.
@ Sancho: I have done some work on my own lm_sensors setup, so I am comfortable with that, but as for ACPI, while I know 'what' it does, I don't know enough about it to make any recommendations about how to use it AND circumvent this problem. I can say I have read many threads, websites and bug reports about devices which don't work right alongside ACPI.
Nice move setting acpi=off. I wonder, do you think that machine could use the deprecated APM system instead of ACPI to do the same job, at least as far as allowing you to monitor the battery? I use lm_sensors on my machine, and compiled the necessary sensor device into my kernel, but I really don't know if it is in fact ACPI that allows me to read the data. Perhaps I will disable ACPI either in LILO or in my BIOS, and see if my sensors are still readable.
Perhaps this is a solution -- you will need to recompile the kernel if so. Can you do that, or is it beyond your knowledge? It's not too big a deal generally. I don't know MEPIS, but I can help with the kernel if it comes to it.
I gotta get to bed. Will check in tomorrow and see what's what.
I wonder, do you think that machine could use the deprecated APM system instead of ACPI to do the same job, at least as far as allowing you to monitor the battery?
I'm not sure, but given that it's less than 2 years old, I would hope it's not using anything "deprecated"! What I do know, however, is that when I pass the parameter "acpi=off" on the kernel boot line, I can find no evidence that Fedora "thinks" it's running on a laptop anymore. For example, the battery charge reports 0% even when running on batteries (as per the GNOME battery applet) and the GNOME Power Management Preferences no longer has a "Running on Battery" tab (only a "Running on AC" tab).
As for recompiling the kernel: I'm familiar with this process when using the "stock" kernel source tarball (i.e. from kernel.org). However, I've found that this gets me into nothing but trouble when using such a kernel in a modern distro. Usually I run into all sorts of problems at init with modules not being able to load and such. Also, when it comes to using third-party kernel modules (i.e. such as for display drivers from livna.org or ndiswrapper), these modules expect a certain "stock" Fedora RPM version number such as "2.6.20-1.1234"; however, when I recompile the kernel, I'll end up with some number different than that, causing any such modules to be incompatible.
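The version mismatch described above can be checked before trying to load a third-party module. This is only a sketch: vermagic_matches is a made-up helper name, and it compares just the first word of the module's vermagic string with the kernel release. The kernel also looks at the remaining vermagic flags, so a match here is necessary but not sufficient.

```shell
# Success if the first word of a module's vermagic string equals the
# kernel release string. vermagic_matches is a hypothetical helper name.
vermagic_matches() {
    vermagic=$1
    release=$2
    # Strip everything from the first space onward, leaving the version.
    [ "${vermagic%% *}" = "$release" ]
}

# Usage on a live system (ndiswrapper is only an example module name):
#   vermagic_matches "$(modinfo -F vermagic ndiswrapper)" "$(uname -r)" \
#       && echo "module was built for the running kernel"
```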
In other words... I could recompile the kernel, but I'm afraid to.
Assuming I could circumvent the problems listed above, what are you suggesting that I do?
By chance, have you checked for BIOS updates? If you perform a few billion of them and read the change logs, you will notice that "Power Management" is one of the top reasons for them. On another note, have you tried resetting the BIOS back to its defaults? I have seen many extended battles with machines caused by one jacked-up setting in the BIOS.
As far as the other two OSs not having a problem, that is a moot point. Just because an issue doesn't seem to affect them doesn't mean that you don't have it. It is good diagnostic information, but it isn't absolutely conclusive.
And yes, wrong settings in the BIOS can cause unpredictable and intermittent glitches. A logical person would think that they would not, but it happens.
On the other hand, you might just have a wrong setting in a config file. Or maybe it IS just a Fedora bug, that wouldn't shock me...
James
Last edited by james_jenkins; 03-25-2007 at 11:41 PM.
LOL, well, re: the OP's last line of his last thread, *IF* you were to recompile the kernel (and it may not be worth it, I don't know) I would try using the APM power management routines in the kernel, rather than the ACPI system. It's generally an either/or situation. I have used APM on my machine when I first started into Linux, and it worked just as well as ACPI, with the advantage that there were more places I could use power-off and standby modes, like for example: Using APM, I could use the Monitor standby/shutdown when configured from my screensaver. Now I am using ACPI, and those settings don't shut the monitor off, it only blanks it after a while; I have to use DPMS in xorg.conf to manage the shutdown of the monitor.
Besides something like that, I believe the 2 systems are working to the same end (and of course I stand to be corrected by someone more knowledgeable).
The other obvious difference, while not really a 'functional' one, is that if I compile APM into the kernel, I get a warning that I am using a 'deprecated' power management function. Now, IMO, it may be 'deprecated' by someone's standard, but if that's what my particular hardware responds to and likes, then it isn't too deprecated after all.
The only reason I switched to ACPI was because I learned that for me, it does the same thing, with the exception of the monitor shut-off, and because I don't like the deprecated message from the compiler.
And to comment to James above: LOL, I agree, while I know very little about Fedora, I am sure glad I use Slackware! I've seen more really-screwed-up-issues from users of Fedora X than any other one OS, around here.
I used RedHat up until the RedHat 9/Fedora WTF? incident. I STILL have one RH 9 server running just because it took forever to get just perfect, and it has not needed to even be LOOKED AT in YEARS. I have gone months without realizing that it was still in there running. But, when I walked, I didn't look back. One of these days it is likely to quit working, and that will be my last sad RH day.
Now, in RedHat's defense, they don't have a monopoly on the "Upgraded AND Broken"_TM department. Lately I have been noticing some things coming full circle. First we didn't have items, then we did, but not everything worked right, or at all. Next things started working much better, then they seemed to be fixed. Then there was a period of time that things were just peachy. Then a new version comes out, e.g. openSuse 10.2, and all of a sudden hardware that has worked flawlessly for years FLAT-WILL-NOT-WORK. Most noticeable to me was laptop wireless cards. OTOH, cards that had never worked, or worked well, were recognized and online in about 3 seconds, out of the box, e.g. the Linksys 54g pcmcia card.
That is the reason I went to SLED 10 on my main ThinkPads. LONG support cycles, no quicky updates just because something new has just been announced. No six month reloads. EVERYTHING just works the way it is SUPPOSED to. I just don't have time or patience for some of the STUPID problems I have seen being released. I just mentioned network issues in openSuse 10.2. Anyone remember the cluster that "Updating" was in Suse 10.1? As far as "Power Management" goes, I would almost be scared to close the lid of my laptops after a new version came out because I didn't know what was going to happen. There were some issues a while back that would "KILL" your ThinkPad if you were one of the unfortunate ones.
Anyway, enough ranting. Personally I would like to see a feature freeze on everything and spend the next 12 months fixing bugs. I think that would do more for Linux than most anything else.
Oh, and I don't like recompiling kernels either. There, now I feel better.
Based on what I'm hearing here, I think that ultimately the issue boils down to something specific to the way that the Fedora packagers choose to configure their stock kernels. It wouldn't be the first time that the Fedora folks chose to go "against the grain" and use nonstandard settings--and it would also explain why neither Ubuntu nor Windows XP exhibits the same behavior.
That being said, I don't think there's going to be any way around this without recompiling the kernel. I know it's not that difficult to do, but I've never recompiled a kernel into a "distro-native" package without having at least some new annoyances. Therefore, I've decided to just switch over to Ubuntu Feisty on my laptop. I run Ubuntu on my desktops and server anyways, so it's a natural change for me.
Anyways, I know that's not much help to other people who may be having this problem, so feel free to continue the thread. Thanks again, GrapefruiTgirl and james_jenkins for your responses.
I've been having this issue on and off with various kernel updates and the latest (2.6.20-1.2944.fc6) fixed it for me. I've been booting with acpi=off but now I removed it and I get my battery times back and everything. I did "yum remove lm_sensors" and probably won't add it back.
The laptop was off for the whole night, so I think 5155 degC is a bit excessive from Linux. Probably 51 or 55 degrees.
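A bogus reading like this one can be told apart from a real one with a simple range check. The sketch below makes several assumptions: that the kernel exposes the old /proc/acpi/thermal_zone/*/temperature files with a line of the form "temperature:   51 C", that 0 to 110 C is a reasonable plausibility window (an illustrative guess, not a vendor limit), and the helper names parse_acpi_temp and is_plausible_temp are made up for the example.

```shell
# Extract the integer from a line such as "temperature:             51 C".
# Prints nothing if the line does not have that (assumed) shape.
parse_acpi_temp() {
    printf '%s\n' "$1" |
        sed -n 's/^temperature:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*C.*$/\1/p'
}

# Success if the value lies in a range a working laptop could plausibly
# report. The 0..110 bounds are illustrative assumptions only.
is_plausible_temp() {
    [ "$1" -ge 0 ] && [ "$1" -le 110 ]
}

# Walk the (assumed) thermal zone files and flag suspicious readings.
for f in /proc/acpi/thermal_zone/*/temperature; do
    [ -r "$f" ] || continue
    read -r line < "$f" || continue
    temp=$(parse_acpi_temp "$line")
    [ -n "$temp" ] || continue
    if is_plausible_temp "$temp"; then
        echo "$f: $temp C"
    else
        echo "$f: $temp C looks bogus"
    fi
done
```

With these bounds, both the 5155 reading and the 128 C from the thread title would be flagged, while 51 or 55 would pass.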
The only bad thing was a hard reset yesterday due to a kernel lockup coming from the video driver. I've fscked all my disks and haven't seen anything abnormal...
I'm using 2.6.20.3 on Debian Unstable. I will try to disable this automatic shutdown but I need ACPI on if I want to investigate.
Currently writing on LQ with acpi=off...
If anybody has a link to a bug report from the kernel mailing list or any idea while I'm looking on my side, it would be much appreciated.
Regards
edit: In my case, where I am sure that the warning is wrong, after moving /sbin/poweroff to /sbin/poweroff- the machine doesn't automatically reboot anymore.
Code:
mv /sbin/poweroff /sbin/poweroff-
/!\ DON'T DO THIS UNLESS YOU ARE SURE THE TEMPERATURE IS ACTUALLY WITHIN LIMITS!!
edit2:
After running 20 minutes without this alarm and physically checking that the temperature was not increasing and the fans were working, the reported temperature went down from 5000 degrees to 49 degrees. Since then I have put back my /sbin/poweroff and rebooted, and everything is fine. Kinda strange.
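Since the rename has to be undone afterwards, the two steps can be kept as a matching pair. This is just a sketch with made-up helper names (disable_binary, restore_binary); both take the path as an argument so they can be tried on a scratch file before touching /sbin/poweroff, and the warning above applies in full.

```shell
# Rename PATH to PATH- so callers can no longer find it.
# disable_binary and restore_binary are hypothetical helper names.
disable_binary() {
    [ -e "$1" ] || return 1
    mv -- "$1" "$1-"
}

# Undo disable_binary: rename PATH- back to PATH.
restore_binary() {
    [ -e "$1-" ] || return 1
    mv -- "$1-" "$1"
}

# Usage as root, only if the reading is known to be bogus:
#   disable_binary /sbin/poweroff
#   ... investigate with ACPI enabled ...
#   restore_binary /sbin/poweroff
```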