Kernel Panic on Slackware64 14.1
Hi all,
Recently, I've been getting kernel panics from my machine, and I'm at a loss for how to fix them. I'm running slackware64 14.1 on an AMD Quadcore 9600. Panics most often occur during bootup, after lilo loads the kernel and before I get to login. My machine also will freeze now and again when starting X, failing to send any signal to my monitor or respond to keyboard commands. Most recently, I had a panic while I was in X, debugging some javascript.
Here's a screenshot of the message that appeared on my most recent kernel panic: http://i.imgur.com/BQIXon9.png (I apologize for the blur; my camera is not excellent.)
I have installed the multilib packages; is it possible that these have introduced this error? Thanks in advance for any advice or suggestions. |
To check that it's not your Slackware install, try booting from a live CD (e.g., Ubuntu) and see how that goes. If the same thing happens there, then it is most likely a hardware issue. If not, come back to us and we can continue with suggestions.
|
Your system triggered an MCE (Machine Check Exception), which most likely points to a hardware problem.
Clean the cooling system (in case this is overheating) and use Memtest86+ (available from the boot screen of your Slackware DVD) to check the RAM. |
Mark, I have used live CDs on this computer with no issue.
TobiSGD, I have not yet run a memtest, but I did just now open up my computer and dust off the cooling system. I discovered that my rear case fan (not the CPU or PSU fan) has lost a blade. At the very least, this explains some of the noise from my machine. Since then, I booted up the machine and had a crash twice while starting X. I'll post again as soon as I have a chance to run memtest. Thanks to you both! |
I ran memtest86+ and found no errors.
I've had no trouble booting from the Slackware liveUSB, nor from the SLAX liveCD. The replacement fan is in the mail. Until then, are there any other diagnostics I can run? |
So live CDs run fine, eh? For starters, can you put together a pastebin with the output from:
--mancha
Edit: You can also try running mcelog to get more verbose output. I'm not sure why Slackware doesn't have a hook for this, but you can add the following code block to /etc/rc.d/rc.local:
Code:
# Start mcelog daemon |
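The original snippet did not survive in this thread. As a rough sketch only, an rc.local hook of this kind might look like the following. The function name `start_mcelog` and the default path `/usr/sbin/mcelog` are assumptions for illustration, not what was originally posted; `--daemon` is mcelog's option for running in the background.

```shell
# Hypothetical rc.local hook: start mcelog in daemon mode if it is installed.
# The default binary path below is an assumption; adjust it to your install.
start_mcelog() {
    bin="${1:-/usr/sbin/mcelog}"
    if [ -x "$bin" ]; then
        echo "Starting mcelog daemon: $bin --daemon"
        "$bin" --daemon
    else
        # Report a missing binary on stderr and signal failure to the caller.
        echo "mcelog not found at $bin; skipping" >&2
        return 1
    fi
}

# In /etc/rc.d/rc.local the function would then be called once:
# start_mcelog
```

Wrapping the call in a function keeps the existence check and the daemon start together, so rc.local does not fail noisily on a machine without mcelog.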
|
Hi again,
I've replaced the broken fan, but I'm still getting crashes during X startup. Any ideas? |
Run mprime, available from:
http://www.mersenne.org/download/index.php#source
Run it in single-user mode (runlevel 1) to see whether the CPU is working properly. The error clearly states that there is an MCE on the CPU, meaning that the CPU may be faulty. Let it go for 13 runs and see if it prints an error. |
Quote:
Either of these next two suggestions would allow you to temporarily disable suspected cores, to test whether running without them improves matters. These might be better to try first, rather than messing with the bios, because these are fairly simple and can be easily discarded if proved useless:
Bear in mind that I am going on a hunch here. My suggestions may only be a blind alley and no help at all. Regards EDIT: @metaschima: You beat me to the punch. Good idea on the prime95 test. |
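The list of suggestions from the post above was not preserved. One common way to disable a suspected core at runtime, offered here only as a sketch, is the kernel's CPU hotplug interface in sysfs: writing 0 or 1 to `/sys/devices/system/cpu/cpuN/online`. This assumes a kernel built with CPU hotplug support and must be run as root; the helper name `set_cpu_online` and its optional third argument are made up for illustration.

```shell
# set_cpu_online CPU STATE [SYSFS_ROOT]
# Writes STATE (0 = offline, 1 = online) to the hotplug control file for CPU.
# SYSFS_ROOT defaults to the real sysfs location; it is a parameter only so
# the helper can be tried against a scratch directory first.
set_cpu_online() {
    cpu="$1"
    state="$2"
    root="${3:-/sys/devices/system/cpu}"
    ctl="$root/cpu$cpu/online"
    if [ ! -w "$ctl" ]; then
        # No control file usually means no hotplug support for this CPU
        # or insufficient privileges.
        echo "cannot control cpu$cpu (no writable $ctl)" >&2
        return 1
    fi
    echo "$state" > "$ctl"
}

# Example (as root): take cpu3 offline, then bring it back.
# set_cpu_online 3 0
# set_cpu_online 3 1
```

Because the change is lost on reboot, this is easy to discard if it proves useless. On x86, cpu0 often has no `online` file and cannot be taken offline this way.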
It could be several hardware failures.
1. Memtest86+ will see if your RAM may have problems. This can be anything from modules going bad to total failures.
2. When you format a disk, try using the SLOW format to check for bad blocks. If your hard drive has a lot of errors you may need to replace it. A slow format will tell you if there are bad sectors. On large capacity disks this will take considerable time, but it's worth it.
3. Check your cables for breaks, clean the air flow paths, and look for discoloration and burn marks on hardware. Any of these could mean it's time to start replacing hardware. |
metaschima, I ran mprime in single-user mode, as you suggested, and it got through six tests before, surprise! Kernel panic.
Here's the output
More interestingly, dmesg threw a couple of these at me:
Quote:
j_v, I will try your idea next, and report back.
ReaperX7, I have already (1) run memtest86+, (2) checked my hard drive for bad blocks, and (3) cleaned and inspected my computer's internals.
Thanks to all of you for your input! |
Hello
I'm sorry to say that if that fan has been broken for long enough, hardware damage may have occurred. OTOH, just as often the thermal grease may simply have "caked up" from overheating and need to be replaced. It would probably be wise to use some monitoring software like Conky to keep a close watch on temperatures. Of course you could just run sensors (from lm_sensors) in a terminal, but IMHO constant desktop meters are extremely valuable. Also, you might check in the BIOS to see if your fans have been set to some "quiet mode" that prioritizes silence over cooling. Heat is the enemy of electronics. |
Check the CPU temperatures and make sure they stay below the critical threshold. If they do and the errors continue, then it is very likely that the CPU itself is faulty.
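If lm_sensors isn't set up yet, the raw values can be read straight from sysfs. A rough sketch, assuming the sensor driver exposes `tempN_input` files (in millidegrees Celsius) under `/sys/class/hwmon`; older kernels place them in a `device/` subdirectory, so both locations are checked. The function name `read_temps` is made up for illustration.

```shell
# read_temps [HWMON_ROOT]
# Prints every hwmon temperature input found, converted to whole degrees C.
read_temps() {
    root="${1:-/sys/class/hwmon}"
    for f in "$root"/hwmon*/temp*_input "$root"/hwmon*/device/temp*_input; do
        # Skip glob patterns that matched nothing and unreadable files.
        [ -r "$f" ] || continue
        milli=$(cat "$f")
        # hwmon reports millidegrees Celsius; integer division drops the fraction.
        echo "$f: $((milli / 1000)) C"
    done
}

# Example: poll every 5 seconds while a stress test runs in another terminal.
# while true; do read_temps; sleep 5; done
```

Which input corresponds to the CPU depends on the driver, and the critical limit is board- and CPU-specific, so compare against the matching `tempN_crit` file or the vendor's specification where available.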
|
After passing the kernel 'maxcpus=2' at boot, mprime appears to run without error, and I have had no kernel panics.
Thanks to you all for helping me pinpoint the problem. Now to figure out a replacement CPU . . . |
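For anyone repeating this test, it can be worth confirming that the boot parameter actually took effect. A minimal sketch, assuming the usual `/proc/cmdline` and `/proc/cpuinfo` layout; the function name `check_maxcpus` and its optional file arguments are made up for illustration.

```shell
# check_maxcpus [CMDLINE_FILE] [CPUINFO_FILE]
# Reports the maxcpus= value the kernel was booted with and the number of
# processor entries currently listed in cpuinfo.
check_maxcpus() {
    cmdline="${1:-/proc/cmdline}"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Split the command line into one parameter per line and pick out maxcpus=N.
    limit=$(tr ' ' '\n' < "$cmdline" | sed -n 's/^maxcpus=//p' | head -n 1)
    # Each CPU the kernel has brought up has one "processor" line in cpuinfo.
    online=$(grep -c '^processor' "$cpuinfo")
    echo "maxcpus=${limit:-unset} online=$online"
}

# Example: run with no arguments on the live system.
# check_maxcpus
```

If the reported count is higher than the limit, the parameter probably did not reach the kernel, for instance because it was mistyped at the lilo prompt.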