Slackware: This forum is for the discussion of Slackware Linux.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
Recently my machine has been hitting kernel panics, and I'm at a loss as to how to fix them.
I'm running slackware64 14.1 on an AMD quad-core 9600.
Here's a photo of the message that appeared during my most recent kernel panic: http://i.imgur.com/BQIXon9.png
(I apologize for the blur; my camera is not excellent.)
I have installed the multilib packages; is it possible that these have introduced this error?
To check that it's not your Slackware install, try booting from a live CD (e.g. Ubuntu) and see how that goes. If it does the same thing, it is clearly a hardware issue. If not, do come back to us and we can continue with suggestions.
Your system triggered an MCE (Machine Check Exception), which most likely points to a hardware problem.
Clean the cooling system (in case the machine is overheating), and use Memtest86+ (available from the boot screen of your Slackware DVD) to check the RAM.
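Since the panic names a machine check, it may also help to see whether the kernel logs further ones on boots that don't crash (messages from the crash itself may never make it to disk). A minimal sketch, assuming the kernel's messages contain "Machine check", "Hardware Error" or "mce:", which is typical on x86 but not guaranteed across kernel versions; mce_lines is a hypothetical helper name.

```sh
#!/bin/sh
# Print only the kernel log lines that look like machine check reports.
# Assumption: the kernel tags them with "Machine check", "Hardware Error" or "mce:".
mce_lines() {
    # Reads a kernel log on stdin; case-insensitive extended regex match.
    grep -iE 'machine check|hardware error|mce:'
}

# Example usage (as root):
#   dmesg | mce_lines
#   mce_lines < /var/log/syslog
```

On Slackware, /var/log/syslog and /var/log/messages are the usual places to search; the exact split depends on the syslog configuration.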
Mark, I have used live CDs on this computer with no issues.
TobiSGD, I have not yet run memtest, but I just opened up my computer and dusted off the cooling system. I discovered that my rear fan (not the CPU or PSU fan) has lost a blade. At the very least, this explains some of the noise my machine has been making.
Since then, I have booted the machine and it crashed twice while starting X. I'll post again as soon as I've had a chance to run memtest.
Run prime95 in mode 1 to check whether the CPU is working properly. The error clearly states that there is an MCE on the CPU, meaning the CPU may be faulty. Let it run for 13 runs and see if it reports an error.
I've replaced the broken fan, but I'm still getting crashes during X startup. Any ideas?
More specifics would go a long way toward generating ideas. I know your original post mentions a kernel panic; is that still the issue? Going on what you've mentioned so far, my gut reaction is a faulty CPU core, but that is really just a guess. If it were my machine, I might look into disabling the third core (core 2, the one showing the fault in the picture you linked), but I don't know your BIOS or whether it even supports disabling cores.
Either of the next two suggestions would let you temporarily disable suspected cores to test whether running without them improves matters. These might be better to try before messing with the BIOS, because they are fairly simple and easy to undo if they prove useless:
You could disable an individual core via sysfs (as root):
echo "0" > /sys/bus/cpu/devices/cpu2/online
Or you could boot with only the first two cores by adding 'maxcpus=2' to the kernel command line.
Bear in mind that I'm going on a hunch here. My suggestions may lead down a blind alley and be no help at all.
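Both suggestions can be checked from the running system. The sketch below is an assumption, not a standard tool: cpu_set_online and count_cpulist are hypothetical helper names, and SYSFS_ROOT exists only so the functions can be tried against a scratch directory instead of the real /sys. cpu_set_online writes to /sys/devices/system/cpu/cpuN/online (the /sys/bus/cpu/devices/cpuN entries are links to these) and reads the value back; writing 1 brings the core back, and the change does not survive a reboot. On x86, cpu0 usually cannot be taken offline this way. count_cpulist counts the cores listed in /sys/devices/system/cpu/online, which uses a range format such as 0-1 or 0-1,3, so you can confirm that either change took effect.

```sh
#!/bin/sh
# Helpers for testing a suspect core. SYSFS_ROOT defaults to /sys and should
# only be overridden when testing against a fake directory tree.

# Take a core offline (0) or online (1) and print the state the kernel reports.
cpu_set_online() {
    local cpu="$1" state="$2"
    local f="${SYSFS_ROOT:-/sys}/devices/system/cpu/cpu$cpu/online"
    case $state in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 2 ;;
    esac
    if [ ! -w "$f" ]; then
        echo "cannot write $f (not root, or this core cannot be hot-unplugged)" >&2
        return 1
    fi
    echo "$state" > "$f"
    cat "$f"
}

# Count the cores in a kernel cpulist string such as "0-1" or "0-1,3".
count_cpulist() {
    echo "$1" | tr ',' '\n' | awk -F- '
        NF == 2 { n += $2 - $1 + 1; next }   # a range like 0-3
        NF == 1 && $1 != "" { n += 1 }       # a single core like 3
        END { print n + 0 }'
}

# Example usage (as root):
#   cpu_set_online 2 0        # take the third core offline
#   count_cpulist "$(cat /sys/devices/system/cpu/online)"
```

On Slackware, which boots with LILO by default, maxcpus=2 can be typed after the image label at the LILO prompt, or added to an append= line in /etc/lilo.conf followed by rerunning lilo. After booting that way on a quad-core, count_cpulist should report 2.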
@metaschima: You beat me to the punch. Good idea on the prime95 test.
1. Memtest86+ will show whether your RAM has problems, anything from modules going bad to total failures.
2. When you format a disk, use the slow format, which checks for bad sectors. If your hard drive has a lot of errors, you may need to replace it. On large disks this takes considerable time, but it's worth it.
3. Check your cables for breaks, clean the air flow paths, and look for discoloration and burn marks on hardware. Any of these could mean it's time to start replacing hardware.
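Point 2 describes the slow format in general terms. On Linux the closest equivalent is badblocks from e2fsprogs, which does a read-only scan by default (mkfs.ext4 -c runs the same check while creating a filesystem). The sketch below wraps it with a guard against scanning a disk that is in use; is_mounted and scan_disk are hypothetical names, and MOUNTS is a test-only override. The guard is crude: it matches device-name prefixes in /proc/mounts, and some systems list the root filesystem as /dev/root, which it will miss.

```sh
#!/bin/sh
# Return 0 if the device, or any partition on it, appears in the mount table.
# MOUNTS defaults to /proc/mounts; override only for testing.
is_mounted() {
    awk -v d="$1" 'index($1, d) == 1 { found = 1 } END { exit !found }' \
        "${MOUNTS:-/proc/mounts}"
}

# Read-only surface scan with progress (-s) and verbose output (-v).
scan_disk() {
    if is_mounted "$1"; then
        echo "$1 (or a partition on it) is mounted; unmount it first" >&2
        return 1
    fi
    badblocks -sv "$1"
}

# Example (as root, substituting the disk you want to test):
#   scan_disk /dev/sdb
```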
I'm sorry to say that if that fan was broken for long enough, hardware damage may have occurred. On the other hand, just as often the thermal grease has simply caked up from overheating and needs to be replaced. It would probably be wise to use monitoring software like Conky to keep a close watch. Of course you could just run the sensors command from lm_sensors in a terminal, but IMHO constant desktop meters are extremely valuable. Also check in the BIOS whether your fans have been set to some "quiet mode" that favors silence over temperature. Heat is the enemy of electronics.
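For a terminal view, watch -n 2 sensors refreshes the lm_sensors readings every two seconds. If you want something scriptable, the kernel exposes the same hwmon readings under /sys/class/hwmon, in millidegrees Celsius. A minimal sketch, assuming the standard hwmon layout (a name file, tempN_input, and an optional tempN_label); on older kernels such as the 3.10 series in Slackware 14.1, some drivers keep these files one level down in hwmonN/device, so the sketch checks both. show_temps is a hypothetical name and HWMON_ROOT is a test-only override.

```sh
#!/bin/sh
# Print every temperature the kernel's hwmon drivers report, in degrees Celsius.
# HWMON_ROOT defaults to /sys/class/hwmon; override only for testing.
show_temps() {
    local dev d name input label_file label
    for dev in "${HWMON_ROOT:-/sys/class/hwmon}"/hwmon*; do
        d=$dev
        [ -r "$d/name" ] || d=$dev/device   # older kernels: files under device/
        [ -r "$d/name" ] || continue
        name=$(cat "$d/name")
        for input in "$d"/temp*_input; do
            [ -r "$input" ] || continue
            label_file=${input%_input}_label
            if [ -r "$label_file" ]; then
                label=$(cat "$label_file")
            else
                label=$(basename "${input%_input}")
            fi
            # Values are millidegrees Celsius; print one decimal place.
            awk -v n="$name" -v l="$label" -v m="$(cat "$input")" \
                'BEGIN { printf "%s %s: %.1f C\n", n, l, m / 1000 }'
        done
    done
}

# Example: while :; do clear; show_temps; sleep 2; done
```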