Slackware: This forum is for the discussion of Slackware Linux.
Recently, I've been getting kernel panics from my machine, and I'm at a loss for how to fix them.
I'm running slackware64 14.1 on an AMD Quadcore 9600.
Panics most often occur during bootup, after lilo loads the kernel and before I get to the login prompt. My machine also freezes now and again when starting X, failing to send any signal to my monitor or respond to keyboard input. Most recently, I had a panic while I was in X, debugging some JavaScript.
Here's a screenshot of the message that appeared on my most recent kernel panic: http://i.imgur.com/BQIXon9.png
(I apologize for the blur; my camera is not excellent.)
I have installed the multilib packages; is it possible that these have introduced this error?
To check that it's not your Slackware install, perhaps boot from a live CD (e.g. Ubuntu) and see how that goes. If that does the same, then clearly it is a hardware issue. If not, do come back to us and we can continue with suggestions.
Your system triggered an MCE (Machine Check Exception), which most likely points to a hardware problem.
Clean up the cooling system (just in case this is overheating) and use Memtest86+ (available from the boot screen of your Slackware DVD) to check the RAM.
Mark, I have used live CDs on this computer with no issue.
TobiSGD, I have not yet run a memtest, but I did just now open up my computer and dust off the cooling system. I discovered that my rear fan (not the CPU or PSU fan) has lost a blade. At the very least, this explains some of the noise from my machine.
Since then, I booted up the machine and had a crash twice on booting X. I'll post again as soon as I have a chance to memtest.
So live CDs run fine, eh? For starters, can you put together a pastebin with the output from:
dmidecode
lsmod
lspci -v
Also, what machine is this?
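One way to gather those three outputs into a single file for a pastebin is sketched below. The function name collect_hwinfo and the default file name hwinfo.txt are made up for illustration; dmidecode normally needs root to return useful data.

```shell
#!/bin/sh
# collect_hwinfo [OUTFILE]
# Runs each requested command and writes its labelled output to OUTFILE.
# (Hypothetical helper; the name and default file name are assumptions.)
collect_hwinfo() {
    out="${1:-hwinfo.txt}"
    : > "$out"                       # start with an empty file
    for cmd in "dmidecode" "lsmod" "lspci -v"; do
        printf '===== %s =====\n' "$cmd" >> "$out"
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # $cmd is left unquoted on purpose so "lspci -v" splits into
            # command and flag. A failure (e.g. not root) is recorded in
            # the file instead of aborting the whole collection.
            $cmd >> "$out" 2>&1 || true
        else
            printf '(%s not found in PATH)\n' "${cmd%% *}" >> "$out"
        fi
    done
}
```

Run it as root, for example `collect_hwinfo /tmp/hwinfo.txt`, then paste the resulting file.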
--mancha
-----
Edit:
You can also try running mcelog to get some more verbosity. Not sure why Slackware doesn't have a hook for this, but you can add the following code block to /etc/rc.d/rc.local
Code:
# Start mcelog daemon
if [ -x /etc/rc.d/rc.mcelog ]; then
  /etc/rc.d/rc.mcelog start
fi
You can place your mcelog settings in /etc/mcelog.conf
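Even without mcelog running, the kernel log usually records machine check events. A rough filter for pulling those lines out is sketched here; the function name mce_lines is hypothetical, and the patterns are an assumption based on the usual "mce:" and "[Hardware Error]" prefixes, so they may need adjusting for your kernel.

```shell
#!/bin/sh
# mce_lines: read log text on stdin and keep only lines that look like
# machine check reports. (Hypothetical helper; patterns are a best guess.)
mce_lines() {
    grep -iE 'machine check|mce:|\[Hardware Error\]'
}
```

Typical use would be `dmesg | mce_lines`, run as root if dmesg access is restricted.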
Run Prime95 (http://www.mersenne.org/download/index.php#source) in mode 1 to see whether the CPU is working properly. The error clearly states that there is an MCE on the CPU, meaning that the CPU may be faulty. Let it run for 13 runs and see if it prints an error.
I've replaced the broken CPU fan, but I'm still getting crashes during X startup. Any ideas?
Specifics would go a long way toward generating ideas. Your original post mentions a kernel panic; is that still the issue? Going on what you've mentioned so far, my gut reaction is a faulty CPU core, but that is really just a guess. If it were my machine, I might look into disabling the third core (core 2 being the one that shows the fault in the picture you linked to), but I don't know your BIOS or whether it supports disabling cores at all.
Either of these next two suggestions would allow you to temporarily disable suspected cores, to test whether running without them improves matters. These might be better to try first, rather than messing with the bios, because these are fairly simple and can be easily discarded if proved useless:
You could disable an individual core via sysfs:
Code:
echo "0" > /sys/bus/cpu/devices/cpu2/online
You could boot with only the first two cores by adding 'maxcpus=2' to the kernel command line.
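The sysfs approach can be wrapped so that it is easy to take a core offline, test, and bring it back. This is only a sketch: the function name cpu_set_online and the optional sysfs root argument are made up for illustration, and the write needs root.

```shell
#!/bin/sh
# cpu_set_online CPU STATE [SYSFS_ROOT]
# Writes STATE (0 = offline, 1 = online) to the core's "online" file.
# SYSFS_ROOT defaults to /sys. (Hypothetical helper, not a standard tool.)
cpu_set_online() {
    cpu="$1"
    state="$2"
    root="${3:-/sys}"
    f="$root/bus/cpu/devices/cpu$cpu/online"
    if [ ! -w "$f" ]; then
        echo "cannot write $f (need root, or cpu$cpu cannot be offlined)" >&2
        return 1
    fi
    echo "$state" > "$f"
}
```

For example, `cpu_set_online 2 0` takes core 2 offline and `cpu_set_online 2 1` restores it. For the maxcpus=2 route with lilo, the usual way (as far as I know) is an append="maxcpus=2" line in /etc/lilo.conf followed by rerunning lilo, or typing the parameter after the image label at the lilo boot prompt for a one-off test.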
Bear in mind that I am going on a hunch here. My suggestions may turn out to be a blind alley and no help at all.
Regards
EDIT:
@metaschima: You beat me to the punch. Good idea on the prime95 test.
1. Memtest86+ will see if your RAM may have problems. This can be anything from modules going bad to total failures.
2. When you format a disk, try using the SLOW format to check for bad blocks. If your hard drive has a lot of errors you may need to replace it. A slow format will tell you if there are bad sectors. On large capacity disks this will take considerable time, but it's worth it.
3. Check your cables for breaks, clean the air flow paths, and look for discoloration and burn marks on hardware. Any of these could mean it's time to start replacing hardware.
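If you would rather check the disk without reformatting it, a read-only surface scan with badblocks is one alternative to the slow format in point 2. The sketch below is not the method described above but a swapped-in technique; the function name scan_disk is hypothetical and the device path is whatever your disk actually is.

```shell
#!/bin/sh
# scan_disk DEVICE
# Read-only surface scan; refuses anything that is not a block device.
# (Hypothetical wrapper; run as root.)
scan_disk() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        echo "$dev is not a block device" >&2
        return 1
    fi
    # -s shows progress, -v is verbose; badblocks defaults to a
    # read-only test, so existing data is left untouched
    badblocks -sv "$dev"
}
```

Like a slow format, this can take many hours on a large disk.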
Hello
I'm sorry to say that if that fan has been broken for long enough, hardware damage may have occurred. OTOH, just as often the thermal grease may simply have "caked up" from overheating and need to be replaced. It would probably be wise to use some monitoring software like Conky to keep a close watch. Of course you could just run sensors (from lm_sensors) in a terminal, but IMHO constant desktop meters are extremely valuable. Also, you might check in the BIOS to see if your fans have been set to some "quiet mode" that gives silence preference over temperature. Heat is the enemy of electronics.
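For a quick look at temperatures without setting up Conky, the kernel's hwmon files can be read directly. This is a minimal sketch, assuming a sensor driver is loaded and exposes temp*_input files (in millidegrees Celsius) directly under /sys/class/hwmon/hwmon*; on some kernels the files sit one level deeper, and the function name print_temps is made up for illustration.

```shell
#!/bin/sh
# print_temps [SYSFS_ROOT]
# Prints every readable hwmon temperature input in whole degrees Celsius.
# SYSFS_ROOT defaults to /sys. (Hypothetical helper.)
print_temps() {
    root="${1:-/sys}"
    for f in "$root"/class/hwmon/hwmon*/temp*_input; do
        [ -r "$f" ] || continue                  # skip if nothing matched
        milli=$(cat "$f" 2>/dev/null) || continue
        # values are reported in millidegrees Celsius
        printf '%s: %d C\n' "${f#"$root"/class/hwmon/}" "$((milli / 1000))"
    done
}
```

Something like `while :; do print_temps; sleep 5; done` gives a crude running monitor; `watch -n 5 sensors` does much the same if lm_sensors is configured.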