LinuxQuestions.org
Old 04-15-2014, 01:44 PM   #1
Eupator
Member
 
Registered: May 2008
Distribution: Slackware/Slackintosh
Posts: 39

Rep: Reputation: 0
Kernel Panic on Slackware64 14.1


Hi all,

Recently, I've been getting kernel panics from my machine, and I'm at a loss for how to fix them.

I'm running slackware64 14.1 on an AMD Quadcore 9600.

Panics most often occur during bootup, after lilo loads the kernel and before I get to the login prompt. My machine will also freeze now and again when starting X, failing to send any signal to my monitor or respond to keyboard commands. Most recently, I had a panic while I was in X, debugging some JavaScript.

Here's a screenshot of the message that appeared on my most recent kernel panic:
http://i.imgur.com/BQIXon9.png
(I apologize for the blur; my camera is not excellent.)

I have installed the multilib packages; is it possible that these have introduced this error?

Thanks in advance for any advice or suggestions.
 
Old 04-15-2014, 01:49 PM   #2
Mark Pettit
Member
 
Registered: Dec 2008
Location: Cape Town, South Africa
Distribution: Slackware 15.0
Posts: 619

Rep: Reputation: 299
To check that it's not your Slackware install, perhaps boot from a live CD (e.g. Ubuntu) and see how that goes. If that does the same, then it is clearly a hardware issue. If not, then do come back to us and we can continue with suggestions.
 
Old 04-15-2014, 02:13 PM   #3
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,148
Blog Entries: 2

Rep: Reputation: 4886
Your system triggered an MCE (Machine Check Exception), which most likely points to a hardware problem.
Clean the cooling system (in case this is overheating) and use Memtest86+ (available from the boot screen of your Slackware DVD) to check the RAM.
 
Old 04-15-2014, 03:27 PM   #4
Eupator
Member
 
Registered: May 2008
Distribution: Slackware/Slackintosh
Posts: 39

Original Poster
Rep: Reputation: 0
Mark, I have used LiveCDs on this computer with no issue.

TobiSGD, I have not yet run a memtest, but I did just now open up my computer and dust off the cooling system. I discovered that my rear case fan (not the CPU or PSU fan) has lost a blade. At the very least, this explains some of the noise from my machine.

Since then, I booted up the machine and had a crash twice on booting X. I'll post again as soon as I have a chance to memtest.

Thanks to you both!

Last edited by Eupator; 04-15-2014 at 03:33 PM.
 
Old 04-16-2014, 02:05 PM   #5
Eupator
Member
 
Registered: May 2008
Distribution: Slackware/Slackintosh
Posts: 39

Original Poster
Rep: Reputation: 0
I ran memtest86+ and found no errors.

I've had no trouble booting from the Slackware liveUSB, nor from the SLAX liveCD.

The replacement fan is in the mail. Until then, are there any other diagnostics I can run?
 
Old 04-16-2014, 02:20 PM   #6
mancha
Member
 
Registered: Aug 2012
Posts: 484

Rep: Reputation: Disabled
So live CDs run fine, eh? For starters, can you put together a pastebin with the output from:
  • dmidecode
  • lsmod
  • lspci -v
Also, what machine is this?
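
For anyone following along, here is a minimal sketch of gathering those three outputs into one file that can be pasted into a pastebin. The function name collect_diag and the /tmp/diag.txt path are made up for illustration; only dmidecode, lsmod and lspci -v come from the request above. Commands that are not installed are noted in the file rather than aborting the run.

```shell
#!/bin/sh
# Sketch: gather diagnostic command output into one text file.
# collect_diag is a hypothetical helper name, not a standard tool.

collect_diag() {
    # $1 = output file; remaining arguments = commands to run, one per argument
    out=$1
    shift
    : > "$out"
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd" >> "$out"
        # ${cmd%% *} is the first word of the command line (the program name).
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            sh -c "$cmd" >> "$out" 2>&1
        else
            printf '(not found: %s)\n' "${cmd%% *}" >> "$out"
        fi
    done
}

# Typical use, as root (dmidecode needs root to read the DMI tables):
#   collect_diag /tmp/diag.txt "dmidecode" "lsmod" "lspci -v"
```

Running it as root matters mainly for dmidecode; lsmod and lspci -v also work as a normal user, though lspci hides some details without root.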

--mancha

-----

Edit:

You can also try running mcelog to get some more verbosity. Not sure why Slackware doesn't have a hook for this, but you can add the following code block to /etc/rc.d/rc.local:

Code:
# Start mcelog daemon
if [ -x /etc/rc.d/rc.mcelog ]; then
    /etc/rc.d/rc.mcelog start
fi
You can place your mcelog settings in /etc/mcelog.conf

Last edited by mancha; 04-16-2014 at 02:35 PM.
 
Old 04-16-2014, 02:42 PM   #7
Eupator
Member
 
Registered: May 2008
Distribution: Slackware/Slackintosh
Posts: 39

Original Poster
Rep: Reputation: 0
dmidecode

lsmod

lspci -v

I got this machine second-hand, but uname -a returns:

Quote:
Linux sigmund 3.10.17 #2 SMP Wed Oct 23 16:34:38 CDT 2013 x86_64 AMD Phenom(tm) 9600 Quad-Core Processor AuthenticAMD GNU/Linux
 
Old 04-29-2014, 06:40 PM   #8
Eupator
Member
 
Registered: May 2008
Distribution: Slackware/Slackintosh
Posts: 39

Original Poster
Rep: Reputation: 0
Hi again,

I've replaced the broken fan, but I'm still getting crashes during X startup. Any ideas?

Last edited by Eupator; 04-29-2014 at 08:21 PM. Reason: correction, not a CPU fan, just a case fan
 
Old 04-29-2014, 07:42 PM   #9
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
Run mprime (Prime95), available from:
http://www.mersenne.org/download/index.php#source
in mode 1 to check whether the CPU is working properly. The error clearly states that there is an MCE on the CPU, meaning that the CPU may be faulty. Let it run for 13 runs and see if it prints an error.
 
Old 04-29-2014, 07:48 PM   #10
j_v
Member
 
Registered: Oct 2011
Distribution: Slackware64
Posts: 364

Rep: Reputation: 67
Quote:
Originally Posted by Eupator View Post
Hi again,

I've replaced the broken CPU fan, but I'm still getting crashes during X startup. Any ideas?
More specifics would go a long way toward generating ideas. I know your original post mentions a kernel panic; is that still the issue? Going on what you've mentioned so far, my gut reaction is a faulty CPU core, but that is really just a guess. If it were my machine, I might look into disabling the 3rd core (core 2 being the one that shows the fault in the picture you linked to), but I don't know your BIOS or whether it supports disabling cores at all.

Either of these next two suggestions would allow you to temporarily disable suspected cores, to test whether running without them improves matters. These might be better to try first, rather than messing with the BIOS, because they are fairly simple and can easily be discarded if they prove useless:
  1. You could disable an individual core via sysfs:
    Code:
    echo "0" > /sys/bus/cpu/devices/cpu2/online
  2. You could boot with only the first two cores by adding 'maxcpus=2' to the kernel command line.

Bear in mind that I am going on a hunch here. My suggestions may only be a blind alley and no help at all.
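
As a rough sketch of the sysfs route, the helper below writes the requested state and reads it back to confirm that the kernel accepted it. The name set_cpu_online and the CPU_SYSFS override are hypothetical; the sketch uses /sys/devices/system/cpu, which is where the per-core online files normally live. It assumes the kernel was built with CPU hotplug support. Note that cpu0 often has no online file at all.

```shell
#!/bin/sh
# Sketch: take one CPU core offline (or back online) via sysfs and verify it.
# set_cpu_online is a hypothetical helper; CPU_SYSFS can be overridden for testing.

CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

set_cpu_online() {
    # $1 = core number, $2 = 0 (offline) or 1 (online)
    ctl="$CPU_SYSFS/cpu$1/online"
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (not root, or core has no online control)" >&2
        return 1
    fi
    echo "$2" > "$ctl" || return 1
    # Read the value back; succeeds only if the kernel accepted the change.
    [ "$(cat "$ctl")" = "$2" ]
}

# As root, to take the suspect core from this thread offline until reboot:
#   set_cpu_online 2 0 && cat /sys/devices/system/cpu/online
```

The change does not survive a reboot, so it is easy to undo. It also only takes effect once the system is up, so it cannot help with panics that happen earlier in boot; the maxcpus=2 option covers that case.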

Regards

EDIT:
@metaschima: You beat me to the punch. Good idea on the prime95 test.

Last edited by j_v; 04-29-2014 at 07:53 PM.
 
Old 04-29-2014, 09:43 PM   #11
ReaperX7
LQ Guru
 
Registered: Jul 2011
Location: California
Distribution: Slackware64-15.0 Multilib
Posts: 6,558
Blog Entries: 15

Rep: Reputation: 2097
It could be several hardware failures.

1. Memtest86+ will see if your RAM may have problems. This can be anything from modules going bad to total failures.

2. When you format a disk, try using the SLOW format to check for bad blocks. If your hard drive has a lot of errors you may need to replace it. A slow format will tell you if there are bad sectors. On large capacity disks this will take considerable time, but it's worth it.

3. Check your cables for breaks, clean the air flow paths, and look for discoloration and burn marks on hardware. Any of these could mean it's time to start replacing hardware.
 
Old 04-29-2014, 09:50 PM   #12
Eupator
Member
 
Registered: May 2008
Distribution: Slackware/Slackintosh
Posts: 39

Original Poster
Rep: Reputation: 0
metaschima, I ran mprime in single-user mode, as you suggested, and it got through six tests before, surprise! Kernel panic.

Here's the output

More interestingly, dmesg threw a couple of these at me:

Quote:
[11701.927635] [Hardware Error]: MC0 Error: Data/Tag DWR error.
[11701.928255] [Hardware Error]: Error Status: Uncorrected, software restartable error.
[11701.928731] [Hardware Error]: CPU:2 (10:2:2) MC0_STATUS[-|UE|-|-|AddrV|UECC]: 0xb441200000000145
[11701.929728] [Hardware Error]: MC0_ADDR: 0x00000001086e6100
[11701.930708] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DWR
[11774.164149] [Hardware Error]: MC0 Error: Data/Tag DWR error.
[11774.164727] [Hardware Error]: Error Status: Uncorrected, software restartable error.
[11774.165107] [Hardware Error]: CPU:2 (10:2:2) MC0_STATUS[-|UE|-|-|AddrV|UECC]: 0xb451a00000000145
[11774.166094] [Hardware Error]: MC0_ADDR: 0x000000011e36b090
[11774.166984] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DWR
Which seems to support the CPU core theory.
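
To see at a glance whether the errors keep landing on the same core, a small filter like the following can tally them. It is only a sketch: count_mce_by_cpu is a made-up name, and the pattern assumes the "[Hardware Error]: CPU:N" line format shown in the quote above, which other kernels or CPU vendors may not use.

```shell
#!/bin/sh
# Sketch: count decoded machine-check lines per CPU core.
# count_mce_by_cpu is a hypothetical helper; it reads kernel log text on stdin.

count_mce_by_cpu() {
    # Keep only the core number from lines like
    #   [11701.928731] [Hardware Error]: CPU:2 (10:2:2) MC0_STATUS[...]: 0x...
    sed -n 's/.*\[Hardware Error\]: CPU:\([0-9][0-9]*\) .*/\1/p' |
        sort -n | uniq -c |
        awk '{ printf "CPU %s: %s\n", $2, $1 }'
}

# Usage on a live system:
#   dmesg | count_mce_by_cpu
```

Fed the two errors quoted above, it prints "CPU 2: 2". If every error names the same core, that is consistent with one bad core rather than a board-wide fault.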

j_v, I will try your idea next, and report back.

ReaperX7, I have already (1) run memtest86+, (2) checked my hard drive for bad blocks, and (3) cleaned and inspected my computer's internals.

Thanks to all of you for your input!
 
Old 04-30-2014, 02:22 AM   #13
enorbet
Senior Member
 
Registered: Jun 2003
Location: Virginia
Distribution: Slackware = Main OpSys
Posts: 4,784

Rep: Reputation: 4434
Hello
I'm sorry to say that if that fan has been broken for long enough, hardware damage may have occurred. OTOH, just as often the thermal grease may simply have "caked up" from overheating and need to be replaced. It would probably be wise to use some monitoring software like Conky to keep a close watch. Of course you could just run sensors (from lm_sensors) in a terminal, but IMHO constant desktop meters are extremely valuable. Also, you might check in the BIOS to see if your fans have been set to some "quiet mode" that gives silence preference over temperature. Heat is the enemy of electronics.
 
Old 04-30-2014, 03:01 PM   #14
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
Check the CPU temperatures and make sure they are under critical. If they are under, then it is very likely that the CPU is faulty.
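
One way to make that check without extra software is to read the kernel's hwmon files directly. The sketch below assumes a sensor driver is loaded and exposes temp*_input and temp*_crit files (in millidegrees Celsius) under /sys/class/hwmon; whether it does on this particular board is an assumption. The name check_temps and the HWMON_ROOT override are hypothetical.

```shell
#!/bin/sh
# Sketch: compare each hwmon temperature reading with its critical limit.
# check_temps is a hypothetical helper; HWMON_ROOT can be overridden for testing.

HWMON_ROOT=${HWMON_ROOT:-/sys/class/hwmon}

check_temps() {
    # Prints one line per sensor; returns 1 if any reading is at or above critical.
    rc=0
    # Older kernels keep the files under hwmonN/device/, newer ones under hwmonN/.
    for input in "$HWMON_ROOT"/hwmon*/temp*_input \
                 "$HWMON_ROOT"/hwmon*/device/temp*_input; do
        [ -r "$input" ] || continue
        cur=$(cat "$input" 2>/dev/null) || continue
        crit="${input%_input}_crit"
        if [ -r "$crit" ]; then
            lim=$(cat "$crit")
            if [ "$cur" -ge "$lim" ]; then
                state=CRITICAL
                rc=1
            else
                state=ok
            fi
            echo "$input: $((cur / 1000)) C (critical at $((lim / 1000)) C) $state"
        else
            echo "$input: $((cur / 1000)) C (no critical limit exposed)"
        fi
    done
    return $rc
}

# Usage: check_temps || echo "at least one sensor is at or above critical"
```

Running it while mprime is loading the CPU gives a more useful reading than an idle check. If it prints nothing, no sensor driver is exposing temperatures.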
 
Old 04-30-2014, 08:21 PM   #15
Eupator
Member
 
Registered: May 2008
Distribution: Slackware/Slackintosh
Posts: 39

Original Poster
Rep: Reputation: 0
After passing 'maxcpus=2' to the kernel at boot, mprime appears to run without error, and I have had no kernel panics.
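
For anyone repeating this workaround, a quick sketch for confirming that the option took effect: it pulls the maxcpus value out of the kernel command line, and the CPU count can then be checked separately. The function name maxcpus_from_cmdline is made up. The lilo.conf line in the comments is the usual way to make a kernel option permanent with lilo, but treat the exact placement in your own config as an assumption.

```shell
#!/bin/sh
# Sketch: report the maxcpus= value the running kernel was booted with.
# maxcpus_from_cmdline is a hypothetical helper name.

maxcpus_from_cmdline() {
    # $1 = file holding the kernel command line (defaults to /proc/cmdline).
    # Prints the value of maxcpus=, or nothing if the option is absent.
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^maxcpus=//p'
}

# On the running system:
#   maxcpus_from_cmdline                  # value passed at boot, if any
#   grep -c '^processor' /proc/cpuinfo    # CPUs the kernel actually brought up
#
# To keep the option across reboots with lilo, add this to the image section
# of /etc/lilo.conf and then re-run /sbin/lilo:
#   append="maxcpus=2"
```

If the two numbers agree, the kernel is running on the reduced set of cores, and a clean mprime run there is a fair test of them.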

Thanks to you all for helping me pinpoint the problem.

Now to figure out a replacement CPU . . .
 
  

