LinuxQuestions.org
Forum: Linux - Software
#1 tvynr, 06-01-2005, 02:56 PM (Distribution: Debian)
Kernel Panic, MCE messages, and an Error Code


My firewall runs a Linux 2.6.9 kernel and has been working fine for months. This morning I found the machine having some unusual problems, none of which I had ever seen on any machine before.

First, the machine couldn't detect a dial tone from the modem, even though the modem initialized cleanly and there was definitely a dial tone on the line. After a reboot, it worked fine.

Next, I found the following messages in /var/log/messages several times:

Code:
Jun  1 15:29:39 dib kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Jun  1 15:29:39 dib kernel: Bank 1: 9400000000000151
Jun  1 15:32:09 dib kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Jun  1 15:32:09 dib kernel: Bank 1: d400000000000151
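Each `Bank 1:` value is the raw 64-bit machine-check status register for that bank. The sketch below is only an illustration, not what parsemce does internally. It assumes the architectural bit layout described in the Intel SDM machine-check chapter (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific code in bits 31:16); the names `decode_mci_status` and `STATUS_FLAGS` are hypothetical.

```python
# Architectural flag bits of a 64-bit MCi_STATUS value (assumed layout,
# per the Intel SDM machine-check chapter).
STATUS_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # overflow: another error was logged before this one was read
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MISC register holds extra information
    (58, "ADDRV"),  # ADDR register holds the related address
    (57, "PCC"),    # processor context corrupt
]


def decode_mci_status(status):
    """Split a raw MCi_STATUS value (the "Bank 1:" hex) into its fields."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,          # bits 15:0, architectural
        "model_specific": (status >> 16) & 0xFFFF,  # bits 31:16, CPU-specific
    }


if __name__ == "__main__":
    for raw in ("9400000000000151", "d400000000000151"):
        info = decode_mci_status(int(raw, 16))
        print(raw, info["flags"], hex(info["mca_error_code"]))
```

Under that layout, both logged values carry VAL, EN and ADDRV with MCA error code 0x151, and neither has UC or PCC set, which matches the "correctable" wording. The second value additionally has OVER set, meaning another error was recorded before the previous one had been read.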
Finally, the machine dies every five to ten minutes or so with a really cryptic kernel panic. Unfortunately, I can only see the last 25 lines:

Code:
[<c037f040>] ip_rcv_finish+0x0/0x2c0
[<c0360084>] nf_hook_slow+0xe4/0x120
[<c037f040>] ip_rcv_finish+0x0/0x2c0
[<c037ed69>] ip_rcv+0x439/0x500
[<c037f040>] ip_rcv_finish+0x0/0x2c0
[<c0355807>] netif_receive_skb+0x117/0x1d0
[<c034d4a7>] alloc_skb+0x47/0xe0
[<c02d3779>] rtl8139_rx+0x199/0x340
[<c02d3b0a>] rtl8139_poll+0x5a/0xe0
[<c0355a53>] net_rx_action+0x83/0x110
[<c0123d3a>] __do_softirq+0xba/0xd0
[<c010892c>] do_softirq+0x4c/0x60
=======================
[<c0108045>] do_IRQ+0x165/0x1b0
[<c0105be8>] common_interrupt+0x18/0x20
[<c0103030>] default_idle+0x0/0x40
[<c010305c>] default_idle+0x2c/0x40
[<c01030f2>] cpu_idle+0x42/0x60
[<c051d937>] start_kernel+0x167/0x190
[<c051d3a0>] unknown_bootoption+0x0/0x160
Code: 8b 44 24 24 89 44 24 04 e8 85 7d ff ff 8b 5c 24 18 83 c4 1c c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 31 ed 57 56 53 83 ec 34 <8b> 54 24 48 8b 4c 24 48 0f b6 42 0e 8b 04 85 00 e7 5a c0 89 44
 <0>Kernel panic - not syncing: Fatal exception in interrupt
It looks like a stack trace of some kind. Note that the RealTek 8139 Ethernet driver is compiled into the kernel, and the network card in the machine uses a RealTek 8139 chipset.
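Each trace line above has the 2.6 form `[<address>] symbol+offset/size`, listed most recent call first. A minimal sketch for pulling those fields apart; the helper names are hypothetical, and dropping `+0x0` entries is a rule of thumb (a return address points just past a call instruction, so it cannot be the first byte of the function making the call), not something the kernel guarantees.

```python
import re

# One 2.6-style trace entry, e.g. "[<c02d3779>] rtl8139_rx+0x199/0x340":
# address, then symbol + offset into the function / size of the function.
TRACE_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_trace(text):
    """Return (address, symbol, offset, size) tuples in the order printed."""
    return [
        (int(addr, 16), sym, int(off, 16), int(size, 16))
        for addr, sym, off, size in TRACE_RE.findall(text)
    ]


def call_chain(entries):
    """Heuristic: skip +0x0 entries, which are usually function pointers
    left on the stack rather than return addresses."""
    return [sym for _, sym, off, _ in entries if off != 0]


if __name__ == "__main__":
    excerpt = """
    [<c037f040>] ip_rcv_finish+0x0/0x2c0
    [<c0360084>] nf_hook_slow+0xe4/0x120
    [<c037ed69>] ip_rcv+0x439/0x500
    [<c0355807>] netif_receive_skb+0x117/0x1d0
    [<c02d3779>] rtl8139_rx+0x199/0x340
    [<c02d3b0a>] rtl8139_poll+0x5a/0xe0
    """
    # Each symbol was called by the one to its right.
    print(" <- ".join(call_chain(parse_trace(excerpt))))
```

Read this way, the visible part of the trace is the 8139 driver's receive path handing a packet up into the IP layer inside a softirq, which is consistent with the "Fatal exception in interrupt" line.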

I did a little research and found a program called parsemce. Running it on the first MCE entry in /var/log/messages gave:

Code:
parsebank(1): 9400000000000151 @ 0
        External tag parity error
        Address in addr register valid
        Error enabled in control register
        Memory heirarchy error
        Request: Generic error
        Transaction type : Instruction
        Memory/IO : Reserved
Unfortunately, I have no clue at all what I'm looking at. The kernel panic message seems to be some kind of stack trace, but I don't have all of it and wouldn't know what to do with it anyway.

Does anyone have any guesses as to what could've gone wrong?
 
#2 tvynr (Original Poster), 06-02-2005, 01:57 AM
An Unpleasant Update

Okay, so I fired up the machine a little while ago to give it another go, and the hard drive was making noises that sounded like heavy construction equipment. I pulled the drive, had a small episode trying to get the ESCD to re-check the hardware from scratch (bloody cryptic BIOS), and put in a 20 GB hard drive that I salvaged from a broken machine long ago.

The new hard drive seems to be working perfectly. However, when I booted from the Slackware 10 CD, I got a segmentation fault from the USB scan. Worrisome, but the boot kept going, so I decided to deal with that later and run badblocks first to make sure the hard drive was okay.

Well, badblocks froze about halfway through writing the first pass. No message, nothing. I rebooted, and the USB check ran fine this time. However, the moment badblocks started, I got the following:

Code:
Checking for bad blocks (read-only test): Unable to handle kernel paging request at virtual address 09a657a5
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c01ce270>]     Not tainted
EFLAGS: 00010283
eax: c12d2968   ebx: cf7ee390   ecx: cf7ee390   edx: 000000d2
esi: 00000400   edi: cf7ee350   ebp: 000000d6   esp: cedb9c48
ds: 0018   es: 0018   ss: 0018
Process badblocks (pid: 185, stackpage=cedb9000)
Stack: c01ce447 cf7ee390 c12d2968 00000002 cedb9c6c cedb9c6c fffffffe c011c5b8
       cf7ee350 c03a2780 c03a2780 c0390d80 c0388a30 c011f8ca c0320afc c011c4f2
       c011c404 00000001 00000001 c011c213 c0388a30 c0388900 00000000 c031f650
Call Trace:    [<c01ce447>] [<c011c5b8>] [<c011f8ca>] [<c011c4f2>] [<c011c404>]
  [<c011c213>] [<c010a09d>] [<c010c488>] [<c02028f9>] [<c021406a>] [<c020b9c5>]
  [<c020bb08>] [<c020bc3f>] [<c01e5074>] [<c011c5b8>] [<c014d59a>] [<c013b223>]
  [<c013aeb6>] [<c0127cda>] [<c0125a30>] [<c0125c2c>] [<c013d3c3>] [<c013d360>]
  [<c012a08b>] [<c012a2c7>] [<c0144b8d>] [<c01376c8>] [<c0108d73>]

Code: 55 57 56 53 8b 74 24 14 8b 6c 24 1c 8b 7e 10 4d 4f 83 fd ff
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
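This 2.4-style oops prints the call trace as bare addresses, so the names have to be looked up in the `System.map` of the exact kernel that crashed (here, the one booted from the Slackware 10 CD); `ksymoops` is the usual tool for that on 2.4 kernels. The sketch below is a minimal stand-in under that assumption, with hypothetical function names and command-line usage. It also decodes the `Oops: 0002` value using the i386 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode).

```python
import bisect
import sys


def decode_pf_error(code):
    """Describe the low three bits of an i386 page-fault error code."""
    return [
        "protection violation" if code & 1 else "page not present",
        "write" if code & 2 else "read",
        "user mode" if code & 4 else "kernel mode",
    ]


def load_system_map(lines):
    """Parse System.map lines ("c01ce270 T some_symbol") into sorted lists."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            pairs.append((int(parts[0], 16), parts[2]))
    pairs.sort()
    return [a for a, _ in pairs], [n for _, n in pairs]


def resolve(addr, addrs, names):
    """Return "symbol+0xoffset" for the closest symbol at or below addr."""
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return hex(addr)
    return f"{names[i]}+{hex(addr - addrs[i])}"


if __name__ == "__main__":
    print("Oops 0002:", ", ".join(decode_pf_error(0x0002)))
    # Hypothetical usage: python resolve_oops.py /path/to/System.map c01ce447 c011c5b8
    if len(sys.argv) > 2:
        with open(sys.argv[1]) as f:
            addrs, names = load_system_map(f)
        for a in sys.argv[2:]:
            print(a, resolve(int(a, 16), addrs, names))
```

With those bits, 0002 reads as a kernel-mode write to a page that was not present, which fits the `*pde = 00000000` line. The lookups are only meaningful if the map file matches the crashed kernel exactly.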
I ran memtest86+ v1.55 on this box only an hour ago and it said the RAM was fine. Is the CPU on this machine toast? Does anyone have any idea what the heck is going on here? I'd love to keep using this machine if possible... a computer is a terrible thing to waste.

You know things are bad when the kernel gives you a silly-looking message instead of a vaguely professional-looking one. I'm recalling the "food fight!" message at this point...

Cheers and Thanks
All times are GMT -5.