My firewall runs a Linux 2.6.9 kernel and has been working fine for months. This morning I found the machine showing several unusual problems, none of which I've seen on any machine before.
First, the machine couldn't detect a dial tone through the modem, even though the modem had initialized cleanly and there was definitely a dial tone on the line. After a reboot, the modem worked fine.
Next, I found the following message in /var/log/messages several times:
Code:
Jun 1 15:29:39 dib kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Jun 1 15:29:39 dib kernel: Bank 1: 9400000000000151
Jun 1 15:32:09 dib kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Jun 1 15:32:09 dib kernel: Bank 1: d400000000000151
Finally, the machine dies every five to ten minutes or so with a cryptic kernel panic. Unfortunately, I can only see the last 25 lines:
Code:
[<c037f040>] ip_rcv_finish+0x0/0x2c0
[<c0360084>] nf_hook_slow+0xe4/0x120
[<c037f040>] ip_rcv_finish+0x0/0x2c0
[<c037ed69>] ip_rcv+0x439/0x500
[<c037f040>] ip_rcv_finish+0x0/0x2c0
[<c0355807>] netif_receive_skb+0x117/0x1d0
[<c034d4a7>] alloc_skb+0x47/0xe0
[<c02d3779>] rtl8139_rx+0x199/0x340
[<c02d3b0a>] rtl8139_poll+0x5a/0xe0
[<c0355a53>] net_rx_action+0x83/0x110
[<c0123d3a>] __do_softirq+0xba/0xd0
[<c010892c>] do_softirq+0x4c/0x60
=======================
[<c0108045>] do_IRQ+0x165/0x1b0
[<c0105be8>] common_interrupt+0x18/0x20
[<c0103030>] default_idle+0x0/0x40
[<c010305c>] default_idle+0x2c/0x40
[<c01030f2>] cpu_idle+0x42/0x60
[<c051d937>] start_kernel+0x167/0x190
[<c051d3a0>] unknown_bootoption+0x0/0x160
Code: 8b 44 24 24 89 44 24 04 e8 85 7d ff ff 8b 5c 24 18 83 c4 1c c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 31 ed 57 56 53 83 ec 34 <8b> 54 24 48 8b 4c 24 48 0f b6 42 0e 8b 04 85 00 e7 5a c0 89 44
<0>Kernel panic - not syncing: Fatal exception in interrupt
It looks to me like a stack trace. Note that I have the RealTek 8139 ethernet driver compiled into the kernel, and the network card in the machine uses a RealTek 8139 chipset.
I did a little research and found a program called parsemce. I ran it on the first MCE entry from /var/log/messages and got:
Code:
parsebank(1): 9400000000000151 @ 0
External tag parity error
Address in addr register valid
Error enabled in control register
Memory heirarchy error
Request: Generic error
Transaction type : Instruction
Memory/IO : Reserved
Unfortunately, I have no idea what this output means. The kernel panic looks like a stack trace, but I don't have all of it and wouldn't know what to do with it anyway.
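For reference, the top bits of the "Bank 1" value can be picked apart by hand. Below is a minimal Python sketch, assuming the value is an x86 machine-check status register (MCi_STATUS) with the standard architectural flag layout. The function name decode_status and the flag table are my own illustration, not part of parsemce or the kernel, and the sketch does not try to interpret the model-specific bits or the error code itself.

```python
# Assumption: the logged value is an x86 MCi_STATUS register and the
# high flag bits follow the architectural layout listed here.
FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # reporting enabled in the control register
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_status(hex_value):
    """Return the set flag names and the low 16-bit error code (hypothetical helper)."""
    status = int(hex_value, 16)
    set_flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {"flags": set_flags, "mca_error_code": status & 0xFFFF}


if __name__ == "__main__":
    # The two bank values from the log excerpt above.
    for value in ("9400000000000151", "d400000000000151"):
        decoded = decode_status(value)
        print(value, decoded["flags"], hex(decoded["mca_error_code"]))
```

If the assumed layout applies, the two logged values differ only in the OVER bit, which is set in the second one. UC is clear in both, which fits the kernel describing them as correctable, and ADDRV and EN line up with the "Address in addr register valid" and "Error enabled in control register" lines from parsemce.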
Does anyone have any guesses as to what could've gone wrong?