Kernel Panic - not syncing: Aiee, killing interrupt handler
Hi folks,
For a few months now I have been facing kernel panics every few days on an Ubuntu Server 8.04 installation running on a remote server. After installing and trying several other kernel versions without success, I moved to Debian itself. Two days ago I reinstalled the whole system using the latest Debian Stable, but restored the /var and /home directories and based some of the /etc configuration on what I previously had.

Everything ran smoothly until yesterday, when I got a kernel Oops message while logged in to the terminal. I ignored it since the system still ran fine. This morning, however, the system seems to have frozen (I cannot log in, and neither the website nor the mail server responds), which is exactly what happened when it hit a kernel panic before, so I'm guessing it has run into one again. I cannot reboot the system until Monday. (On the Ubuntu installation I set the kernel.panic variable in /etc/sysctl.conf, but it had no effect. Why is that?)

I have not been able to isolate the problem yet, but I managed to take two pictures of the screen (I know it's not much) from when it ran Ubuntu: First Panic, Second Panic. I'm guessing the same problem is causing this trouble over and over again.

I have no clue where to look or where to start to solve this very annoying problem (it's a web server, so people depend on it). Thanks in advance :) |
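On the kernel.panic question: one thing that can be checked on a running system is whether the value from /etc/sysctl.conf actually reached the kernel, since that file is only applied at boot or when `sysctl -p` is run. The sketch below is a rough illustration, not a diagnosis of this server. It reads the two standard files /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops; the helper names `read_panic_settings` and `describe` are made up for this example.

```python
from pathlib import Path


def read_panic_settings(sysctl_dir="/proc/sys/kernel"):
    """Return the live values of kernel.panic and kernel.panic_on_oops.

    A value of None means the file was missing or unreadable.
    The directory is a parameter so the function can be tried on test data.
    """
    settings = {}
    for name in ("panic", "panic_on_oops"):
        try:
            settings[name] = int((Path(sysctl_dir) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def describe(settings):
    """Turn the raw values into short human-readable notes."""
    notes = []
    panic = settings.get("panic")
    if panic is None:
        notes.append("kernel.panic could not be read")
    elif panic > 0:
        notes.append(f"kernel.panic={panic}: reboot {panic} s after a panic")
    elif panic == 0:
        notes.append("kernel.panic=0: no automatic reboot after a panic")
    else:
        notes.append(f"kernel.panic={panic}")

    on_oops = settings.get("panic_on_oops")
    if on_oops is None:
        notes.append("kernel.panic_on_oops could not be read")
    elif on_oops == 0:
        notes.append("kernel.panic_on_oops=0: an oops is not turned into a panic")
    else:
        notes.append("kernel.panic_on_oops is set: an oops is turned into a panic")
    return notes


if __name__ == "__main__":
    for note in describe(read_panic_settings()):
        print(note)
```

If kernel.panic shows 0 on the live system, the sysctl.conf entry was never applied. If it is non-zero but kernel.panic_on_oops is 0, a machine that hangs after an oops without ever reaching a real panic would not be rebooted by that timer. Whether either case applies here is an assumption; the thread does not say.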
From the error messages, it definitely appears to be a hardware-related problem. Try to isolate the component that is giving you trouble. I'd start with the RAM.
You normally won't get kernel panics like this from ordinary software-related issues. |
Just because I'm curious: how can you see that?
About a month ago I ran memtest86 for about half an hour and it didn't report any errors... What other tools could you suggest? |
Quote:
In my experience, whenever I've had kernel panics, the cause has been one of:
a. A badly corrupted installation of Linux (where the file system or essential system files are trashed)
b. A hardware-related error |
I managed to get the new logs for the latest kernel panic. They are as follows:
Aug 15 16:39:51 vgkfgen1 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000050 RIP:
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff802aa2b7>] __dec_zone_page_state+0x1b/0x6c
Aug 15 16:39:51 vgkfgen1 kernel: PGD 1c254067 PUD 29511067 PMD 0
Aug 15 16:39:51 vgkfgen1 kernel: Oops: 0000 [1] SMP
Aug 15 16:39:51 vgkfgen1 kernel: CPU 0
Aug 15 16:39:51 vgkfgen1 kernel: Modules linked in: it87 hwmon_vid i2c_isa eeprom i2c_dev tcp_diag inet_diag nfs nfsd exportfs lockd nfs_acl sunrpc ipv6 button ac battery dm_snapshot dm_mirror dm_mod loop serio_raw i2c_piix4 snd_hda_intel snd_hda_codec parport_pc parport i2c_core pcspkr snd_pcm snd_timer snd soundcore psmouse shpchp pci_hotplug snd_page_alloc evdev ext3 jbd mbcache ide_cd cdrom sd_mod atiixp ehci_hcd generic ide_core r8169 ahci libata scsi_mod ohci_hcd thermal processor fan
Aug 15 16:39:51 vgkfgen1 kernel: Pid: 177, comm: pdflush Not tainted 2.6.18-6-amd64 #1
Aug 15 16:39:51 vgkfgen1 kernel: RIP: 0010:[<ffffffff802aa2b7>] [<ffffffff802aa2b7>] __dec_zone_page_state+0x1b/0x6c
Aug 15 16:39:51 vgkfgen1 kernel: RSP: 0018:ffff810037b0bc38 EFLAGS: 00010016
Aug 15 16:39:51 vgkfgen1 kernel: RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000001
Aug 15 16:39:51 vgkfgen1 kernel: RDX: ffff8100370bc848 RSI: 0000000000000005 RDI: ffff8100006d84c8
Aug 15 16:39:51 vgkfgen1 kernel: RBP: ffff810037b0be70 R08: ffff810036520ae0 R09: 0000000000000000
Aug 15 16:39:51 vgkfgen1 kernel: R10: ffff8100223139d0 R11: 0000000000000001 R12: ffff8100370bc848
Aug 15 16:39:51 vgkfgen1 kernel: R13: 0000000000000002 R14: ffff810036bb47f0 R15: 0000000000000000
Aug 15 16:39:51 vgkfgen1 kernel: FS: 00002b361c863f60(0000) GS:ffffffff80520000(0000) knlGS:0000000000000000
Aug 15 16:39:51 vgkfgen1 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 15 16:39:51 vgkfgen1 kernel: CR2: 0000000000000050 CR3: 000000000a1da000 CR4: 00000000000006e0
Aug 15 16:39:51 vgkfgen1 kernel: Process pdflush (pid: 177, threadinfo ffff810037b0a000, task ffff810037ae6770)
Aug 15 16:39:51 vgkfgen1 kernel: Stack: ffffffff802aa4c4 ffff8100006d84c8 ffffffff8022982d ffff810036bb47f0
Aug 15 16:39:51 vgkfgen1 kernel: ffffffff8021ac6d 0000000000000000 0000000e00000000 0000000000000000
Aug 15 16:39:51 vgkfgen1 kernel: ffffffff880ed283 ffffffffffffffff 0000000000000002 000000000000000e
Aug 15 16:39:51 vgkfgen1 kernel: Call Trace:
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff802aa4c4>] dec_zone_page_state+0x9/0xd
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff8022982d>] clear_page_dirty_for_io+0x45/0x57
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff8021ac6d>] mpage_writepages+0x183/0x34d
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff880ed283>] :ext3:ext3_ordered_writepage+0x0/0x198
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff80256452>] do_writepages+0x29/0x2f
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff8022dc59>] __writeback_single_inode+0x1b4/0x38b
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff8021ede0>] sync_sb_inodes+0x1d1/0x2b5
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff8028f8fc>] keventd_create_kthread+0x0/0x61
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff8024c66e>] writeback_inodes+0x7d/0xd3
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff802a803c>] background_writeout+0x82/0xb5
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff802520cf>] pdflush+0x0/0x1ed
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff80252212>] pdflush+0x143/0x1ed
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff802a7fba>] background_writeout+0x0/0xb5
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff802305dc>] kthread+0xd4/0x107
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff80258aa0>] child_rip+0xa/0x12
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff8028f8fc>] keventd_create_kthread+0x0/0x61
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff8026ea6b>] physflat_send_IPI_mask+0x0/0x6a
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff80230508>] kthread+0x0/0x107
Aug 15 16:39:51 vgkfgen1 kernel: [<ffffffff80258a96>] child_rip+0x0/0x12
Aug 15 16:39:51 vgkfgen1 kernel:
Aug 15 16:39:51 vgkfgen1 kernel:
Aug 15 16:39:51 vgkfgen1 kernel: Code: 49 8b 54 c1 50 4c 8d 44 32 41 41 8a 00 ff c8 41 88 00 8a 52
Aug 15 16:39:51 vgkfgen1 kernel: RIP [<ffffffff802aa2b7>] __dec_zone_page_state+0x1b/0x6c
Aug 15 16:39:51 vgkfgen1 kernel: RSP <ffff810037b0bc38>
Aug 15 16:39:51 vgkfgen1 kernel: CR2: 0000000000000050

Can you get any more information out of that? Thanks :) |
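For anyone comparing several oopses from the same machine, it can help to reduce each log to its list of symbols. Below is a minimal sketch that pulls the `[<address>] symbol+offset/size` entries out of an oops like the one posted above. The regular expression is an assumption based only on the format visible in this log (2.6.18, x86-64); other kernel versions print frames differently. `FRAME_RE`, `parse_frames` and `SAMPLE` are names invented for this example, and the sample lines are copied from the log in this thread.

```python
import re

# Matches entries such as:
#   [<ffffffff802aa4c4>] dec_zone_page_state+0x9/0xd
#   [<ffffffff880ed283>] :ext3:ext3_ordered_writepage+0x0/0x198
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"      # address in brackets
    r"(?::(?P<module>\w+):)?"            # optional :module: prefix
    r"(?P<symbol>[\w.]+)"                # function name
    r"\+(?P<offset>0x[0-9a-f]+)"         # offset into the function
    r"/(?P<size>0x[0-9a-f]+)"            # total size of the function
)


def parse_frames(log_text):
    """Return one dict per symbol entry found in the log text, in log order."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        frames.append({
            "addr": m["addr"],
            "module": m["module"],           # None for built-in kernel code
            "symbol": m["symbol"],
            "offset": int(m["offset"], 16),
            "size": int(m["size"], 16),
        })
    return frames


# Three lines taken from the log posted in this thread.
SAMPLE = (
    "kernel: RIP: 0010:[<ffffffff802aa2b7>] [<ffffffff802aa2b7>] "
    "__dec_zone_page_state+0x1b/0x6c\n"
    "kernel: [<ffffffff802aa4c4>] dec_zone_page_state+0x9/0xd\n"
    "kernel: [<ffffffff880ed283>] :ext3:ext3_ordered_writepage+0x0/0x198\n"
)

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        where = frame["module"] or "kernel"
        print(f'{where:8} {frame["symbol"]}+{frame["offset"]:#x}')
```

Run over the full log, this gives the RIP entry (`__dec_zone_page_state`, where the NULL pointer dereference was reported) followed by the pdflush writeback path through ext3. If later oopses show unrelated symbols each time, that would be consistent with the hardware suspicion raised in this thread; the same symbols every time would point more toward one code path. Either reading is only a heuristic.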
Everything in that log before the code is pretty much meaningless; everything that occurs before the kernel panic is pretty much things that happened successfully (ignoring the NULL pointer dereference, which I'm fairly sure is not related to the problem at hand). I would check both your hardware and BIOS. You could also see if you can get into single-user mode; if you can, you might be able to get at any logs the kernel may be producing just before it panics.
|