LinuxQuestions.org - Oops: Unable to handle kernel NULL pointer dereference at virtual address with mysqld

My Fedora Core 2 system crashed, and below is the last entry in the syslog.
The server is a dedicated MySQL database server that isn't under much load.
I'm using kernel 2.6.8-1.521smp. What does this error mean, and how do I "fix" it?
I've also had kernel panics before with /etc/cron.daily/prelink, so this machine "feels" unstable.

Quote:

Nov 26 19:28:57 localhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
Nov 26 19:28:57 localhost kernel: printing eip:
Nov 26 19:28:57 localhost kernel: 0213fbd4
Nov 26 19:28:57 localhost kernel: *pde = 00003001
Nov 26 19:28:57 localhost kernel: Oops: 0002 [#1]
Nov 26 19:28:57 localhost kernel: SMP
Nov 26 19:28:57 localhost kernel: Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc r8169 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd ata_piix libata sd_mod scsi_mod
Nov 26 19:28:57 localhost kernel: CPU: 0
Nov 26 19:28:57 localhost kernel: EIP: 0060:[<0213fbd4>] Not tainted
Nov 26 19:28:57 localhost kernel: EFLAGS: 00010002 (2.6.8-1.521smp)
Nov 26 19:28:57 localhost kernel: EIP is at __rmqueue+0x44/0x110
Nov 26 19:28:57 localhost kernel: eax: 00000000 ebx: 00000000 ecx: 2000107c edx: 0232cd88
Nov 26 19:28:57 localhost kernel: esi: 0232cb80 edi: 0232cd88 ebp: 0232cf10 esp: 410f0db4
Nov 26 19:28:57 localhost kernel: ds: 007b es: 007b ss: 0068
Nov 26 19:28:57 localhost kernel: Process mysqld (pid: 23015, threadinfo=410f0000 task=5c680cb0)
Nov 26 19:28:57 localhost kernel: Stack: 0003e7d7 20001064 00000000 0232cb80 0232cb80 00000006 0000000c 0232cf10
Nov 26 19:28:57 localhost kernel: 0213fd00 0000000c 00000010 00000000 5ee18080 00000000 0232cb80 00000246
Nov 26 19:28:57 localhost kernel: 0232cb80 0232cf00 0214001c 0232cf10 00000000 000000d2 00000000 00000000
Nov 26 19:28:57 localhost kernel: Call Trace:
Nov 26 19:28:57 localhost kernel: [<0213fd00>] rmqueue_bulk+0x60/0xb2
Nov 26 19:28:57 localhost kernel: [<0214001c>] buffered_rmqueue+0x64/0x1e9
Nov 26 19:28:57 localhost kernel: [<0214024b>] __alloc_pages+0xaa/0x2be
Nov 26 19:28:57 localhost kernel: [<0214bd83>] do_anonymous_page+0xb6/0x241
Nov 26 19:28:57 localhost kernel: [<0214bf77>] do_no_page+0x69/0x3a0
Nov 26 19:28:57 localhost kernel: [<0214c460>] handle_mm_fault+0xdf/0x1d4
Nov 26 19:28:57 localhost kernel: [<0214da65>] vma_merge+0x155/0x165
Nov 26 19:28:57 localhost kernel: [<0211955b>] do_page_fault+0x17c/0x58b
Nov 26 19:28:57 localhost kernel: [<0210c044>] old_mmap+0xde/0x119
Nov 26 19:28:57 localhost kernel: [<021193df>] do_page_fault+0x0/0x58b
Nov 26 19:28:57 localhost kernel: Code: 89 50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 8b 54

NULL is address 0, which is never a valid value for a pointer. Basically, the kernel has tried to access whatever is at address 0, which is an invalid operation, and so it's killed itself (to prevent it from doing any more serious harm).
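Strictly speaking, the faulting address in this oops is 00000004 rather than 0. That is typically what you see when code follows a NULL pointer to a field stored a few bytes into a structure. The sketch below assumes a simplified two-pointer doubly linked list node, modelled loosely on the kernel's `struct list_head`; the class `ListHead` and the helper `null_field_address` are hypothetical illustrations, not kernel code. It computes the address that an access through a NULL node pointer would touch.

```python
import ctypes


class ListHead(ctypes.Structure):
    """Hypothetical, simplified stand-in for a kernel list node: two pointers."""
    _fields_ = [("next", ctypes.c_void_p),
                ("prev", ctypes.c_void_p)]


def null_field_address(struct_type, field_name):
    """Address touched when accessing struct_type.field_name through a NULL
    pointer: NULL (0) plus the field's byte offset within the structure."""
    return 0 + getattr(struct_type, field_name).offset


# On a 32-bit build pointers are 4 bytes, so 'prev' sits at offset 4 and a
# write through a NULL node pointer lands on address 0x00000004.
# On a 64-bit build the same field sits at offset 8.
print(f"{null_field_address(ListHead, 'prev'):#010x}")
```

The point is that a "NULL pointer dereference" in an oops rarely reports address 0 exactly; a small address such as 4 usually means NULL plus a field offset.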

One possible explanation for this is a (physical) memory error, typically caused by a damaged RAM chip. Another possibility is a motherboard bug from a few years ago, where some BIOSes would report a memory SIMM (DIMM?) as having around twice its actual size; attempting to access the area above the first half would simply return 0 (hence lots of NULL pointer errors). If this is the case, then you can try the mem=nn[KMG] kernel command-line option to tell your kernel how much RAM you actually have.
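The mem= value is a size with an optional K, M or G suffix, each step being a factor of 1024 (so mem=512M means 512 MiB). As a sketch of how such a value maps to bytes, the helpers below (`parse_mem_option` and `mem_limit_from_cmdline` are hypothetical names, not a kernel API) convert the argument and pull it out of a command line such as the one in /proc/cmdline. They assume a plain decimal number and do not cover every form the kernel itself accepts.

```python
import re

# Multipliers for the optional size suffix (K, M, G; case-insensitive here).
_SUFFIX = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_mem_option(value):
    """Convert a mem= value such as '512M' to bytes.

    This sketch only accepts a decimal number plus an optional K/M/G suffix.
    """
    m = re.fullmatch(r"(\d+)([KMGkmg]?)", value.strip())
    if m is None:
        raise ValueError(f"unrecognised mem= value: {value!r}")
    number, suffix = m.groups()
    return int(number) * _SUFFIX[suffix.upper()]


def mem_limit_from_cmdline(cmdline):
    """Return the mem= limit in bytes from a kernel command line, or None if absent."""
    limit = None
    for arg in cmdline.split():
        if arg.startswith("mem="):
            # If mem= appears more than once, this sketch keeps the last one.
            limit = parse_mem_option(arg[len("mem="):])
    return limit


# Hypothetical command line, in the style of /proc/cmdline.
print(mem_limit_from_cmdline("ro root=LABEL=/ mem=512M"))  # 536870912
```

On the running system, feeding the contents of /proc/cmdline to `mem_limit_from_cmdline()` shows whether a mem= limit is actually in effect after a reboot.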

I can guarantee you that if a bug like this existed in the kernel's page allocation code (which is where your kernel was when it crashed: the call trace runs from do_page_fault through __alloc_pages into __rmqueue), it would have been fixed by now. I assume you've already tried upgrading to the latest kernel release for Fedora.
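When reading an oops like the one quoted above, the useful facts are the faulting address, the function at EIP, and the chain of callers. A small parser can pull these out; this sketch assumes the exact line format seen in this particular i386 log (other kernel versions and architectures print slightly different text), and `summarize_oops` is a hypothetical helper. For this log it should report address 0x4, EIP in `__rmqueue`, and a trace passing through `__alloc_pages` and `do_page_fault`.

```python
import re


def summarize_oops(log_text):
    """Extract the faulting address, the function at EIP and the call-trace
    function names from oops lines in the format of the log quoted above."""
    summary = {"fault_address": None, "eip": None, "call_trace": []}
    for line in log_text.splitlines():
        # "... NULL pointer dereference at virtual address 00000004"
        m = re.search(r"at virtual address ([0-9a-f]+)", line)
        if m:
            summary["fault_address"] = int(m.group(1), 16)
        # "EIP is at __rmqueue+0x44/0x110" -> (function, offset, function size)
        m = re.search(r"EIP is at (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", line)
        if m:
            summary["eip"] = (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
        # "[<0213fd00>] rmqueue_bulk+0x60/0xb2" -> "rmqueue_bulk"
        m = re.search(r"\[<[0-9a-f]+>\] (\w+)\+0x", line)
        if m:
            summary["call_trace"].append(m.group(1))
    return summary
```

Usage: read the relevant chunk of /var/log/messages into a string and pass it to `summarize_oops()`; the call trace is listed in the order the kernel printed it, starting nearest the crash.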

One test you can do to see if your memory is working is to download the source code for gcc and compile it (don't bother installing it; just compile it). If it crashes, but then crashes in a different place the next time around (after re-issuing the make command), then you definitely have broken RAM.
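The gcc compile test can be scripted: re-run the build after each failure and record where each failure happened. This is a sketch under stated assumptions: the helper names are hypothetical, "where it failed" is approximated by the last line of combined output, and treating a step that failed once but later passed as suspicious is an extension of the rule in the answer above. It is a heuristic, not a diagnosis.

```python
import subprocess


def run_until_success(cmd, max_runs=5, cwd=None):
    """Run cmd repeatedly (like re-issuing `make`) until it succeeds or
    max_runs is reached. Returns (failure_points, succeeded), where each
    failure point is the last output line of a failed run."""
    failure_points = []
    for _ in range(max_runs):
        proc = subprocess.run(cmd, cwd=cwd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True,
                              errors="replace")
        if proc.returncode == 0:
            return failure_points, True
        lines = proc.stdout.strip().splitlines()
        failure_points.append(lines[-1] if lines else f"exit status {proc.returncode}")
    return failure_points, False


def failures_look_random(failure_points, succeeded):
    """Heuristic: failures at different points, or a failure that later
    went away, suggest a non-deterministic fault such as bad RAM rather
    than a genuine, reproducible build error."""
    if len(set(failure_points)) > 1:
        return True  # crashed in different places on different runs
    return bool(failure_points) and succeeded  # a step that failed once later passed
```

For example, `run_until_success(["make"], cwd="/path/to/gcc-build")` run from a configured gcc build tree (the path is a placeholder) returns both values to pass to `failures_look_random()`. A build that fails at the same line every time points to a real build problem instead.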

Thanks for the help. :)
I'll try the stuff you mentioned...

I'll probably try memtest86 too...