I have an issue similar to these three threads, but unfortunately not quite the same:
http://www.linuxquestions.org/questi...erence-169057/
http://www.linuxquestions.org/questi...erence-197581/
http://www.linuxquestions.org/questi...mysqld-260410/
The story goes like this:
The server runs SLES10 SP3. It is an Oracle server.
Code:
# uname -a
Linux <hostname> 2.6.16.60-0.54.5-bigsmp #1 SMP Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux
Code:
# cat /etc/SuSE-release
SUSE Linux Enterprise Server 10 (i586)
VERSION = 10
PATCHLEVEL = 3
Until recently it was halting every night with little or no explanation.
Last night it halted again; however, this time it managed to write the following to /var/log/messages:
Code:
Unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
c0163f41
*pde = 12aee001
Oops: 0002 [#1]
SMP
Dlast sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: raw nfsd exportfs lockd nfs_acl sunrpc dock button battery ac loop usbhid dm_mod 8139too mii uhci_hcd ehci_hcd ide_cd usbcore cdrom i2c_piix4 i2c_core parport_pc lp parport reiserfs edd fan thermal processor xen_platform_pci piix ide_disk ide_core
CPU: 0
EIP: 0060:[<c0163f41>] Tainted: G U VLI
EFLAGS: 00010046 (2.6.16.60-0.54.5-bigsmp #1)
EIP is at cache_alloc_refill+0x152/0x4fd
eax: f7c9d700 ebx: e8bb7000 ecx: 0000000a edx: 00000000
esi: dfcc9a40 edi: 00000246 ebp: f7c9d700 esp: da0cbcc4
ds: 007b es: 007b ss: 0068
Process tar (pid: 9369, threadinfo=da0ca000 task=dfef80b0)
Stack: <0>000000d0 dfcc9a40 00000011 f7c764c0 00000001 0c30917f 00000000 0c30917f
f4f0d7a0 c017d99a 000000d0 dfcc9a40 00000246 f6e87600 c0163de5 c56b9984
f6e87600 00000000 f88f1515 c017db1a c56b9984 da0cbd84 00000000 c017e937
Call Trace:
[<c017d99a>] find_inode+0x1b/0x56
[<c0163de5>] kmem_cache_alloc+0x45/0x4f
[<f88f1515>] reiserfs_alloc_inode+0xf/0x1e [reiserfs]
[<c017db1a>] alloc_inode+0x12/0x192
[<c017e937>] iget5_locked+0x76/0x177
[<f88ea97e>] reiserfs_find_actor+0x0/0x1b [reiserfs]
[<f88e848a>] reiserfs_iget+0x26/0x78 [reiserfs]
[<f88ea970>] reiserfs_init_locked_inode+0x0/0xe [reiserfs]
[<f88e568d>] reiserfs_lookup+0xd0/0x12f [reiserfs]
[<c0173194>] do_lookup+0xaf/0x151
[<c01754f3>] __link_path_walk+0x88e/0xd6c
[<c0149599>] do_generic_mapping_read+0x443/0x48a
[<c0175a1e>] link_path_walk+0x4d/0xc3
[<c01cdedc>] _atomic_dec_and_lock+0x24/0x40
[<c0175dd9>] do_path_lookup+0x1fc/0x26f
[<c0176587>] __user_walk_fd+0x2a/0x3b
[<c016f898>] vfs_lstat_fd+0x12/0x39
[<c01cdedc>] _atomic_dec_and_lock+0x24/0x40
[<c016f904>] sys_lstat64+0xf/0x23
[<c01681a6>] __fput+0x142/0x170
[<c0165b27>] filp_close+0x4e/0x54
[<c0166c91>] sys_close+0x63/0x95
[<c0103dcb>] sysenter_past_esp+0x54/0x79
Code: 14 8b 44 24 0c 89 54 88 14 41 89 08 8b 54 24 04 8b 82 18 02 00 00 39 43 10 73 0b ff 4c 24 08 83 7c 24 08 ff 75 b3 8b 13 8b 43 04 <89> 42 04 89 10 83 7b 14 ff c7 03 00 01 10 00 c7 43 04 00 02 20
I have read in one of the posts listed above that it could be hardware related (RAM). However, I can confidently rule this out, because we have moved this server to a completely different set of hardware (same specs) and are seeing the same issue.
I have also read that:
Quote:
NULL is address 0, which is never a valid value for a pointer. Basically, the kernel has tried to access whatever is at address 0, which is an invalid operation, and so it's killed itself (to prevent it from doing any more serious harm).
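To tie the quote to this particular Oops: the fault address is 00000004 rather than 0, and edx is 00000000 in the register dump. A minimal sketch, assuming GNU binutils' objdump is installed (it is not mentioned in the original post) and using /tmp/oops-code.bin as an arbitrary scratch file, disassembles the bytes around the faulting `<89>` marker from the Code: line:

```sh
#!/bin/sh
# Bytes copied from the Oops "Code:" line, written as octal escapes so any
# POSIX printf accepts them:
#   8b 13 8b 43 04   the two instructions just before the "<89>" marker
#   89 42 04         the faulting instruction (marked <89> in the Oops)
#   89 10            the instruction after it
printf '\213\023\213\103\004\211\102\004\211\020' > /tmp/oops-code.bin

# Disassemble the raw bytes as 32-bit x86 (objdump defaults to AT&T syntax).
objdump -D -b binary -m i386 /tmp/oops-code.bin
```

The four instructions should decode as `mov (%ebx),%edx`, `mov 0x4(%ebx),%eax`, `mov %eax,0x4(%edx)` and `mov %edx,(%eax)`. With edx = 00000000, the write to 0x4(%edx) lands on virtual address 00000004, the address in the first line of the Oops: the kernel loaded a NULL pointer from the structure at ebx (e8bb7000) and wrote to a field 4 bytes into where it expected the next structure to be. The bytes after that (`c7 03 00 01 10 00`, a store of 0x00100100) match the 2.6 LIST_POISON1 value, so this looks like an inlined list_del() on an entry whose next pointer had been zeroed; treat that as an interpretation, not a diagnosis.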
And that:
Quote:
some BIOSes would report a memory SIMM (DIMM?) as having around twice its actual size; attempting to access the area above the first half would simply return 0 (hence lots of NULL pointer errors). If this is the case then you can try the mem=bytes kernel command-line option to tell your kernel how much RAM you actually have.
However, I do not know how I could confirm or rule this out.
/proc/meminfo has the right amount of RAM listed.
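One way to test the quoted BIOS theory is to compare the usable RAM in the BIOS-e820 memory map printed at boot with MemTotal and with the DIMMs physically installed. The sketch below assumes the 2.6-era dmesg format ` BIOS-e820: <start> - <end> (usable)` with hex addresses and an exclusive end address; the function name `e820_usable_kib` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: sum the "(usable)" ranges of the BIOS-e820 map read
# from stdin and print the total in KiB.
# Assumes the 2.6-era format " BIOS-e820: <start> - <end> (usable)",
# where <start> and <end> are hex and <end> is exclusive.
e820_usable_kib() {
    grep -o 'BIOS-e820: [0-9a-f]* - [0-9a-f]* (usable)' |
    {
        total=0
        # Fields: "BIOS-e820:" <start> "-" <end> "(usable)"
        while read -r label start dash end type; do
            total=$(( total + 0x$end - 0x$start ))
        done
        echo $(( total / 1024 ))
    }
}

# Example usage (run on the affected server):
#   dmesg | e820_usable_kib
#   grep MemTotal /proc/meminfo
```

If the BIOS were over-reporting as the quote describes, the kernel would size memory from this same map, so both this total and MemTotal would come out near twice the installed RAM; MemTotal is normally somewhat below the e820 total because the kernel reserves memory for itself. If the numbers do look doubled, the quoted workaround is to append `mem=<size>` (for example `mem=2048M`) to the kernel line in the boot loader configuration, which on SLES 10 is usually /boot/grub/menu.lst.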
When this machine "halts", I have to hard reboot it to bring it back. Surprisingly, I would prefer not to have to do this most mornings.
Can anyone suggest a resolution?
TIA