Kernel panic on RHEL 4 system
I've been trying to track down this kernel panic on this system for the last few days... I believe it might be autofs/NFS-related, but I'm not 100% sure.
Here is the dump in /var/log/messages:

Jul 28 13:25:13 server-a kernel: bad: scheduling while atomic!
Jul 28 13:25:13 server-a kernel:
Jul 28 13:25:13 server-a kernel: Call Trace:<ffffffff803095d1>{schedule+75} <ffffffffa01b7642>{:sunrpc:rpc_wake_up_next+350}
Jul 28 13:25:13 server-a kernel: <ffffffffa01b4a2a>{:sunrpc:__xprt_lock_write_next+70}
Jul 28 13:25:13 server-a kernel: <ffffffffa01b61ee>{:sunrpc:xprt_transmit+1107} <ffffffffa01b7da2>{:sunrpc:__rpc_execute+462}
Jul 28 13:25:13 server-a kernel: <ffffffff80135752>{autoremove_wake_function+0} <ffffffff80135752>{autoremove_wake_function+0}
Jul 28 13:25:13 server-a kernel: <ffffffffa01b379c>{:sunrpc:rpc_call_sync+114} <ffffffffa02033c2>{:nfs:nfs3_rpc_wrapper+38}
Jul 28 13:25:13 server-a kernel: <ffffffffa020361a>{:nfs:nfs3_proc_getattr+138} <ffffffffa01fb5bb>{:nfs:__nfs_revalidate_inode+320}
Jul 28 13:25:13 server-a kernel: <ffffffffa0200355>{:nfs:nfs_pagein_list+75} <ffffffff8030a0f5>{thread_return+0}
Jul 28 13:25:13 server-a kernel: <ffffffff8030a14d>{thread_return+88} <ffffffff801609f0>{read_pages+57}
Jul 28 13:25:13 server-a kernel: <ffffffffa01f75bf>{:nfs:nfs_lookup_revalidate+459}
Jul 28 13:25:13 server-a kernel: <ffffffff80132155>{recalc_task_prio+337} <ffffffff801321e3>{activate_task+124}
Jul 28 13:25:13 server-a kernel: <ffffffff8013271e>{try_to_wake_up+876} <ffffffffa01b8da3>{:sunrpc:rpcauth_lookup_credcache+566}
Jul 28 13:25:13 server-a kernel: <ffffffff80157db7>{audit_update_watch+85} <ffffffff8019012a>{__d_lookup+287}
Jul 28 13:25:13 server-a kernel: <ffffffff80185d2a>{do_lookup+388} <ffffffff801868a2>{__link_path_walk+2508}
Jul 28 13:25:13 server-a kernel: <ffffffff80186d62>{link_path_walk+82} <ffffffff801ece75>{strncpy_from_user+74}
Jul 28 13:25:13 server-a kernel: <ffffffff8015702f>{audit_getname+133} <ffffffff80186faf>{path_lookup+451}
Jul 28 13:25:13 server-a kernel: <ffffffff8018788f>{open_namei+172} <ffffffff80178b6e>{filp_open+80}
Jul 28 13:25:13 server-a kernel: <ffffffff801ece75>{strncpy_from_user+74} <ffffffff8015702f>{audit_getname+133}
Jul 28 13:25:13 server-a kernel: <ffffffff80178d77>{sys_open+57} <ffffffff801103ce>{tracesys+209}
Jul 28 13:25:13 server-a kernel:
Jul 28 13:25:13 server-a kernel: Unable to handle kernel paging request at 00000003a0b8cd68 RIP:
Jul 28 13:25:14 server-a kernel: <ffffffff80309e01>{schedule+2171}
Jul 28 13:25:14 server-a kernel: PML4 4018ef067 PGD 0
Jul 28 13:25:14 server-a kernel: Oops: 0002 [1] SMP

I've been Googling the hell out of the log and just can't turn up anything of use... It seems to happen only when we use the box to rsync data from one NFS mount point to another, and even then it happens at random. Can anyone point me in a direction that might resolve this issue before I take this server out to the desert? :(
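To see at a glance which kernel modules show up in a call trace like this one, a small script can pull the `<address>{:module:function+offset}` entries out of the syslog lines. This is a minimal sketch, assuming Python 3 and a copy of /var/log/messages taken off the box (RHEL 4 itself ships a much older Python). The function names are hypothetical, and the pattern only understands the entry format seen in this particular log, with decimal offsets.

```python
import re
from collections import Counter

# Matches trace entries such as
#   <ffffffffa01b7642>{:sunrpc:rpc_wake_up_next+350}  (function in a module)
#   <ffffffff803095d1>{schedule+75}                   (function in the core kernel)
# Assumption: offsets are decimal, as in the log posted here.
ENTRY = re.compile(r"<([0-9a-f]+)>\{(?::([\w-]+):)?(\w+)\+(\d+)\}")


def trace_symbols(lines):
    """Return (module, function, offset) tuples in the order they appear.

    Entries without a module prefix are reported with module "kernel".
    """
    symbols = []
    for line in lines:
        for _addr, module, func, offset in ENTRY.findall(line):
            symbols.append((module or "kernel", func, int(offset)))
    return symbols


def module_counts(lines):
    """Count how many trace entries belong to each module."""
    return Counter(module for module, _func, _offset in trace_symbols(lines))


def summarize_file(path):
    """Module counts for every 'kernel:' line in a copy of /var/log/messages."""
    with open(path, errors="replace") as fh:
        return module_counts(line for line in fh if " kernel: " in line)
```

For example, `summarize_file("/tmp/messages")` (a hypothetical path) returns a `Counter` you can print with `most_common()`. In the trace posted above, the only modules named are `sunrpc` and `nfs`; autofs does not appear, although that alone doesn't prove where the bug lives.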
What kernel are you running? Update it to the latest RHEL 4 kernel. Also update any NFS and autofs (automount) packages you have installed. Oh, and rsync too, for the heck of it.
Unfortunately, that path isn't open to me... I either have to fix the existing system or sit and wait for a new system to be built to replace it (currently in the works, but probably a few weeks to a month away).
The problem is, I'll have to wait a while to get that done... and in the meantime I need to use some tools on this server that rely heavily on NFS working.
How is updating the kernel not a fix to the existing system?
Here's what I'd do: update the kernel. Also install, configure, and enable diskdump so you can get more information out of crashes. Don't reboot; wait for it to crash, and it will boot into the new kernel. If it crashes again, you'll have more debugging information.
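After installing an updated kernel without rebooting, it can be worth confirming that the box is still running the old kernel and that a newer one of the same flavour (for example `ELsmp`) is sitting in /boot for the next, crash-triggered boot. This is a minimal sketch, assuming Python 3 and RHEL-style `/boot/vmlinuz-<release>` file names; the helper names are hypothetical, the version ordering is a simplified numeric comparison rather than rpm's own, and it only reports what is installed, not which entry the boot loader's default setting will actually pick.

```python
import os
import re
from pathlib import Path


def release_key(release):
    """Turn e.g. '2.6.9-55.ELsmp' into (2, 6, 9, 55) for ordering.

    Simplified: only the numeric fields are compared. This is not rpm's
    own version comparison.
    """
    return tuple(int(n) for n in re.findall(r"\d+", release))


def flavour(release):
    """'2.6.9-55.ELsmp' -> 'ELsmp', so smp kernels are compared only with smp kernels."""
    return re.sub(r"^[\d.-]+", "", release)


def newer_installed(running, installed):
    """Installed releases of the same flavour newer than `running`, newest first."""
    same = [r for r in installed if flavour(r) == flavour(running)]
    newer = [r for r in same if release_key(r) > release_key(running)]
    return sorted(newer, key=release_key, reverse=True)


def installed_releases(boot_dir="/boot"):
    """Kernel releases taken from /boot/vmlinuz-<release> file names."""
    prefix = "vmlinuz-"
    return [p.name[len(prefix):] for p in Path(boot_dir).glob(prefix + "*")]


def pending_kernel(boot_dir="/boot"):
    """Newest installed kernel newer than the running one, or None."""
    newer = newer_installed(os.uname().release, installed_releases(boot_dir))
    return newer[0] if newer else None
```

If `pending_kernel()` returns a release string, the update is installed but not yet running, which is the state you want while waiting for the next crash. The release strings in the accompanying checks are made-up examples, not real RHEL 4 errata versions.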