LinuxQuestions.org


kingston 12-13-2010 05:21 AM

server rebooting issue
 
Hi All

I am using RHEL 4.7 on HP ProLiant 380 G6 series servers. Last week my server rebooted twice, and it is giving the following log messages:
Quote:

"build.sh" not found in map.
Dec 7 12:28:29 einbalx0003 automount[13488]: failed to mount /efsroots/10/project/build.sh
Dec 7 12:29:30 einbalx0003 automount[23601]: lookup(yp): key "build.sh" not found in map.
Dec 7 12:29:30 einbalx0003 automount[23601]: failed to mount /efsroots/10/project/build.sh
Dec 7 12:30:01 einbalx0003 automount[32554]: lookup(yp): key "opc_op" not found in map.
Dec 7 12:30:01 einbalx0003 automount[32554]: failed to mount /home/opc_op
Dec 7 12:30:43 einbalx0003 kernel: allocation failed: out of vmalloc space - use vmalloc=<size> to increase size.
Dec 7 12:30:43 einbalx0003 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Dec 7 12:30:43 einbalx0003 kernel: printing eip:
Dec 7 12:30:43 einbalx0003 kernel: c02dd7d1
Dec 7 12:30:43 einbalx0003 kernel: *pde = 30237001
Dec 7 12:30:43 einbalx0003 kernel: Oops: 0002 [#1]
Dec 7 12:30:43 einbalx0003 kernel: SMP
Dec 7 12:30:43 einbalx0003 kernel: Modules linked in: nfs lockd nfs_acl mptctl mptbase sg ipmi_devintf ipmi_si ipmi_msghandler efs100(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc cpufreq_powersave loop dm_multipath joydev button battery ac ehci_hcd uhci_hcd bnx2 sr_mod dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2400 ata_piix libata cciss qla2xxx scsi_transport_fc sd_mod scsi_mod
Dec 7 12:30:43 einbalx0003 kernel: CPU: 0
Dec 7 12:30:43 einbalx0003 kernel: EIP: 0060:[<c02dd7d1>] Tainted: P VLI
Dec 7 12:30:43 einbalx0003 kernel: EFLAGS: 00010246 (2.6.9-89.0.23.ELsmp)
Dec 7 12:30:43 einbalx0003 kernel: EIP is at __lock_text_end+0xa22/0x1085
Dec 7 12:30:43 einbalx0003 kernel: eax: 00000000 ebx: 00000000 ecx: 00001000 edx: 00b92008
Dec 7 12:30:43 einbalx0003 kernel: esi: 00b92008 edi: 00000000 ebp: ec25c000 esp: ea16aecc
Dec 7 12:30:43 einbalx0003 kernel: ds: 007b es: 007b ss: 0068
Dec 7 12:30:43 einbalx0003 kernel: Process ecagent (pid: 2694, threadinfo=ea16a000 task=ed73f1f0)
Dec 7 12:30:43 einbalx0003 kernel: Stack: 00000000 00001000 d5bdc3e0 00b92008 f9dfb68f 00000000 c96e2e00 e2d3b00c
Dec 7 12:30:43 einbalx0003 kernel: e2d3a000 c96e2e00 ea16af38 f9dfa913 00000000 00004000 ea16af40 ea16a000
Dec 7 12:30:43 einbalx0003 kernel: 06f42fe0 d2e84a60 f9dfae96 ea16af3c 00000001 00000000 e2d3a000 00000000
Dec 7 12:30:43 einbalx0003 kernel: Call Trace:
Dec 7 12:30:43 einbalx0003 kernel: [<f9dfb68f>] BufferSyncPages+0xb6/0x12f [efs100]
Dec 7 12:30:43 einbalx0003 kernel: [<f9dfa913>] AgentRead+0xc9/0x14a [efs100]
Dec 7 12:30:43 einbalx0003 kernel: [<f9dfae96>] AgentIoctl+0x4b0/0x5c5 [efs100]
Dec 7 12:30:43 einbalx0003 kernel: [<f9dfcfdf>] EfsDevIoctl+0x2b/0x36 [efs100]
Dec 7 12:30:43 einbalx0003 kernel: [<c016dc8a>] sys_ioctl+0x227/0x269
Dec 7 12:30:43 einbalx0003 kernel: [<c012700c>] sys_gettimeofday+0x9a/0xac
Dec 7 12:30:43 einbalx0003 kernel: [<c02ddf97>] syscall_call+0x7/0xb
Dec 7 12:30:43 einbalx0003 kernel: [<c02d007b>] xfrm_policy_bysel+0x18/0x81
Dec 7 12:30:43 einbalx0003 kernel: Code: 88 00 51 50 31 c0 f3 aa 58 59 e9 6c ab ee ff 01 c1 e9 a0 ab ee ff 8d 4c 88 00 e9 97 ab ee ff 01 c1 eb 04 8d 4c 88 00 51 50 31 c0 <f3> aa 58 59 e9 c8 ab ee ff ba f2 ff ff ff e9 74 21 ef ff b9 f2
Dec 7 12:30:43 einbalx0003 kernel: <0>Fatal exception: panic in 5 seconds
Dec 7 12:44:23 einbalx0003 syslogd 1.4.1: restart.
I couldn't find the exact reason for the "vmalloc space" error. Is it a paging issue? I googled it extensively but didn't understand exactly what the problem is. Can someone help me with this?
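
[Editor's sketch] The kernel message above concerns the vmalloc address range, which is small on 32-bit kernels such as this one, rather than paging as such. One way to see how much of that range is in use is to read the Vmalloc* fields from /proc/meminfo. The following is a minimal sketch, not a definitive tool: the function name vmalloc_stats is made up for illustration, and it assumes the kernel exposes VmallocTotal, VmallocUsed and VmallocChunk in the usual "Name: value kB" layout.

```python
import re


def vmalloc_stats(meminfo_text):
    """Return the Vmalloc* fields of /proc/meminfo text as {name: kB}."""
    stats = {}
    for line in meminfo_text.splitlines():
        # Lines look like "VmallocUsed:     110000 kB"
        m = re.match(r"(Vmalloc\w+):\s+(\d+)\s*kB", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


# Typical use on the affected host (or on text copied from it):
#   with open("/proc/meminfo") as f:
#       print(vmalloc_stats(f.read()))
```

If VmallocUsed stays close to VmallocTotal, or VmallocChunk (the largest free contiguous block) is very small, a module is probably holding on to vmalloc space, and the vmalloc=<size> boot parameter named in the log would only postpone the failure.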

ShadowCat8 12-13-2010 08:18 PM

Well,

On a quick scan of your output, my first thought is that the server in question thinks a critical part of its filesystem is an NFS share mounted from another system.
Quote:

Originally Posted by kingston
Dec 7 12:28:29 einbalx0003 automount[13488]: failed to mount /efsroots/10/project/build.sh
Dec 7 12:29:30 einbalx0003 automount[23601]: lookup(yp): key "build.sh" not found in map.
Dec 7 12:29:30 einbalx0003 automount[23601]: failed to mount /efsroots/10/project/build.sh
Dec 7 12:30:01 einbalx0003 automount[32554]: lookup(yp): key "opc_op" not found in map.
Dec 7 12:30:01 einbalx0003 automount[32554]: failed to mount /home/opc_op

"lookup(yp)" is a lookup from ypbind to contact and mount an NFS share. Is the server that houses those filesystems powered up and reachable over the network? Also, is the problem server's /etc/hosts file okay? ypbind relies on /etc/hosts, as noted here.

Also, I note that RHEL 4.7 runs a 2.6.9-based Linux kernel. While I know you don't want to do it while the server is having an issue, once you have the server stabilized I'd recommend updating it at your next available opportunity. The current stable Linux kernel source for Gentoo is 2.6.34-r6, and some kernel-level security issues have been addressed between 2.6.9 and 2.6.34.

HTH. Let us know.

chrism01 12-14-2010 01:09 AM

Actually, ypbind is part of NIS (http://www.linuxhomenetworking.com/w...onfiguring_NIS), not NFS, although the two are often seen together, as per that link.
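
[Editor's sketch] The "lookup(yp): key ... not found in map" lines in the log come from automount asking NIS for a map entry. The same lookup can be repeated by hand with ypmatch. Below is a minimal sketch under stated assumptions: the map name auto.home is hypothetical (the thread never names the maps), the helper name nis_key_exists is made up, and the check relies on ypmatch exiting non-zero when the key is absent.

```python
import subprocess


def nis_key_exists(key, mapname, run=subprocess.run):
    """Return True if `ypmatch <key> <mapname>` exits successfully.

    `run` can be swapped out so the function is testable on a host
    without NIS configured.
    """
    result = run(
        ["ypmatch", key, mapname],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0


# Example (map name is an assumption, adjust to your site):
#   nis_key_exists("opc_op", "auto.home")
```

A missing key here only explains the automount messages; it would not by itself explain the kernel oops that follows them in the log.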

kingston 12-15-2010 03:53 AM

Hi ShadowCat,

Actually, our NIS master gets the filesystem from NetApp storage as an NFS mount. The problematic server is a NIS client. We found yesterday that the application running on the server causes this problem: the root cause is that the EC kernel module does not handle NULL pointer returns on requests for memory, hence the kernel panic and crash. We are using Electric Cloud agents on it. We are still waiting for help from our IT team and from EC.
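
[Editor's sketch] The call trace in the first post is consistent with this diagnosis: its top frames are in the efs100 module. A quick way to see which modules appear in an oops is to tally the bracketed module names on the call-trace lines. This is a minimal sketch, assuming the 2.6-style "[<addr>] symbol+0xoff/0xlen [module]" frame format seen in this log; the name modules_in_trace is made up for illustration, and frames with no module suffix are counted as "kernel".

```python
import re
from collections import Counter

# Matches e.g. "[<f9dfb68f>] BufferSyncPages+0xb6/0x12f [efs100]"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def modules_in_trace(log_text):
    """Count call-trace frames per module in syslog oops output."""
    counts = Counter()
    for line in log_text.splitlines():
        m = FRAME.search(line)
        if m:
            counts[m.group(2) or "kernel"] += 1
    return counts
```

Feeding it the syslog excerpt and looking at the module with frames nearest the top of the trace gives a first suspect; it is a triage aid, not proof of where the bug is.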

kingston 04-29-2011 12:16 AM

Our application team found that this is a common problem in our environment. The team reported that the application couldn't handle/release memory pages after completing certain jobs, and that the problem will be rectified once they upgrade it to the latest version.

Thanks to all for your contributions.


All times are GMT -5.