Troubled RHEL6.1 server

eur0disciple · 10-27-2011, 11:35 AM

I have recently reinstalled a server that I administer. The server is now running RHEL6.1. It is our main file server/NAS, via NFS. It also runs a couple of minor services; DNS, DHCP, and NTP.

This server has rebooted on its own several times since the reinstall, six times in the last week. I have checked the logs several times and have not found anything that would point to a cause of these reboots. The logs are sometimes missing anywhere from five to fifty minutes' worth of entries prior to the boot sequence. I initially suspected cron jobs or something similar; however, there don't seem to be any that initiate a reboot.

I have also checked out the logs within the /var/log/sa directory. The resources on the machine don't appear to be heavily utilized prior to the reboot of the server. My goal is to determine why this machine keeps rebooting. Once I have determined the cause, I'd like to resolve the issue.

Once again, I'm running RHEL6.1. The kernel is currently 2.6.32-131.17.1.el6.x86_64. Any assistance is greatly appreciated.

Thank you

TB0ne · 10-27-2011, 01:24 PM

There's not much we can help with without more information. If your logs don't show anything, have you considered mirroring them to another syslog server? That way, if entries aren't getting written locally, they might still get logged remotely.

And Red Hat support is who you need to call, since you're paying for it, right? RHEL is a paid distro, and if you're not paying, you won't get bugfixes/updates that may have been released after the DVD was. Also, RHEL has diagnostic tools to help you...

eur0disciple · 10-27-2011, 03:13 PM

TB0ne, thanks for the response. The distro is actually Scientific Linux 6.1; I've just gotten into the habit of calling it RHEL.

Here are the logs showing that the server has actually rebooted. These logs are extremely useful to me because sometimes we don't even realize that the server has rebooted.

reboot system boot 2.6.32-131.17.1. Thu Oct 27 04:57 - 14:53 (09:56)
reboot system boot 2.6.32-131.17.1. Wed Oct 26 21:57 - 14:53 (16:56)
reboot system boot 2.6.32-131.17.1. Wed Oct 26 01:54 - 14:53 (1+12:59)
reboot system boot 2.6.32-131.17.1. Fri Oct 21 02:15 - 14:53 (6+12:37)

Below are the logs from /var/log/messages around the time of the reboot. Note: hostnames, IPs, and directories have been changed for anonymity.

Oct 26 21:02:27 localhost rpc.mountd[2591]: authenticated mount request from 192.168.1.47:1001 for /home/directory (/home/directory)
Oct 26 21:02:28 localhost rpc.mountd[2591]: authenticated mount request from 192.168.1.63:614 for /home/directory (/home/directory)
Oct 26 21:02:28 localhost rpc.mountd[2591]: authenticated mount request from 192.168.1.44:991 for /home/directory (/home/directory)
Oct 26 21:04:31 localhost rpc.mountd[2591]: authenticated mount request from 192.168.1.70:927 for /home/directory (/home/directory)
Oct 26 21:57:31 localhost kernel: imklog 4.6.2, log source = /proc/kmsg started.
Oct 26 21:57:31 localhost rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="2162" x-info="http://www.rsyslog.com"] (re)start
Oct 26 21:57:31 localhost kernel: Initializing cgroup subsys cpuset
Oct 26 21:57:31 localhost kernel: Initializing cgroup subsys cpu
Oct 26 21:57:31 localhost kernel: Linux version 2.6.32-131.17.1.el6.x86_64 (mockbuild@sl6.fnal.gov) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Oct 5 17:19:54 CDT 2011
Oct 26 21:57:31 localhost kernel: Command line: ro root=/dev/mapper/vg_localhost-lv_root rd_LVM_LV=vg_localhost/lv_root rd_LVM_LV=vg_localhost/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto rhgb quiet
Oct 26 21:57:31 localhost kernel: KERNEL supported cpus:
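
The silent period before each boot can be measured rather than eyeballed. The following is a minimal sketch, not a definitive tool: it assumes the "kernel: imklog" line seen in the excerpt above is the first entry written after every boot, and that the year (which syslog does not record) is supplied by the caller. The names find_silent_gaps, parse_ts, and BOOT_MARKER are illustrative only.

```python
from datetime import datetime, timedelta

# Assumption: this is the first line rsyslog writes after a boot, as in the excerpt above.
BOOT_MARKER = "kernel: imklog"

SAMPLE = [
    "Oct 26 21:04:31 localhost rpc.mountd[2591]: authenticated mount request from 192.168.1.70:927 for /home/directory (/home/directory)",
    "Oct 26 21:57:31 localhost kernel: imklog 4.6.2, log source = /proc/kmsg started.",
    "Oct 26 21:57:31 localhost kernel: Initializing cgroup subsys cpuset",
]


def parse_ts(line, year):
    """Parse the 'Oct 26 21:02:27' prefix of a syslog line; the year is not logged."""
    stamp = " ".join(line.split(None, 3)[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_silent_gaps(lines, year, threshold=timedelta(minutes=5)):
    """Return (last_entry, boot_time, gap) for each boot preceded by a quiet period."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = parse_ts(line, year)
        except ValueError:
            continue  # skip blank lines and lines without a syslog timestamp
        if BOOT_MARKER in line and previous is not None:
            gap = current - previous
            if gap >= threshold:
                gaps.append((previous, current, gap))
        previous = current
    return gaps


for last_entry, boot_time, gap in find_silent_gaps(SAMPLE, 2011):
    print(f"no entries between {last_entry} and boot at {boot_time} ({gap})")
```

Against the real file, the same function could be fed the lines of /var/log/messages. A long gap on a server that normally logs NFS mounts every few minutes suggests the machine was hung, or unable to write to disk, for some time before it reset.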

We have also suspected issues with samba. I have included entries from a few log files below.

[2011/10/26 21:57:35, 0] smbd/server.c:1119(main)
smbd version 3.5.6-86.el6_1.4 started.
Copyright Andrew Tridgell and the Samba Team 1992-2010
[2011/10/26 21:57:35.292532, 0] smbd/server.c:500(smbd_open_one_socket)
smbd_open_once_socket: open_socket_in: Address already in use
[2011/10/26 21:57:35.292719, 0] smbd/server.c:500(smbd_open_one_socket)
smbd_open_once_socket: open_socket_in: Address already in use

[2011/10/26 21:58:02.636457, 1] smbd/service.c:1070(make_connection_snum)
computer1 (::ffff:192.168.2.94) connect to service directory initially as user user1 (uid=604, gid=100) (pid 2996)

[2011/10/26 22:02:27.206031, 1] smbd/service.c:1070(make_connection_snum)
__ffff_192.168.1.35 (::ffff:192.168.1.35) connect to service user2 initially as user user2 (uid=502, gid=502) (pid 3446)

[2011/10/27 04:59:51.304297, 0] nmbd/nmbd_incomingdgrams.c:308(process_local_master_announce)
process_local_master_announce: Server COMPUTER2 at IP 192.168.1.51 is announcing itself as a local master browser for workgroup WORKGROUP and we think we are master. Forcing election.

eur0disciple · 10-27-2011, 09:04 PM

Got another reboot. This one was at 19:04 this evening.

last reboot | head -1
reboot system boot 2.6.32-131.17.1. Thu Oct 27 19:04 - 21:45 (02:41)

/var/log/messages
Oct 27 18:58:45 localhost smbd[13400]: [2011/10/27 18:58:45.706015, 0] lib/util_sock.c:474(read_fd_with_timeout)
Oct 27 18:58:45 localhost smbd[13400]: [2011/10/27 18:58:45.707376, 0] lib/util_sock.c:1441(get_peer_addr_internal)
Oct 27 18:58:45 localhost smbd[13400]: getpeername failed. Error was Transport endpoint is not connected
Oct 27 18:58:45 localhost smbd[13400]: read_fd_with_timeout: client 0.0.0.0 read error = No route to host.
Oct 27 19:04:25 localhost kernel: imklog 4.6.2, log source = /proc/kmsg started.
Oct 27 19:04:25 localhost rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="2155" x-info="http://www.rsyslog.com"] (re)start
Oct 27 19:04:25 localhost kernel: Initializing cgroup subsys cpuset
Oct 27 19:04:25 localhost kernel: Initializing cgroup subsys cpu
Oct 27 19:04:25 localhost kernel: Linux version 2.6.32-131.17.1.el6.x86_64 (mockbuild@sl6.fnal.gov) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Oct 5 17:19:54 CDT 2011

ausearch -ts 10/27/2011 18:45
----
time->Thu Oct 27 18:59:08 2011
type=CRYPTO_KEY_USER msg=audit(1319756348.722:1964): user pid=15283 uid=0 auid=604 ses=147 msg='op=destroy kind=server fp=f7:22:76:02:f6:f9:73:e7:29:15:49:8b:fb:50:6d:0c direction=? spid=15283 suid=0 : exe="/usr/sbin/sshd" hostname=? addr=192.168.1.94 terminal=? res=success'
----
time->Thu Oct 27 18:59:08 2011
type=CRYPTO_KEY_USER msg=audit(1319756348.722:1965): user pid=15283 uid=0 auid=604 ses=147 msg='op=destroy kind=server fp=bb:b7:29:30:01:5f:bb:cc:65:ad:bd:63:ad:83:a8:47 direction=? spid=15283 suid=0 : exe="/usr/sbin/sshd" hostname=? addr=192.168.1.94 terminal=? res=success'
----
time->Thu Oct 27 18:59:11 2011
type=CRYPTO_KEY_USER msg=audit(1319756351.211:1966): user pid=15382 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=f7:22:76:02:f6:f9:73:e7:29:15:49:8b:fb:50:6d:0c direction=? spid=15382 suid=0 : exe="/usr/sbin/sshd" hostname=? addr=192.168.1.94 terminal=? res=success'
----
time->Thu Oct 27 18:59:11 2011
type=CRYPTO_KEY_USER msg=audit(1319756351.212:1967): user pid=15382 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=bb:b7:29:30:01:5f:bb:cc:65:ad:bd:63:ad:83:a8:47 direction=? spid=15382 suid=0 : exe="/usr/sbin/sshd" hostname=? addr=192.168.1.94 terminal=? res=success'
----
time->Thu Oct 27 18:59:11 2011
type=CRYPTO_SESSION msg=audit(1319756351.212:1968): user pid=15381 uid=0 auid=4294967295 ses=4294967295 msg='op=start direction=from-client cipher=aes256-ctr ksize=256 spid=15382 suid=74 rport=2230 laddr=10.1.212.1 lport=22 : exe="/usr/sbin/sshd" hostname=? addr=192.168.1.94 terminal=? res=success'
----
time->Thu Oct 27 18:59:11 2011
type=CRYPTO_SESSION msg=audit(1319756351.212:1969): user pid=15381 uid=0 auid=4294967295 ses=4294967295 msg='op=start direction=from-server cipher=aes256-ctr ksize=256 spid=15382 suid=74 rport=2230 laddr=10.1.212.1 lport=22 : exe="/usr/sbin/sshd" hostname=? addr=192.168.1.94 terminal=? res=success'
----
time->Thu Oct 27 19:00:01 2011
type=USER_ACCT msg=audit(1319756401.958:1970): user pid=15456 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:accounting acct="root" exe="/usr/sbin/crond" hostname=type=DAEMON_START msg=audit(1319756665.810:1294): auditd start, ver=2.1 format=raw kernel=2.6.32-131.17.1.el6.x86_64 auid=4294967295 pid=2130 res=success
----
time->Thu Oct 27 19:04:25 2011
type=CONFIG_CHANGE msg=audit(1319756665.917:3): audit_backlog_limit=320 old=64 auid=4294967295 ses=4294967295 res=1
----
time->Thu Oct 27 19:04:54 2011
type=CRYPTO_KEY_USER msg=audit(1319756694.771:4): user pid=2987 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=f7:22:76:02:f6:f9:73:e7:29:15:49:8b:fb:50:6d:0c direction=? spid=2987 suid=0 : exe="/usr/sbin/sshd" hostname=? addr=192.168.1.6 terminal=? res=success'
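
The audit records carry epoch timestamps, so the time between the last record written before the crash and the auditd restart can be computed directly. This is a rough sketch under the assumption that every record has the form type=NAME msg=audit(EPOCH:SERIAL), as in the output above; gap_before_daemon_start is a made-up helper name, and the sample text is abbreviated from the truncated USER_ACCT record that runs straight into DAEMON_START.

```python
import re

# Matches the record type and epoch stamp, e.g. type=USER_ACCT msg=audit(1319756401.958:1970)
RECORD = re.compile(r"type=(\w+) msg=audit\((\d+\.\d+):\d+\)")

SAMPLE = (
    "type=CRYPTO_SESSION msg=audit(1319756351.212:1969): user pid=15381 uid=0\n"
    "type=USER_ACCT msg=audit(1319756401.958:1970): user pid=15456 uid=0 "
    "hostname=type=DAEMON_START msg=audit(1319756665.810:1294): auditd start\n"
)


def gap_before_daemon_start(text):
    """Return (last_epoch_before_restart, restart_epoch), or None if there is no restart."""
    last_seen = None
    for match in RECORD.finditer(text):
        record_type, epoch = match.group(1), float(match.group(2))
        if record_type == "DAEMON_START" and last_seen is not None:
            return last_seen, epoch
        last_seen = epoch
    return None


result = gap_before_daemon_start(SAMPLE)
if result is not None:
    before, after = result
    print(f"{after - before:.1f} seconds between last record and auditd restart")
```

The USER_ACCT record cut off mid-field is itself a hint: the write was interrupted, which is more consistent with an abrupt reset or hang than with an orderly shutdown.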

eur0disciple · 10-27-2011, 09:24 PM

Here are some resource numbers from when the machine rebooted yesterday evening, 10/26, at 21:57.

00:00:01 CPU %usr %nice %sys %iowait %steal %irq %soft %guest %idle

21:50:01 all 0.01 0.00 1.71 9.42 0.00 0.00 1.41 0.00 87.45
21:50:01 0 0.00 0.00 0.77 16.00 0.00 0.00 0.48 0.00 82.74
21:50:01 1 0.01 0.00 0.62 8.40 0.00 0.00 0.23 0.00 90.74
21:50:01 2 0.01 0.00 0.43 4.83 0.00 0.00 0.15 0.00 94.58
21:50:01 3 0.00 0.00 1.04 16.80 0.00 0.00 2.60 0.00 79.57
21:50:01 4 0.01 0.00 3.49 14.79 0.00 0.00 2.82 0.00 78.89
21:50:01 5 0.01 0.00 4.50 14.33 0.00 0.00 9.65 0.00 71.51
21:50:01 6 0.00 0.00 3.28 12.21 0.00 0.00 1.81 0.00 82.70
21:50:01 7 0.00 0.00 2.79 9.10 0.00 0.00 1.50 0.00 86.60
21:50:01 8 0.00 0.00 0.37 8.68 0.00 0.00 0.09 0.00 90.86
21:50:01 9 0.02 0.00 0.40 6.12 0.00 0.00 0.17 0.00 93.29
21:50:01 10 0.00 0.00 0.30 3.09 0.00 0.00 0.15 0.00 96.46
21:50:01 11 0.00 0.00 0.83 7.15 0.00 0.00 0.32 0.00 91.70
21:50:01 12 0.01 0.00 1.87 9.42 0.00 0.00 0.36 0.00 88.34
21:50:01 13 0.11 0.00 4.25 10.51 0.00 0.00 1.57 0.00 83.55
21:50:01 14 0.01 0.00 1.51 6.05 0.00 0.00 0.47 0.00 91.95
21:50:01 15 0.01 0.00 0.88 3.16 0.00 0.00 0.14 0.00 95.81

02:00:01 proc/s cswch/s
21:50:01 1.37 25627.66

02:00:01 pswpin/s pswpout/s
21:50:01 0.00 0.00

02:00:01 pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
21:50:01 54202.05 25058.41 219.77 0.05 50273.46 2235.39 16288.98 18522.75 99.99

02:00:01 tps rtps wtps bread/s bwrtn/s
21:50:01 829.34 523.49 305.85 108409.04 50124.80

02:00:01 frmpg/s bufpg/s campg/s
21:50:01 341.34 -0.06 -316.15

02:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
21:50:01 991416 23606996 95.97 18592 22145488 850836 1.65

02:00:01 kbswpfree kbswpused %swpused kbswpcad %swpcad
21:50:01 26834476 2508 0.01 332 13.24

02:00:01 dentunusd file-nr inode-nr pty-nr
21:50:01 10880 1536 17810 10

02:00:01 runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
21:50:01 0 608 2.70 4.15 5.60

02:00:01 TTY rcvin/s xmtin/s framerr/s prtyerr/s brk/s ovrun/s
21:50:01 0 0.00 0.00 0.00 0.00 0.00 0.00
21:50:01 1 0.00 0.00 0.00 0.00 0.00 0.00

02:00:01 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util

21:50:01 dev8-0 0.82 4.11 6.04 12.44 0.00 1.63 0.90 0.07
21:50:01 dev8-16 822.10 108400.70 50084.60 192.78 15.25 18.56 0.68 55.65
21:50:01 dev8-32 5.54 0.04 27.92 5.05 0.00 0.26 0.24 0.13
21:50:01 dev253-0 0.88 4.11 6.04 11.48 0.00 1.68 0.83 0.07
21:50:01 dev253-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
21:50:01 dev253-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

02:00:01 IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
21:50:01 lo 0.65 0.65 0.22 0.22 0.00 0.00 0.00
21:50:01 eth0 17747.20 2523.25 25478.95 270.70 0.00 0.00 0.01
21:50:01 eth1 8.65 5.80 4.80 0.91 0.00 0.00 0.00
21:50:01 eth2 25.66 16.93 10.04 3.43 0.00 0.00 0.00
21:50:01 eth3 21411.18 13350.02 1677.12 55455.73 0.00 0.00 0.87

02:00:01 IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s

21:50:01 lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
21:50:01 eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
21:50:01 eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
21:50:01 eth2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
21:50:01 eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

02:00:01 call/s retrans/s read/s write/s access/s getatt/s
21:50:01 0.00 0.00 0.00 0.00 0.00 0.00

02:00:01 scall/s badcall/s packet/s udp/s tcp/s hit/s miss/s sread/s swrite/s saccess/s sgetatt/s
21:50:01 2763.53 0.00 2763.54 0.00 2763.28 0.00 872.76 1703.87 821.44 62.82 113.80

02:00:01 totsck tcpsck udpsck rawsck ip-frag tcp-tw
21:50:01 520 232 21 0 0 21

21:57:17 LINUX RESTART
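
The 95.97 %memused figure above looks alarming, but most of it is page cache, which is expected on a file server. A small worked calculation, assuming (as with the sysstat versions of this era) that kbmemused includes buffers and cache; memory_excluding_cache is an illustrative name, and the numbers are taken from the 21:50:01 sample.

```python
def memory_excluding_cache(kbmemfree, kbmemused, kbbuffers, kbcached):
    """Return (kB used excluding buffers/cache, that amount as a percentage of total RAM)."""
    total = kbmemfree + kbmemused
    in_use = kbmemused - kbbuffers - kbcached
    return in_use, 100.0 * in_use / total


# Values from the sar memory line at 21:50:01
in_use_kb, pct = memory_excluding_cache(
    kbmemfree=991416, kbmemused=23606996, kbbuffers=18592, kbcached=22145488
)
print(f"{in_use_kb} kB ({pct:.2f}%) in use excluding buffers and cache")
# prints "1442916 kB (5.87%) in use excluding buffers and cache"
```

With swap essentially unused as well, memory pressure looks like an unlikely cause of the reboot at this sample point.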

stormtracknole · 10-28-2011, 08:46 AM

Hmmm...I wonder if you are having some sort of hardware failure? Do you have lm-sensors installed? You can run the command "sensors" to see your processor's temperature. Your box may be overheating, but normally it would get shut down rather than rebooted. Another thing to check is whether SELinux is enabled. If it is, temporarily disable it with this command as root:

Code:

setenforce 0

Let it run for a while after that to see if it keeps doing it.

eur0disciple · 10-28-2011, 10:08 AM

I'm leaning more towards a hardware issue as well. selinux is currently disabled.

SELinux status: disabled

I'm going to try to set up smb to use port 445 only, and set up remote logging, today. I checked and it does not appear that lm_sensors is currently installed. Thank you for the info. I'll see if that's something I can implement also.

Thanks again

eur0disciple · 10-28-2011, 01:18 PM

I have configured the machine for remote logging, over tcp, and tested. The remote logging is working properly.

*.* @@192.168.1.86
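
One way to re-check the forwarding later is to write a uniquely identifiable message to the local syslog and then look for it on the remote host. This is only a sketch: it assumes the local rsyslog is still forwarding everything with the rule above, and the marker format, the tag fwd-test, and the function names are invented for illustration.

```python
import socket
import syslog
from datetime import datetime


def make_marker(host, when):
    """Build an easily greppable test message."""
    return f"remote-log-test host={host} sent={when.isoformat()}"


def send_marker():
    """Write the marker to the local syslog; rsyslog is expected to forward it."""
    marker = make_marker(socket.gethostname(), datetime.now().replace(microsecond=0))
    syslog.openlog("fwd-test", 0, syslog.LOG_USER)
    syslog.syslog(syslog.LOG_INFO, marker)
    syslog.closelog()
    return marker
```

Calling send_marker() on the file server and then searching the remote server's log for the returned string confirms the whole path, not just that the TCP connection is open.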

I would like to test the samba changes prior to making them on this machine.

stormtracknole · 10-28-2011, 01:44 PM

It sounds, at least to me, like it's a hardware failure. Are you using RAID by any chance? Hard drive failures usually lead to lock ups, but I'm not completely sure about that. You could also have some obscure hardware that your kernel doesn't like. We have a problem at work where one of the servers reboots for no reason every so often.

eur0disciple · 10-28-2011, 02:13 PM

We have a RAID setup on this box; however, it's a couple of RAID60 configurations. I believe we can handle nearly six drive failures prior to a catastrophe.

The server was not doing this prior to the reinstall. We went from running CentOS 5.7 to SL 6.1. We made this change so we could resolve the sixteen-group limitation within AUTH_SYS/NFS.

reboot system boot 2.6.18-238.19.1. Thu Sep 1 05:32 - 15:13 (57+09:41)
reboot system boot 2.6.18-238.12.1. Mon Aug 1 05:32 - 05:30 (30+23:57)
reboot system boot 2.6.18-238.9.1.e Fri Jul 1 05:32 - 05:30 (30+23:57)
reboot system boot 2.6.18-238.9.1.e Thu Jun 2 09:11 - 05:30 (28+20:18)
reboot system boot 2.6.18-238.9.1.e Wed Jun 1 05:32 - 05:30 (29+23:57)
reboot system boot 2.6.18-238.9.1.e Tue May 31 12:41 - 05:30 (16:48)
reboot system boot 2.6.18-238.9.1.e Fri May 13 09:28 - 12:38 (18+03:09)
reboot system boot 2.6.18-238.9.1.e Fri May 13 08:02 - 09:12 (01:10)
reboot system boot 2.6.18-238.9.1.e Sun May 1 05:32 - 07:54 (12+02:22)

stormtracknole · 10-28-2011, 03:22 PM

I think there may be a kernel bug. Are you running 32 or 64 bit? Also, when you were running CentOS 5, which architecture were you running? I can't remember at the moment, but when using RAID, there is a feature that has to be either turned on or off in your BIOS. Not sure if this is related though. The problem that we're running into is with RHEL5. Come to think of it, our box just crashes; it doesn't reboot. We have an additional RHEL5 box with the same specs that doesn't do that. Go figure!

eur0disciple · 11-02-2011, 07:25 PM

The server hung up this evening and I had to reboot it. Here are some of the errors/logs from the reboot.

The console showed the following when I plugged in a keyboard and mouse:

usb 4-2: device descriptor read/64, error -71
usb 4-2: device not accepting address 5, error -71
hub 4-0:1.0: unable to enumerate USB device on port 2
This was duplicated several times

Logs are as follows:

Nov 2 19:57:22 localhost kernel: nfs: server 192.168.3.1 not responding, still trying
Nov 2 19:57:22 localhost kernel: nfs: server 192.168.3.1 not responding, still trying
Nov 2 19:59:53 localhost kernel: usb 5-1: new low speed USB device using uhci_hcd and address 2
Nov 2 19:59:53 localhost kernel: usb 5-1: New USB device found, idVendor=413c, idProduct=2107
Nov 2 19:59:53 localhost kernel: usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 2 19:59:53 localhost kernel: usb 5-1: Product: Dell USB Entry Keyboard
Nov 2 19:59:53 localhost kernel: usb 5-1: Manufacturer: Dell
Nov 2 19:59:53 localhost kernel: usb 5-1: configuration #1 chosen from 1 choice
Nov 2 19:59:53 localhost kernel: input: Dell Dell USB Entry Keyboard as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1:1.0/input/input9
Nov 2 19:59:53 localhost kernel: generic-usb 0003:413C:2107.0007: input,hidraw2: USB HID v1.10 Keyboard [Dell Dell USB Entry Keyboard] on usb-0000:00:1d.0-1/input0
Nov 2 20:00:54 localhost init: tty (/dev/tty1) main process (2802) killed by INT signal
Nov 2 20:00:54 localhost init: tty (/dev/tty1) main process ended, respawning
Nov 2 20:01:22 localhost kernel: nfs: server 192.168.3.1 not responding, still trying
Nov 2 20:01:52 localhost smbd[17811]: [2011/11/02 20:01:52.725783, 0] lib/util_sock.c:474(read_fd_with_timeout)
Nov 2 20:01:52 localhost smbd[17811]: [2011/11/02 20:01:52.725888, 0] lib/util_sock.c:1441(get_peer_addr_internal)
Nov 2 20:01:52 localhost smbd[17811]: getpeername failed. Error was Transport endpoint is not connected
Nov 2 20:01:52 localhost smbd[17811]: read_fd_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
Nov 2 20:07:24 localhost kernel: imklog 4.6.2, log source = /proc/kmsg started.
Nov 2 20:07:24 localhost rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="2177" x-info="http://www.rsyslog.com"] (re)start
Nov 2 20:07:24 localhost kernel: Initializing cgroup subsys cpuset
Nov 2 20:07:24 localhost kernel: Initializing cgroup subsys cpu
Nov 2 20:07:24 localhost kernel: Linux version 2.6.32-131.17.1.el6.x86_64 (mockbuild@sl6.fnal.gov) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Oct 5 17:19:54 CDT 2011

Right before I rebooted the machine, I was able to log in as root but not as any of the users. This leads me to believe something may be wrong with the RAID, as all of the users' home directories reside on /dev/sdb1 and /dev/sdc1. I also couldn't access the /opt directory, which has a couple of symbolic links that point to directories on /dev/sdb1 as well.

The vendor has requested that I upgrade the firmware, and the driver, on the RAID controller. I'll see if I can schedule that prior to the weekend. I'll be sure to let you know if it helps.

Thanks again

---------- Post added 11-02-11 at 08:25 PM ----------

Just thought I'd note that the 192.168.3.1 address in the logs belongs to this server. It was having trouble serving NFS to all of its clients.

eur0disciple · 11-03-2011, 08:32 PM

The RAID controller that we're using on this machine is a MegaRAID SAS 9280-4i4e. I have successfully upgraded the driver for the controller. We're now running version 00.00.05.40 of the driver.

I'm going to give this twenty four hours to settle. If all is well, I will upgrade the firmware.

Thank you

stormtracknole · 11-03-2011, 10:22 PM

Great to hear. Hopefully that will fix the issue. Let us know how it works out.

eur0disciple · 11-07-2011, 09:19 AM

Server has rebooted once again. This time it was at 8:20 on Saturday evening.

I'm going to upgrade the firmware on the controller and get back to the hardware vendor. The reboots appear completely random, and nothing in the logs points to an error.