Red Hat: This forum is for the discussion of Red Hat Linux.
I have recently reinstalled a server that I administer. The server is now running RHEL 6.1. It is our main file server/NAS, via NFS. It also runs a few minor services: DNS, DHCP, and NTP.
This server has rebooted on its own several times since the reinstall, six times in the last week. I have checked the logs several times and have not found anything that points to a cause. The logs are sometimes missing anywhere from five to fifty minutes' worth of entries before the boot sequence. I initially suspected a cron job or something similar; however, none of them appears to initiate a reboot.
I have also checked the logs in the /var/log/sa directory. The resources on the machine don't appear to be heavily utilized before the server reboots. My goal is to determine why this machine keeps rebooting. Once I have determined the cause, I'd like to resolve the issue.
Once again, I'm running RHEL 6.1. The kernel is currently 2.6.32-131.17.1.el6.x86_64. Any assistance is greatly appreciated.
Thank you
Last edited by eur0disciple; 10-27-2011 at 11:41 AM.
Reason: Additional details
Not much we can help with, without information. If your logs don't show anything, have you considered also mirroring your logs to another syslog server, so that if they aren't getting written locally, they might still get logged remotely?
And Red Hat support is who you need to call, since you're paying for it, right? RHEL is a paid distro, and if you're not paying, you won't get bugfixes/updates that may have been released after the DVD was. Also, RHEL has diagnostic tools to help you...
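A minimal sketch of the log-mirroring suggestion, assuming the server uses rsyslog and that a second machine is available as a collector. The helper name forward_rule and the address 192.0.2.10 are placeholders made up for illustration; the rule format itself (one @ for UDP, two for TCP) is rsyslog's legacy forwarding syntax.

```shell
#!/bin/bash
# Print an rsyslog rule that forwards every facility and priority
# to a remote collector. forward_rule is a hypothetical helper name;
# the host and port passed to it are placeholders.
forward_rule() {
    local proto="$1" host="$2" port="${3:-514}"
    case "$proto" in
        udp) printf '*.* @%s:%s\n' "$host" "$port" ;;   # single @ = UDP
        tcp) printf '*.* @@%s:%s\n' "$host" "$port" ;;  # double @ = TCP
        *)   echo "usage: forward_rule udp|tcp HOST [PORT]" >&2
             return 1 ;;
    esac
}
```

As root, something like "forward_rule tcp 192.0.2.10 >> /etc/rsyslog.conf" followed by "service rsyslog restart" should start the mirroring. The collector also has to be configured to accept remote messages, and messages generated in the last moments before a hard reset may still be lost.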
TB0ne, thanks for the response. The distro is actually Scientific Linux 6.1; I've just gotten into the habit of calling it RHEL.
Here are the logs showing that the server has actually rebooted. They are extremely useful to me because sometimes we don't even realize that the server has rebooted.
reboot system boot 2.6.32-131.17.1. Thu Oct 27 04:57 - 14:53 (09:56)
reboot system boot 2.6.32-131.17.1. Wed Oct 26 21:57 - 14:53 (16:56)
reboot system boot 2.6.32-131.17.1. Wed Oct 26 01:54 - 14:53 (1+12:59)
reboot system boot 2.6.32-131.17.1. Fri Oct 21 02:15 - 14:53 (6+12:37)
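One way to tell an orderly restart from a crash or power loss, assuming the wtmp records are intact: "last -x" also prints shutdown entries, and a boot that was preceded by a clean shutdown has a "shutdown" record directly before it. The sketch below flags every boot where the previous run never recorded a shutdown. The function name unclean_boots is made up for illustration.

```shell
#!/bin/bash
# Read `last -x` output (newest entry first) on stdin and print each
# "reboot" entry whose previous run ended without a "shutdown" record,
# which suggests a crash, reset, or power loss.
# unclean_boots is a hypothetical helper name.
unclean_boots() {
    awk '
        $1 == "reboot" || $1 == "shutdown" {
            # Two reboot records in a row: the older run never logged
            # a shutdown before the newer boot started.
            if (prev == "reboot" && $1 != "shutdown") print prevline
            prev = $1
            prevline = $0
        }
        # The oldest reboot in the file is not judged: the records
        # before it may simply have been rotated away.
    '
}
```

Usage would be "last -x | unclean_boots". An empty result means every boot in the file was preceded by a recorded shutdown, which would point back toward something initiating the reboots deliberately.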
Below are the logs from /var/log/messages around the time of the reboot. Note: hostnames, IPs, and directories have been changed for anonymity.
Oct 26 21:02:27 localhost rpc.mountd[2591]: authenticated mount request from 192.168.1.47:1001 for /home/directory (/home/directory)
Oct 26 21:02:28 localhost rpc.mountd[2591]: authenticated mount request from 192.168.1.63:614 for /home/directory (/home/directory)
Oct 26 21:02:28 localhost rpc.mountd[2591]: authenticated mount request from 192.168.1.44:991 for /home/directory (/home/directory)
Oct 26 21:04:31 localhost rpc.mountd[2591]: authenticated mount request from 192.168.1.70:927 for /home/directory (/home/directory)
Oct 26 21:57:31 localhost kernel: imklog 4.6.2, log source = /proc/kmsg started.
Oct 26 21:57:31 localhost rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="2162" x-info="http://www.rsyslog.com"] (re)start
Oct 26 21:57:31 localhost kernel: Initializing cgroup subsys cpuset
Oct 26 21:57:31 localhost kernel: Initializing cgroup subsys cpu
Oct 26 21:57:31 localhost kernel: Linux version 2.6.32-131.17.1.el6.x86_64 (mockbuild@sl6.fnal.gov) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Oct 5 17:19:54 CDT 2011
Oct 26 21:57:31 localhost kernel: Command line: ro root=/dev/mapper/vg_localhost-lv_root rd_LVM_LV=vg_localhost/lv_root rd_LVM_LV=vg_localhost/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto rhgb quiet
Oct 26 21:57:31 localhost kernel: KERNEL supported cpus:
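In the excerpt above, the last entry before the boot is at 21:04:31 and the first boot message is at 21:57:31, a gap of 53 minutes. A rough sketch for measuring that gap across the whole file follows. It assumes the standard syslog timestamp in the third field, treats the imklog start message as the boot marker (any rsyslog restart that reloads imklog would match too), and assumes the silence is shorter than 24 hours. The function name boot_gaps is made up for illustration.

```shell
#!/bin/bash
# Read syslog lines on stdin and, for every imklog start message,
# print how long the log was silent before it.
# boot_gaps is a hypothetical helper name.
boot_gaps() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        /imklog/ && /started/ {
            if (prev != "") {
                gap = secs($3) - secs(prev)
                if (gap < 0) gap += 86400   # crossed midnight
                printf "%s %s %s: %d minutes of silence before boot\n",
                       $1, $2, $3, int(gap / 60)
            }
        }
        { prev = $3 }   # remember the timestamp of the previous line
    '
}
```

Usage would be "boot_gaps < /var/log/messages". Long gaps before each boot would be consistent with the machine hanging for a while before it resets, rather than rebooting instantly.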
We have also suspected issues with Samba. I have included entries from a few log files below.
[2011/10/26 21:57:35, 0] smbd/server.c:1119(main)
smbd version 3.5.6-86.el6_1.4 started.
Copyright Andrew Tridgell and the Samba Team 1992-2010
[2011/10/26 21:57:35.292532, 0] smbd/server.c:500(smbd_open_one_socket)
smbd_open_once_socket: open_socket_in: Address already in use
[2011/10/26 21:57:35.292719, 0] smbd/server.c:500(smbd_open_one_socket)
smbd_open_once_socket: open_socket_in: Address already in use
[2011/10/26 21:58:02.636457, 1] smbd/service.c:1070(make_connection_snum)
computer1 (::ffff:192.168.2.94) connect to service directory initially as user user1 (uid=604, gid=100) (pid 2996)
[2011/10/26 22:02:27.206031, 1] smbd/service.c:1070(make_connection_snum)
__ffff_192.168.1.35 (::ffff:192.168.1.35) connect to service user2 initially as user user2 (uid=502, gid=502) (pid 3446)
[2011/10/27 04:59:51.304297, 0] nmbd/nmbd_incomingdgrams.c:308(process_local_master_announce)
process_local_master_announce: Server COMPUTER2 at IP 192.168.1.51 is announcing itself as a local master browser for workgroup WORKGROUP and we think we are master. Forcing election.
Hmmm...I wonder if you are having some sort of hardware failure? Do you have lm-sensors installed? You can run the command "sensors" to see your processor's temperature. Your box may be overheating, but normally it would get shut down rather than rebooted. Another thing to check is whether SELinux is enabled. If it is, temporarily switch it to permissive mode with this command as root:
Code:
setenforce 0
Let it run for a while after that to see if it keeps doing it.
I'm leaning more towards a hardware issue as well. SELinux is currently disabled.
SELinux status: disabled
I'm going to try to set up smb to use port 445 only, and set up remote logging, today. I checked and it does not appear that lm_sensors is currently installed. Thank you for the info. I'll see if that's something I can implement also.
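For the port 445 change, the relevant smb.conf parameter is "smb ports" in the [global] section. A small sketch for checking whether it is already set, assuming the configuration lives at /etc/samba/smb.conf (the function name smb_ports_setting is made up for illustration):

```shell
#!/bin/bash
# Print any active "smb ports" line from an smb.conf file.
# Commented-out lines (starting with # or ;) are not matched.
# smb_ports_setting is a hypothetical helper name.
smb_ports_setting() {
    local conf="${1:-/etc/samba/smb.conf}"
    grep -Ei '^[[:space:]]*smb[[:space:]]+ports[[:space:]]*=' "$conf"
}
```

If this prints nothing, smbd is using its default of listening on both 445 and 139. Adding "smb ports = 445" under [global] and restarting the smb service should restrict it to 445.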
It sounds, at least to me, like it's a hardware failure. Are you using RAID by any chance? Hard drive failures usually lead to lockups, but I'm not completely sure about that. You could also have some obscure hardware that your kernel doesn't like. We have a problem at work where one of the servers reboots for no reason every so often.
We do have a RAID setup on this box; it's a couple of RAID60 configurations. I believe we can handle nearly six drive failures before a catastrophe.
The server was not doing this prior to the reinstall. We went from running CentOS 5.7 to SL 6.1. We made this change so we could resolve the sixteen-group limitation within AUTH_SYS/NFS.
reboot system boot 2.6.18-238.19.1. Thu Sep 1 05:32 - 15:13 (57+09:41)
reboot system boot 2.6.18-238.12.1. Mon Aug 1 05:32 - 05:30 (30+23:57)
reboot system boot 2.6.18-238.9.1.e Fri Jul 1 05:32 - 05:30 (30+23:57)
reboot system boot 2.6.18-238.9.1.e Thu Jun 2 09:11 - 05:30 (28+20:18)
reboot system boot 2.6.18-238.9.1.e Wed Jun 1 05:32 - 05:30 (29+23:57)
reboot system boot 2.6.18-238.9.1.e Tue May 31 12:41 - 05:30 (16:48)
reboot system boot 2.6.18-238.9.1.e Fri May 13 09:28 - 12:38 (18+03:09)
reboot system boot 2.6.18-238.9.1.e Fri May 13 08:02 - 09:12 (01:10)
reboot system boot 2.6.18-238.9.1.e Sun May 1 05:32 - 07:54 (12+02:22)
Last edited by eur0disciple; 10-28-2011 at 02:34 PM.
I think there may be a kernel bug. Are you running 32 or 64 bit? Also, when you were running CentOS 5, which architecture were you running? I can't remember at the moment, but when using RAID, there is a feature that has to be either turned on or off in your BIOS. Not sure if this is related though. The problem that we are running into is with RHEL5. Come to think of it, our box just crashes; it doesn't reboot. We have an additional RHEL5 box with the same specs that doesn't do that. Go figure!
The server hung up this evening and I had to reboot it. Here are some of the errors/logs from the reboot.
The console showed the following when I plugged in a keyboard and mouse:
usb 4-2: device descriptor read/64, error -71
usb 4-2: device not accepting address 5, error -71
hub 4-0:1.0: unable to enumerate USB device on port 2
These messages were repeated several times.
Logs are as follows:
Nov 2 19:57:22 localhost kernel: nfs: server 192.168.3.1 not responding, still trying
Nov 2 19:57:22 localhost kernel: nfs: server 192.168.3.1 not responding, still trying
Nov 2 19:59:53 localhost kernel: usb 5-1: new low speed USB device using uhci_hcd and address 2
Nov 2 19:59:53 localhost kernel: usb 5-1: New USB device found, idVendor=413c, idProduct=2107
Nov 2 19:59:53 localhost kernel: usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 2 19:59:53 localhost kernel: usb 5-1: Product: Dell USB Entry Keyboard
Nov 2 19:59:53 localhost kernel: usb 5-1: Manufacturer: Dell
Nov 2 19:59:53 localhost kernel: usb 5-1: configuration #1 chosen from 1 choice
Nov 2 19:59:53 localhost kernel: input: Dell Dell USB Entry Keyboard as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1:1.0/input/input9
Nov 2 19:59:53 localhost kernel: generic-usb 0003:413C:2107.0007: input,hidraw2: USB HID v1.10 Keyboard [Dell Dell USB Entry Keyboard] on usb-0000:00:1d.0-1/input0
Nov 2 20:00:54 localhost init: tty (/dev/tty1) main process (2802) killed by INT signal
Nov 2 20:00:54 localhost init: tty (/dev/tty1) main process ended, respawning
Nov 2 20:01:22 localhost kernel: nfs: server 192.168.3.1 not responding, still trying
Nov 2 20:01:52 localhost smbd[17811]: [2011/11/02 20:01:52.725783, 0] lib/util_sock.c:474(read_fd_with_timeout)
Nov 2 20:01:52 localhost smbd[17811]: [2011/11/02 20:01:52.725888, 0] lib/util_sock.c:1441(get_peer_addr_internal)
Nov 2 20:01:52 localhost smbd[17811]: getpeername failed. Error was Transport endpoint is not connected
Nov 2 20:01:52 localhost smbd[17811]: read_fd_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
Nov 2 20:07:24 localhost kernel: imklog 4.6.2, log source = /proc/kmsg started.
Nov 2 20:07:24 localhost rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="2177" x-info="http://www.rsyslog.com"] (re)start
Nov 2 20:07:24 localhost kernel: Initializing cgroup subsys cpuset
Nov 2 20:07:24 localhost kernel: Initializing cgroup subsys cpu
Nov 2 20:07:24 localhost kernel: Linux version 2.6.32-131.17.1.el6.x86_64 (mockbuild@sl6.fnal.gov) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Oct 5 17:19:54 CDT 2011
Right before I rebooted the machine, I was able to log in as root but not as any of the users. This leads me to believe it may be something wrong with the RAID, as all of the users' home directories reside on /dev/sdb1 and /dev/sdc1. I also couldn't access the /opt directory. The /opt directory has a couple of symbolic links that also point to directories on /dev/sdb1.
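A small probe along these lines could help confirm that pattern the next time the machine hangs, assuming a root shell is still available. It checks whether a path answers within a few seconds instead of blocking the terminal. The function name probe_path is made up for illustration, and it relies on the coreutils "timeout" command being installed.

```shell
#!/bin/bash
# Report whether a path responds within a time limit (default 5 s).
# probe_path is a hypothetical helper name.
probe_path() {
    local path="$1" limit="${2:-5}"
    if timeout "$limit" ls -d "$path" >/dev/null 2>&1; then
        echo "$path: ok"
    else
        echo "$path: no response within ${limit}s or not accessible"
        return 1
    fi
}
```

Running "probe_path /opt" and the same against a directory on each RAID volume would show which filesystems have stopped answering. One caveat: a process stuck in uninterruptible I/O may ignore the timeout signal, in which case the probe itself hangs, which is still a useful sign that the block device is not responding.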
The vendor has requested that I upgrade the firmware, and the driver, on the RAID controller. I'll see if I can schedule that prior to the weekend. I'll be sure to let you know if it helps.
Thanks again
---------- Post added 11-02-11 at 08:25 PM ----------
Just thought I'd note that 192.168.1.3 belongs to this server. It was having trouble serving NFS to all of its clients.
The RAID controller that we're using on this machine is a MegaRAID SAS 9280-4i4e. I have successfully upgraded the driver for the controller. We're now running version 00.00.05.40 of the driver.
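To double-check which driver version is in place, the version field can be pulled out of modinfo output. This sketch assumes the controller is handled by the megaraid_sas kernel module; the function name megaraid_version is made up for illustration.

```shell
#!/bin/bash
# Read `modinfo` output on stdin and print the value of the
# "version:" field (not "srcversion:" or "vermagic:").
# megaraid_version is a hypothetical helper name.
megaraid_version() {
    awk '$1 == "version:" { print $2; exit }'
}
```

Usage would be "modinfo megaraid_sas | megaraid_version". Note that modinfo reads the module file on disk, so it shows what will be loaded at the next boot; for the module that is running right now, /sys/module/megaraid_sas/version should hold the loaded version.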
I'm going to give this twenty-four hours to settle. If all is well, I will upgrade the firmware.
Server has rebooted once again. This time it was at 8:20 on Saturday evening.
I'm going to upgrade the firmware on the controller and get back to the hardware vendor. The reboots are completely random and nothing in the logs indicates an error.