LinuxQuestions.org

thund3rstruck 05-08-2012 10:00 AM

How to Troubleshoot Sudden Server Restarts
 
Hi guys,

Hoping someone can help me troubleshoot a server that keeps restarting suddenly. It's an Ubuntu SMP 3.0.0-12-server x86_64 GNU/Linux server that hosts Samba services for multiple remote IIS web applications.

Users have been reporting that services are failing throughout the day, and a quick system check confirms that the server is indeed restarting.

Code:

developer@knoxfactoryfs011:/var/log$ last reboot|head -1
reboot  system boot  3.0.0-12-server  Tue May  8 09:20 - 09:42  (00:21)

Code:

developer@knoxfactoryfs011:/var/log$ who -b
        system boot  2012-05-08 09:20

Nothing looks out of the ordinary in /var/log/syslog or dmesg, and there are no messages indicating a signalled reboot. The server appears to shut down and restart without warning. It is connected to a UPS power supply backup/surge protector, so I don't think it's electrical.

Can anyone provide clues as to how I might troubleshoot something like this?

sanjay87 05-08-2012 10:26 AM

Hi,
Did you check these logs?

# tail -f /var/log/wtmp
# tail -f /var/log/messages ----- you can find the cause of the reboot in these logs

Also check whether you have any crontab entry scheduled to reboot the server.
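
The crontab check above could be sketched roughly as follows. This is only a minimal sketch: `find_reboot_jobs` is a made-up helper name, and the list of commands it looks for (reboot, shutdown, halt, poweroff) is an assumption about what a scheduled restart might call, not something taken from this server.

```shell
# Hypothetical helper: list cron entries that call a reboot-type command.
# Arguments: cron files or directories to search.
find_reboot_jobs() {
    # First grep: find the command names as whole words. A leading '@' is
    # excluded so that '@reboot' schedule keywords (run at boot) do not match.
    # -r recurse, -H always print the file name, -n print the line number.
    # Second grep: drop matches that sit on commented-out lines.
    grep -rHnE '(^|[^@[:alnum:]_])(reboot|shutdown|halt|poweroff)([^[:alnum:]_]|$)' "$@" 2>/dev/null |
        grep -vE '^[^:]*:[0-9]+:[[:space:]]*#'
}
```

It could be run as root with something like `find_reboot_jobs /etc/crontab /etc/cron.d /etc/cron.hourly /var/spool/cron/crontabs`. Those are the usual cron locations on Ubuntu, so adjust them if yours differ. No output means no matching entry was found in those places.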

thund3rstruck 05-08-2012 11:04 AM

Quote:

Originally Posted by sanjay87 (Post 4673439)
Hi,
Did you check these logs?

# tail -f /var/log/wtmp
# tail -f /var/log/messages ----- you can find the cause of the reboot in these logs

Also check whether you have any crontab entry scheduled to reboot the server.

Ubuntu doesn't have a messages log; it uses syslog.

There doesn't seem to be any message whatsoever of an initiated shutdown -r in syslog, but you can definitely see where the machine started (just not where it shut down).

syslog
Code:

May  8 07:17:01 knoxfactoryfs011 CRON[10746]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May  8 08:17:01 knoxfactoryfs011 CRON[10761]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May  8 09:17:01 knoxfactoryfs011 CRON[10786]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May  8 09:20:36 knoxfactoryfs011 kernel: imklog 5.8.1, log source = /proc/kmsg started.
May  8 09:20:36 knoxfactoryfs011 rsyslogd: [origin software="rsyslogd" swVersion="5.8.1" x-pid="767" x-info="http://www.rsyslog.com"] start
May  8 09:20:36 knoxfactoryfs011 rsyslogd: rsyslogd's groupid changed to 103
May  8 09:20:36 knoxfactoryfs011 rsyslogd: rsyslogd's userid changed to 101
May  8 09:20:36 knoxfactoryfs011 rsyslogd-2039: Could no open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Initializing cgroup subsys cpuset
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Initializing cgroup subsys cpu
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Linux version 3.0.0-12-server (buildd@crested) (gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) ) #20-Ubuntu SMP Fri Oct 7 16:36:30 UTC 2011 (Ubuntu 3.0.0-12.20-server 3.0.4)
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Command line: BOOT_IMAGE=/boot...
...
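
A rough way to pick these boots out of a longer syslog is to look for every rsyslogd "start" line that has no clean rsyslogd exit before it. This is only a sketch: `find_unclean_boots` is a made-up name, and the "exiting on signal" text is an assumption about what rsyslogd writes on a normal shutdown, so it is worth checking against a reboot you triggered yourself. The first start in a freshly rotated log will also be flagged, because the previous exit line is in the older file.

```shell
# Hypothetical helper: read syslog text on stdin and report each rsyslogd
# start that was not preceded by a clean rsyslogd exit.
find_unclean_boots() {
    awk '
        # Assumed clean-shutdown marker written by rsyslogd when it is stopped.
        /rsyslogd:.*exiting on signal/ { clean = 1 }

        # Start marker, as seen in the excerpt above.
        /rsyslogd:.*\] start[[:space:]]*$/ {
            if (!clean) print "no clean shutdown logged before: " $1, $2, $3
            clean = 0   # reset for the next boot
        }
    '
}
```

Usage would be something like `find_unclean_boots < /var/log/syslog`. Each reported timestamp is a boot with no logged shutdown in front of it.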

There is an interesting message in kern.log saying that the BIOS is broken, but it is logged at startup, not at shutdown.

kern.log
Code:

May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Calgary: Unable to locate RioGrande table in EBDA - bailing!
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] ------------[ cut here ]------------
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] WARNING: at /build/buildd/linux-3.0.0/drivers/pci/dmar.c:634 warn_invalid_dmar+0x8f/0xa0()
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Hardware name: OptiPlex 760               
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Your BIOS is broken; DMAR reported at address fedc1000 returns all ones!
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] BIOS vendor: Dell Inc.; Ver: A03; Product Version:
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Modules linked in:
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Pid: 0, comm: swapper Not tainted 3.0.0-12-server #20-Ubuntu
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000] Call Trace:
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000]  [<ffffffff8105e81f>] warn_slowpath_common+0x7f/0xc0
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000]  [<ffffffff8105e8bf>] warn_slowpath_fmt_taint+0x3f/0x50
May  8 09:20:36 knoxfactoryfs011 kernel: [0.000000]  [<ffffffff81cfd216>] ? __early_set_fixmap+0x96/0x9d
...

I'm trying to find out what is causing this server to suddenly shut down and reboot on its own... and it's very frustrating!

mpapet 05-08-2012 12:51 PM

If it were the OS, you would have error messages logged.

It's a hardware problem. Hopefully the server vendor has good warranty support, because you will need it to isolate the offending component (power supply interface, mainboard, or the power supply itself).
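
Before calling the vendor, it may be worth one pass over the kernel log for hardware hints logged shortly before each restart. The sketch below is only that: `hardware_hints` is a made-up name, and the search terms (machine check, hardware error, temperature) are assumptions about how the kernel words such events. A sudden power loss usually leaves nothing behind, so an empty result fits the hardware theory rather than ruling it out.

```shell
# Hypothetical helper: print kernel log lines that hint at hardware trouble.
# Arguments: one or more log files to search.
hardware_hints() {
    # -H always print the file name, -n print the line number,
    # -i ignore case, -E extended regex.
    grep -HniE 'machine check|hardware error|temperature' "$@"
}
```

For example, `hardware_hints /var/log/kern.log` would search the file quoted earlier in the thread. Compare the timestamps of any hits with the boot times reported by `last reboot`.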


All times are GMT -5.