Linux server shuts down each night

lema · 10-20-2009, 12:32 PM

Hi All,

I have an HP ProLiant 320 server running SUSE10 SP1. The server is 3 years old. It ran perfectly until last week, when it started to shut down each night. Is there any way to find out why it's happening?

Thanks,
Vladimir

TB0ne · 10-20-2009, 01:03 PM

Quote:

Originally Posted by lema

Hi All,

I have an HP ProLiant 320 server running SUSE10 SP1. The server is 3 years old. It ran perfectly until last week, when it started to shut down each night. Is there any way to find out why it's happening?

Thanks,
Vladimir

Look in the system logs (usually /var/log/messages) and see whether anything stands out. Other things to check are:

- Does it happen at the same time each night?
- Have any software, scripts, or cron jobs been updated recently?
- Are any hardware error/info lights on?
- Have any new users who don't have a clue been hired recently?
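A minimal sketch of the log check, assuming syslog tags each shutdown with a `shutdown[PID]:` entry and that the log lives in /var/log/messages. The function name `shutdown_context` is hypothetical.

```shell
# shutdown_context LOGFILE [LINES]
# Print each "shutdown[PID]:" entry together with the LINES entries
# logged just before it (default 5), so whatever triggered the
# shutdown is visible next to it.
shutdown_context() {
    local logfile=$1 before=${2:-5}
    grep -B "$before" -E ' shutdown\[[0-9]+\]: ' "$logfile"
}

# Example (the path is an assumption; adjust for your syslog setup):
# shutdown_context /var/log/messages 10
```

Comparing the output across several nights should show whether the same entries always precede the shutdown.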

lema · 10-20-2009, 01:19 PM

Quote:

Originally Posted by TB0ne

Look in the system logs (usually /var/log/messages) and see whether anything stands out. Other things to check are:

TB0ne, thanks for the reply! I checked /var/log/messages. Here are the last messages before the shutdown:

Oct 20 07:32:38 chicago syslog-ng[2494]: STATS: dropped 0
Oct 20 07:52:13 chicago su: (to root) root on none
Oct 20 07:52:13 chicago su: (to root) root on none
Oct 20 07:52:13 chicago shutdown[8128]: shutting down for system halt
Oct 20 07:52:13 chicago init: Switching to runlevel: 0
Oct 20 07:52:14 chicago auditd[3221]: The audit daemon is exiting.
Oct 20 07:52:14 chicago kernel: audit(1256050334.959:8): audit_pid=0 old=3221 by auid=4294967295
Oct 20 07:52:15 chicago xinetd[3694]: Exiting...
Oct 20 07:52:15 chicago sshd[3592]: Received signal 15; terminating.
Oct 20 07:52:15 chicago nmbd[3362]: [2009/10/20 07:52:15, 0] nmbd/nmbd.c:terminate(58)
Oct 20 07:52:15 chicago nmbd[3362]: Got SIGTERM: going down...
Oct 20 07:52:20 chicago kernel: nfsd: last server has exited
Oct 20 07:52:20 chicago kernel: nfsd: unexporting all filesystems
Oct 20 07:52:20 chicago rpc.mountd: Caught signal 15, un-registering and exiting.
Oct 20 07:52:32 chicago kernel: Kernel logging (proc) stopped.
Oct 20 07:52:32 chicago kernel: Kernel log daemon terminating.
Oct 20 07:52:33 chicago syslog-ng[2494]: syslog-ng version 1.6.8 going down

Does this indicate that somebody did it manually using the root password?

TB0ne · 10-20-2009, 03:39 PM

Quote:

Originally Posted by lema

TB0ne, thanks for the reply! I checked /var/log/messages. Here are the last messages before the shutdown:

Oct 20 07:32:38 chicago syslog-ng[2494]: STATS: dropped 0
Oct 20 07:52:13 chicago su: (to root) root on none
Oct 20 07:52:13 chicago su: (to root) root on none
Oct 20 07:52:13 chicago shutdown[8128]: shutting down for system halt
Oct 20 07:52:13 chicago init: Switching to runlevel: 0
Oct 20 07:52:14 chicago auditd[3221]: The audit daemon is exiting.
Oct 20 07:52:14 chicago kernel: audit(1256050334.959:8): audit_pid=0 old=3221 by auid=4294967295
Oct 20 07:52:15 chicago xinetd[3694]: Exiting...
Oct 20 07:52:15 chicago sshd[3592]: Received signal 15; terminating.
Oct 20 07:52:15 chicago nmbd[3362]: [2009/10/20 07:52:15, 0] nmbd/nmbd.c:terminate(58)
Oct 20 07:52:15 chicago nmbd[3362]: Got SIGTERM: going down...
Oct 20 07:52:20 chicago kernel: nfsd: last server has exited
Oct 20 07:52:20 chicago kernel: nfsd: unexporting all filesystems
Oct 20 07:52:20 chicago rpc.mountd: Caught signal 15, un-registering and exiting.
Oct 20 07:52:32 chicago kernel: Kernel logging (proc) stopped.
Oct 20 07:52:32 chicago kernel: Kernel log daemon terminating.
Oct 20 07:52:33 chicago syslog-ng[2494]: syslog-ng version 1.6.8 going down

Does this indicate that somebody did it manually using the root password?

It looks that way, but it could also be coming from a cron job or some other scheduled process.

Another thing to consider is whether it's plugged into a UPS. If you're running a UPS daemon, the UPS could be sending out bogus data and causing the server to shut down that way.
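A quick way to see whether a UPS daemon is running at all, as a rough sketch. The daemon names below (apcupsd, plus upsd and upsmon from Network UPS Tools) are only the common ones and the list is not exhaustive; `list_ups_daemons` is a hypothetical name.

```shell
# list_ups_daemons
# Read process names on stdin (one per line) and print those that
# match a few well-known UPS monitoring daemons.
list_ups_daemons() {
    grep -E -x 'apcupsd|upsd|upsmon'
}

# Example: feed it the names of all running processes.
# ps -eo comm= | list_ups_daemons
```

No output means none of these daemons is running, which would make the UPS theory less likely but does not rule out vendor-specific tools.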

catkin · 10-21-2009, 06:16 AM

Quote:

Originally Posted by TB0ne

It looks that way, but it could also be coming from a cron job or some other scheduled process.

Another thing to consider is whether it's plugged into a UPS. If you're running a UPS daemon, the UPS could be sending out bogus data and causing the server to shut down that way.

My hunch, for two reasons, is that it is automated. First, there is no controlling terminal for the su (the log shows su: (to root) root on none). Second, the shutdown command was entered less than one second after authentication: just about long enough to see a command prompt come up, type the command and press Enter, but with no allowance for any hesitation or fumble factor (few sysadmins are great typists). A scheduled job or the UPS are good contenders for "automated". Does it shut down at the same time each night? Is the time in the logs local time? Sun-up?
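The two clues can be checked mechanically across the whole log. This is a sketch only, assuming the standard syslog timestamp in the first three fields and the entry formats shown in the excerpt; `automated_shutdowns` is a hypothetical name.

```shell
# automated_shutdowns LOGFILE
# Print each shutdown entry that carries the same timestamp as the
# most recent "su: (to root) ... on none" entry, i.e. a su with no
# controlling terminal followed by a shutdown within the same second.
automated_shutdowns() {
    awk '
        / su: \(to root\) .* on none$/ {
            su_time = $1 " " $2 " " $3   # month, day, hh:mm:ss
            su_line = $0
            next
        }
        / shutdown\[[0-9]+\]: / {
            if ($1 " " $2 " " $3 == su_time) {
                print su_line
                print $0
            }
        }
    ' "$1"
}

# Example (path is an assumption):
# automated_shutdowns /var/log/messages
```

If every nightly shutdown shows up in the output, a human at a keyboard becomes an unlikely explanation.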

lema · 10-22-2009, 11:32 AM

No, it shuts down at a different time each night, but always between 5 and 7 AM.
Could it be some hardware failure that forces the server to shut down?

Thanks,
Vladimir

thegeek · 10-22-2009, 12:17 PM

Look for a crontab with the number seven in it

grep 7 /var/spool/cron/*
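Searching for the command rather than the hour may be more direct. Here is a sketch that looks for shutdown-style commands in whatever cron locations you pass in. `find_shutdown_jobs` is a hypothetical name and the word list is an assumption, not a complete set.

```shell
# find_shutdown_jobs PATH...
# Recursively search the given files and directories for whole-word
# matches of common shutdown commands. Prints file:line:text for
# every hit; -s hides errors for paths that do not exist.
find_shutdown_jobs() {
    grep -r -n -s -w -E 'shutdown|halt|poweroff|init 0|telinit 0' "$@"
}

# Example (adjust the locations for your distribution; on SUSE the
# per-user crontabs are under /var/spool/cron/tabs):
# find_shutdown_jobs /var/spool/cron /etc/crontab /etc/cron.d /etc/cron.daily
```

This only catches entries that call the command directly; a shutdown buried inside a script that cron runs will not show up here.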

TB0ne · 10-22-2009, 08:45 PM

Quote:

Originally Posted by lema

No, it shuts down at a different time each night, but always between 5 and 7 AM.
Could it be some hardware failure that forces the server to shut down?

Thanks,
Vladimir

Since it's between the same hours each day, but with such a wide window, I'd bet someone put something at the end of another script. Since that script doesn't finish at the same time every day, you get a variable shutdown time. Check ALL your cron entries, and not just root's either. Any user that has sudo rights could shut down the box, so check user crontabs too.
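To chase the "end of another script" theory, one option is to pull the commands out of a crontab and search the scripts they point to. This is a sketch, assuming a per-user crontab with five time fields followed by the command (the system /etc/crontab has an extra user field and would need adjusting). Both function names are hypothetical.

```shell
# cron_commands CRONTAB_FILE
# Print the command part (field 6 onward) of each entry, skipping
# comments, blank lines and VAR=value settings.
cron_commands() {
    awk '!/^[ \t]*(#|$)/ && $1 !~ /=/ && NF >= 6 {
        $1 = $2 = $3 = $4 = $5 = ""
        sub(/^ +/, "")
        print
    }' "$1"
}

# scripts_with_shutdown CRONTAB_FILE
# For each cron command whose first word is a readable file, print
# the file name if it mentions shutdown, halt or poweroff.
scripts_with_shutdown() {
    cron_commands "$1" | awk '{ print $1 }' | sort -u |
    while read -r script; do
        [ -f "$script" ] && [ -r "$script" ] &&
            grep -l -w -E 'shutdown|halt|poweroff' "$script"
    done
    return 0
}

# Example: check every user crontab (the SUSE location is assumed).
# for tab in /var/spool/cron/tabs/*; do scripts_with_shutdown "$tab"; done
```

It only looks one level deep, so a script that calls another script would still need checking by hand.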

And another thought...do you have automatic updates turned on? If so, the server may be trying to apply a patch that needs a reboot, and is taking a 'default' value from somewhere....

thegeek · 10-23-2009, 01:35 AM

In fact just look at /var/log/cron and see what gets run ...
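A sketch for pulling the job runs out of the log, assuming cron writes its usual "(user) CMD (command)" entries. Depending on the syslog configuration these may land in /var/log/messages instead of /var/log/cron. `cron_runs` is a hypothetical name.

```shell
# cron_runs LOGFILE
# Print the log entries that record cron starting a job, i.e. lines
# tagged cron[PID] or crond[PID] of the form "(user) CMD (command)".
cron_runs() {
    grep -E 'crond?\[[0-9]+\]: \([^)]+\) CMD \(' "$1"
}

# Example (path is an assumption):
# cron_runs /var/log/messages
```

Any job that started in the hours before each shutdown is a candidate worth reading line by line.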

arizonagroovejet · 10-23-2009, 09:58 AM

Quote:

Originally Posted by TB0ne

And another thought...do you have automatic updates turned on?

If "SUSE10 SP1" means, as I suspect, SUSE Linux Enterprise Server 10 with Service Pack 1, then I think Novell stopped issuing updates for it about a year ago, after telling everyone they needed to update to Service Pack 2 (which was obsoleted last week by Service Pack 3).

lema · 10-23-2009, 01:31 PM

Thank you for the replies!
Unfortunately the problem still exists.

I checked /var/spool/cron/tabs and could see only my scheduled job for the ClearCase backup. BTW, ClearCase is why I cannot upgrade to SP2: it's not supported by IBM.

I'll change the root password and see how it goes.

Best Regards,
Vladimir

TB0ne · 10-23-2009, 01:38 PM

Quote:

Originally Posted by lema

Thank you for the replies!
Unfortunately the problem still exists.

I checked /var/spool/cron/tabs and could see only my scheduled job for the ClearCase backup. BTW, ClearCase is why I cannot upgrade to SP2: it's not supported by IBM.

I'll change the root password and see how it goes.

Best Regards,
Vladimir

I'd also change the sudoers file to remove everyone from it but you for a night or two. But given the log evidence, I'd agree and say it's a scheduled job. Changing the password won't help...you've got to identify the job and remove the entry.
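If the job can't be found by searching, one way to identify it is to record who calls shutdown. The sketch below walks the parent chain of a process through /proc (Linux only). `process_ancestry` is a hypothetical helper, and wiring it into a wrapper around the shutdown binary is an assumption about how you would use it, not something tested on this server.

```shell
# process_ancestry PID
# Print "PID: command line" for PID and then for each of its
# ancestors, using /proc/PID/status and /proc/PID/cmdline.
process_ancestry() {
    local pid=$1 cmd
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
        echo "$pid: $cmd"
        # The parent PID is on the "PPid:" line; it is 0 at the top.
        pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status")
    done
}

# Example: inside a temporary wrapper script that stands in for the
# shutdown command, log the chain before running the real binary:
# process_ancestry $$ | logger -t shutdown-trace
```

The chain should end at cron, a UPS daemon, or a login shell, which tells you where to look next.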