LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 12-21-2008, 10:29 PM   #1
n9066r
LQ Newbie
 
Registered: Jul 2005
Posts: 11

Rep: Reputation: 0
Random Reboots


I have a pair of ProLiant 5500s running CentOS 5.2 that both reboot randomly. APIC as well as ASR are off. I'm thinking it has to be a software problem as opposed to a hardware problem, since both are experiencing the same problem. Also, one is a mail/web server running Apache and Zimbra and the other is a Samba server.

Dec 21 06:06:48 mail -- MARK --
Dec 21 06:26:48 mail -- MARK --
Dec 21 06:46:48 mail -- MARK --
Dec 21 07:06:48 mail -- MARK --
Dec 21 07:26:48 mail -- MARK --
Dec 21 07:46:48 mail -- MARK --
Dec4.1: restart.
Dec 21 15:05:56 mail audispd: af_unix plugin initialized
Dec 21 15:05:56 mail audispd: audispd initialized with q_depth=64 and 1 active plugins
Dec 21 15:05:56 mail kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 21 15:05:56 mail kernel: Linux version 2.6.18-92.1.18.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Wed Nov 12 09:30:27 EST 2008
Dec 21 15:05:56 mail kernel: BIOS-provided physical RAM map:
Dec 21 15:05:56 mail kernel: BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
Dec 21 15:05:56 mail kernel: BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
Dec 21 15:05:56 mail kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Dec 21 15:05:56 mail kernel: BIOS-e820: 0000000000100000 - 000000009fffc000 (usable)
Dec 21 15:05:56 mail kernel: BIOS-e820: 000000009fffc000 - 00000000a0000000 (ACPI data)
Dec 21 15:05:56 mail kernel: BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
Dec 21 15:05:56 mail kernel: BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
Dec 21 15:05:56 mail kernel: BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
Dec 21 15:05:56 mail kernel: 1663MB HIGHMEM available.
Dec 21 15:05:56 mail kernel: 896MB LOWMEM available.
Dec 21 15:05:56 mail kernel: found SMP MP-table at 000f4fd0
 
Old 12-22-2008, 05:11 PM   #2
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora
Posts: 3,935
Blog Entries: 5

Rep: Reputation: Disabled
Weird. You have 'mark' timestamps showing up regularly, then they stop, then hours later the server restarts. I wonder if something is killing syslogd??
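
To pin down when the log actually went quiet, a short script can scan the syslog file for gaps longer than the MARK interval. This is only a rough sketch: it assumes classic syslog timestamps (no year, English month names), treats the year as a parameter, and uses a 25-minute threshold because the marks above arrive every 20 minutes. The function names and the sample lines (copied from the log above) are illustrative only.

```python
from datetime import datetime

# Sample lines copied from the log posted above.
SAMPLE = """\
Dec 21 07:06:48 mail -- MARK --
Dec 21 07:26:48 mail -- MARK --
Dec 21 07:46:48 mail -- MARK --
Dec 21 15:05:56 mail kernel: klogd 1.4.1, log source = /proc/kmsg started.
"""


def parse_time(line, year=2008):
    # Classic syslog timestamps carry no year, so one has to be assumed.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def silent_gaps(lines, max_gap_minutes=25):
    """Yield (last_seen, next_seen) pairs where the log was silent too long."""
    previous = None
    for line in lines:
        try:
            stamp = parse_time(line)
        except ValueError:
            continue  # skip blank or unparseable lines
        if previous is not None:
            if (stamp - previous).total_seconds() > max_gap_minutes * 60:
                yield previous, stamp
        previous = stamp


for last_seen, next_seen in silent_gaps(SAMPLE.splitlines()):
    print(f"log silent from {last_seen} until {next_seen}")
```

The start of each gap is the last moment syslogd was known to be alive, which is a better estimate of when the machine went down than the boot time that follows it.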

Given that your logs aren't telling you much, do you have some spare RAM that you could swap into one of the two servers? (I think this is worth exploring on the chance that your RAM + motherboard combination is just not playing nice.)
 
Old 12-22-2008, 05:29 PM   #3
rweaver
Senior Member
 
Registered: Dec 2008
Location: Louisville, OH
Distribution: Debian, CentOS, Slackware, RHEL, Gentoo
Posts: 1,833

Rep: Reputation: 167Reputation: 167
Quote:
Originally Posted by n9066r View Post
I have a pair of ProLiant 5500s running CentOS 5.2 that both reboot randomly. APIC as well as ASR are off. I'm thinking it has to be a software problem as opposed to a hardware problem, since both are experiencing the same problem. Also, one is a mail/web server running Apache and Zimbra and the other ...<SNIP>... 1663MB HIGHMEM available.
Dec 21 15:05:56 mail kernel: 896MB LOWMEM available.
Dec 21 15:05:56 mail kernel: found SMP MP-table at 000f4fd0
Run rootkit hunter and chkrootkit on the system. That looks strange, and having two servers doing it may be a coincidence, but it's highly doubtful.
 
Old 12-23-2008, 09:43 AM   #4
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269Reputation: 269Reputation: 269
What type of cron jobs do you have running? Do you have any backups running? Anything in particular running you know of between the time stamps of MARK above and the reboot time stamps? I would imagine some process is killing these machines, preventing the logging or the like.
 
Old 12-23-2008, 10:39 AM   #5
n9066r
LQ Newbie
 
Registered: Jul 2005
Posts: 11

Original Poster
Rep: Reputation: 0
Here is what I have running:

root Yes /etc/cron.daily/makewhatis.cron
/etc/cron.daily/rpm
/etc/cron.daily/mlocate.cron
/etc/cron.daily/0logwatch
/etc/cron.daily/prelink
/etc/cron.daily/0anacron
/etc/cron.daily/tmpwatch
root Yes /etc/cron.weekly/makewhatis.cron
/etc/cron.weekly/logrotate
/etc/cron.weekly/0anacron
root Yes /etc/cron.monthly/0anacron
root Yes /etc/webmin/cron/tempdelete.pl
zimbra Yes find /opt/zimbra/log/ -type f -name \*.log\* -mtime +8 -exec rm {} \; > /dev/nul ...
zimbra Yes find /opt/zimbra/log/ -type f -name \*.out.???????????? -mtime +8 -exec rm {} \; ...
zimbra Yes /opt/zimbra/libexec/zmstatuslog
zimbra Yes /opt/zimbra/libexec/zmdisklog
zimbra Yes find /opt/zimbra/mailboxd/logs/ -type f -name \*log\* -mtime +8 -exec rm {} \; > ...
zimbra Yes /opt/zimbra/libexec/zmmaintaintables >> /dev/null 2>&1
zimbra Yes /opt/zimbra/libexec/zmdbintegrityreport -m
zimbra Yes /opt/zimbra/libexec/zmcheckduplicatemysqld -e > /dev/null 2>&1
zimbra Yes /opt/zimbra/libexec/zmlogprocess > /tmp/logprocess.out 2>&1
zimbra Yes /opt/zimbra/libexec/zmgengraphs >> /tmp/gengraphs.out 2>&1
zimbra Yes /opt/zimbra/libexec/zmdailyreport -m
zimbra Yes /opt/zimbra/libexec/zmqueuelog
zimbra Yes /opt/zimbra/bin/zmtrainsa >> /opt/zimbra/log/spamtrain.log 2>&1
zimbra Yes /opt/zimbra/bin/zmtrainsa --cleanup >> /opt/zimbra/log/spamtrain.log 2>&1
zimbra No find /opt/zimbra/dspam/var/dspam/data/z/i/zimbra/zimbra.sig/ -type f -name \*sig ...
zimbra No /opt/zimbra/dspam/bin/dspam_logrotate -a 60 /opt/zimbra/dspam/var/dspam/system.l ...
zimbra No /opt/zimbra/dspam/bin/dspam_logrotate -a 60 /opt/zimbra/dspam/var/dspam/data/z/i ...
zimbra Yes /opt/zimbra/libexec/sa-learn -p /opt/zimbra/conf/salocal.cf --dbpath /opt/zimbra ...
zimbra Yes find /opt/zimbra/data/amavisd/tmp -maxdepth 1 -type d -name 'amavis-*' -mtime +1 ...
zimbra Yes find /opt/zimbra/data/amavisd/quarantine -type f -mtime +7 -exec rm -f {} \; > / ...
 
Old 12-23-2008, 10:45 AM   #6
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269Reputation: 269Reputation: 269
What time are those zimbra crons running? Anything close to the time the machine is getting rebooted? One of those could be the culprit.
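
One way to answer that question systematically is to compare each crash time against the daily start times of the jobs. The sketch below assumes the jobs run once a day at a fixed hour and minute; the schedule entries are hypothetical placeholders (the thread never lists the real times), and the event time is the last MARK line from the first post.

```python
from datetime import datetime, timedelta


def jobs_near(event, schedule, window_minutes=15):
    """Return the daily jobs that started within window_minutes before event."""
    hits = []
    for name, (hour, minute) in schedule.items():
        start = event.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if start > event:
            # The most recent run was on the previous day.
            start -= timedelta(days=1)
        if event - start <= timedelta(minutes=window_minutes):
            hits.append(name)
    return hits


# Hypothetical schedule: job name -> (hour, minute). Replace with real crontab times.
schedule = {
    "cron.daily": (4, 2),
    "example-zimbra-job": (7, 40),
}

# Last log entry before the silence in the first post.
last_mark = datetime(2008, 12, 21, 7, 46, 48)
print(jobs_near(last_mark, schedule))
```

Running this against several crashes and looking for a job that shows up every time would be more convincing than a single match.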
 
Old 12-23-2008, 05:14 PM   #7
momolin
LQ Newbie
 
Registered: Dec 2008
Location: Taiwan
Distribution: CentOS
Posts: 1

Rep: Reputation: 0
Both servers rebooted at the same time, so I think you should check the power.
 
Old 12-23-2008, 05:51 PM   #8
n9066r
LQ Newbie
 
Registered: Jul 2005
Posts: 11

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by rweaver View Post
Run rootkit hunter and chkrootkit on the system. That looks strange, and having two servers doing it may be a coincidence, but it's highly doubtful.
I'll run them and see what I come up with.

Thanks
 
Old 12-23-2008, 06:02 PM   #9
n9066r
LQ Newbie
 
Registered: Jul 2005
Posts: 11

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by trickykid View Post
What time are those zimbra crons running? Anything close to the time the machine is getting rebooted? One of those could be the culprit.
None of them are close to the reboot times, and the reboots are completely random. Load also seems to be no factor, as a server is just as likely to reboot during off hours as during business hours.
 
Old 12-24-2008, 08:50 AM   #10
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269Reputation: 269Reputation: 269
Quote:
Originally Posted by momolin View Post
Both servers rebooted at the same time, so I think you should check the power.
I'd have to agree here as well. If it's not a cron job, it happens outside business hours, and both servers rebooted at almost exactly the same time, I'd check power as well.
 
Old 12-24-2008, 09:35 AM   #11
n9066r
LQ Newbie
 
Registered: Jul 2005
Posts: 11

Original Poster
Rep: Reputation: 0
I checked for rootkits last night and both servers were clean. The servers don't reboot at the same time but there could still be an issue with the UPS.

I tried:

kernel /vmlinuz-2.6.18-92.1.18.el5 ro root=/dev/VolGroup00/LogVol00 debug apm=off acpi=off ide=nodma nousb nopsmcia noapic nofb

and so far the server has been up 23 hours, which it hasn't managed before. Hopefully this will be the answer.
 
Old 12-24-2008, 09:55 AM   #12
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269Reputation: 269Reputation: 269
Quote:
Originally Posted by n9066r View Post
I checked for rootkits last night and both servers were clean. The servers don't reboot at the same time but there could still be an issue with the UPS.
Never rule it out of the equation.

Do these servers have dual power supplies? If so, do you have more than one UPS? If there's more than one, you should split the power between them and/or try bypassing the UPS to see if the problem recurs. Also, most UPSes have a management port, so you could probably set up monitoring on it to see whether the UPS is the actual culprit.
 
Old 12-24-2008, 10:05 AM   #13
n9066r
LQ Newbie
 
Registered: Jul 2005
Posts: 11

Original Poster
Rep: Reputation: 0
They do have dual power supplies. I have another UPS at home that I'll bring and try next week. The UPS does have a management port so I'll set it up and see what it shows.
 
Old 12-24-2008, 10:48 AM   #14
alexhwest
Member
 
Registered: Dec 2008
Location: Cleveland, OH
Distribution: Ubuntu
Posts: 30

Rep: Reputation: 15
Probably not relevant to the problem, but you used nopsmcia where it should be nopcmcia.
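
A typo like this is easy to miss by eye, so a quick fuzzy comparison against the list of options you meant to pass can help. The sketch below uses Python's standard `difflib`; the `EXPECTED` list is just the options from the post above with the spelling corrected, not an authoritative list of valid boot parameters, and `suspicious_options` is a made-up helper name.

```python
import difflib

# Options the poster intended to pass (assumption: spelling as corrected above).
EXPECTED = ["debug", "apm=off", "acpi=off", "ide=nodma",
            "nousb", "nopcmcia", "noapic", "nofb"]


def suspicious_options(cmdline, expected=EXPECTED, cutoff=0.8):
    """Return (token, likely_intended) pairs for near-miss spellings."""
    findings = []
    for token in cmdline.split():
        if token in expected:
            continue  # exact match, nothing to report
        close = difflib.get_close_matches(token, expected, n=1, cutoff=cutoff)
        if close:
            findings.append((token, close[0]))
    return findings


line = ("ro root=/dev/VolGroup00/LogVol00 debug apm=off acpi=off "
        "ide=nodma nousb nopsmcia noapic nofb")
print(suspicious_options(line))
```

On a running system the same check could be fed the contents of `/proc/cmdline`, which holds the command line the kernel was actually booted with.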
 
Old 12-24-2008, 10:57 AM   #15
n9066r
LQ Newbie
 
Registered: Jul 2005
Posts: 11

Original Poster
Rep: Reputation: 0
Thanks for catching the typo. I'll change it in grub.conf.
 
  


All times are GMT -5.
