LinuxQuestions.org
Old 09-04-2012, 11:41 AM   #1
felbvts
Member
 
Registered: Mar 2009
Posts: 37

Rep: Reputation: 3
RHEL5.8 server rebooting on its own - Why?


Hello,

I am running RHEL 5.8 (2.6.18-308.1.1.0.1.el5) and the server has rebooted itself twice in the past 3 days. No errors are seen before the reboots and no one is logged in during the reboots. (I have reviewed most of the log files in /var/log.)

I'm thinking it is possibly a missing patch, but unless I can figure out which one, I won't be granted an outage. I have searched the RH knowledge base but I haven't found anything.

Any ideas where I should check to find the RCA for the reboots?
Jennifer
 
Old 09-04-2012, 11:56 AM   #2
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,006
Blog Entries: 5

Rep: Reputation: 783
Is it possible the server hardware or power to it blinked? Do you have any tools/logs for the hardware that might show it? (e.g. for Dell systems one can run Dell OpenManage and it keeps hardware and alert logs that might give a clue.)
 
Old 09-04-2012, 01:13 PM   #3
felbvts
Member
 
Registered: Mar 2009
Posts: 37

Original Poster
Rep: Reputation: 3
Thanks for the reply. No power outage was reported. Plus it rebooted on 9/2 and 9/4 at different times of day. The logs don't look like it was a hard reset either.
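One way to double-check the hard-reset question is to look at the order of the boot and shutdown records that `last -x` prints. Below is a minimal sketch, not a definitive tool: it assumes the records start with the word `reboot` or `shutdown` and are listed newest first, and it treats a boot with no shutdown record before it as unclean. The function name `classify_boots` is made up for this example, and it assumes a Python 3 interpreter (the stock Python on RHEL 5 is older).

```python
def classify_boots(last_x_lines):
    """Label each 'reboot' record from `last -x` output.

    Assumption: records are newest first, so the record that follows a
    'reboot' line in the list is the event that happened just before it.
    """
    def kind(line):
        fields = line.split()
        return fields[0] if fields else ""

    # Keep only boot and shutdown records; logins and runlevel changes are ignored.
    events = [ln for ln in last_x_lines if kind(ln) in ("reboot", "shutdown")]

    results = []
    for i, line in enumerate(events):
        if kind(line) != "reboot":
            continue
        older = events[i + 1] if i + 1 < len(events) else None
        if older is None:
            verdict = "unknown"   # no earlier record in this wtmp file
        elif kind(older) == "shutdown":
            verdict = "clean"     # a shutdown was logged before this boot
        else:
            verdict = "unclean"   # two boots in a row, no shutdown logged
        results.append((line, verdict))
    return results
```

To use it, save the output of `last -x` to a file and pass the lines in, for example `classify_boots(open("last.txt").read().splitlines())`. An "unclean" verdict only means no shutdown record was written; it does not say whether the cause was a panic, a watchdog, or power.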

I went through the SAR logs and I don't see any CPU or memory spikes.
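For anyone repeating this check: `sar` marks each boot with a `LINUX RESTART` line, so the last sample before that marker is the closest look at the system before it went down. Here is a rough sketch that pulls those samples out of saved `sar` text output. The helper names are made up for this example, and the rule for spotting a data line (last column parses as a number, line does not start with `Average`) is an assumption about the output layout, not something taken from the sysstat docs.

```python
def _is_sample(text):
    """Assumed rule: a data line ends in a numeric column and is not an Average row."""
    fields = text.split()
    if not fields or fields[0].startswith("Average"):
        return False
    try:
        float(fields[-1])
    except ValueError:
        return False
    return True


def samples_before_restart(sar_lines):
    """Return the last data line seen before each 'LINUX RESTART' marker.

    An entry is None when no sample was seen before that restart.
    """
    found = []
    previous = None
    for line in sar_lines:
        text = line.strip()
        if "LINUX RESTART" in text:
            found.append(previous)
            previous = None       # start fresh for the next boot
        elif _is_sample(text):
            previous = text
    return found
```

If the last sample before a restart looks as quiet as the rest of the day, that backs up the "no spikes" reading, with the caveat that the default 10-minute sampling interval can easily miss a short burst.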

Right now I am thinking it's a bug and that I need a patch.
I am putting in a support call with Red Hat - Will let you know what I find out.

Any additional comments are welcome!
 
Old 09-04-2012, 03:03 PM   #4
btmiller
Senior Member
 
Registered: May 2004
Location: In the DC 'burbs
Distribution: Arch, Scientific Linux, Debian, Ubuntu
Posts: 4,108

Rep: Reputation: 311
I'd suggest running mcelog to see if any machine check events were logged. I'd also suggest running memtest86 on the system to make sure that the RAM is good.

It could also be a heating problem. Do you have a way to monitor the CPU and motherboard temperatures (either lm-sensors or using something like IPMI)?
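If neither lm-sensors nor IPMI is set up yet, a quick first look can come from the kernel's hwmon sysfs files, where `temp*_input` values are reported in millidegrees Celsius. The sketch below is only an illustration under a few assumptions: that the kernel has a hwmon driver loaded for your board, that the files live under `/sys/class/hwmon/hwmon*/` or its `device/` subdirectory, and that a Python 3 interpreter is available (the stock Python on RHEL 5 is older). The function name `read_hwmon_temps` is made up for this example.

```python
import glob
import os


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {sensor_path: degrees_celsius} for every readable temp input."""
    temps = {}
    # Assumption: depending on the kernel, the files sit directly under
    # hwmonN or under hwmonN/device, so both locations are tried.
    patterns = [
        os.path.join(base, "hwmon*", "temp*_input"),
        os.path.join(base, "hwmon*", "device", "temp*_input"),
    ]
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as fh:
                    millideg = int(fh.read().strip())
            except (OSError, ValueError):
                continue  # sensor unreadable or not a number; skip it
            temps[path] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor}: {celsius:.1f} C")
```

Running it from cron every minute and appending to a file would give a temperature history to line up against the reboot times. An empty result just means no hwmon driver exposed a sensor, not that the box is running cool.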
 
Old 09-05-2012, 03:04 PM   #5
felbvts
Member
 
Registered: Mar 2009
Posts: 37

Original Poster
Rep: Reputation: 3
These were great ideas - thank you!

I've ruled out the temperature issue as none of the other servers in the rack are having any issues.

mcelog is not showing anything. /proc/cpuinfo and /proc/meminfo are not showing anything significant.
 
Old 09-05-2012, 03:27 PM   #6
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,006
Blog Entries: 5

Rep: Reputation: 783
Temperature can affect one server in a rack worse than others. A few years back we had a rack dead center of our data center that had a DELL PERC (RAID) controller in it. Even though the system itself was not showing any temperature issues based on its internal sensors, I was able to demonstrate that the PERC itself (which has no temperature sensor of its own) was being affected by heat and causing the system to lock up periodically. (Of course one other server in the rack also experienced the issue, but both were the same class and both had multiple disks.) The other servers in the rack, however, never had any apparent issues.

I demonstrated the issue simply by opening the rack door. This gave very little extra air but was just enough to avoid the issue. Whenever I closed it, after a short while I'd see the issue come back. What was maddening was that DELL denied the PERC was susceptible to heat until I'd proven it by doing these tests. Back then they had me run full diags on the system, which really annoyed me because the diag for the PERC only tests whether the battery is there and charged; it did no actual component test of the board itself.
 
  



Tags
reboot, redhat, rhel5


All times are GMT -5.