LinuxQuestions.org
Old 04-28-2012, 12:27 AM   #1
ctrlbrk
LQ Newbie
 
Registered: Jun 2011
Posts: 11

Rep: Reputation: Disabled
Troubleshooting random reboots, Debian 6 kernel 3.2, Xeon E5-1650


Hi guys,

I've built a new server (specs below) to ship off to my datacenter for co-lo, to replace a rock-solid but older piece of hardware.

The new server is randomly rebooting. I could use some advice on narrowing it down.

I've built hundreds of servers in my former line of work, so this is not my first dance -- but these days I work for myself (trading) and don't have piles of hardware lying around like I used to at the office job. So I can't just swap in a new set of hardware and see if the problem goes away.

Specs:
SuperMicro SuperServer 5027R-WRF
SuperMicro X9SRW-F
Intel Xeon E5-1650
64GB Kingston DDR3 1600 ECC REG CL11 1.5V (2 x KVR1600D3D4R11SK4/32G)
4xWD 500GB RE4 WD5003ABYX

Debian 6
Kernel 3.2.0 (installed from backports)

I have hammered on this system hard with everything passing with flying colors. I was going to ship it off to the datacenter within a few hours, and then out of the blue it rebooted. With no load, nothing going on, zero --- I heard it reboot from the other room.

This is the second time the system has randomly rebooted. The first time was the very first time I ran sysbench on it as a burn-in test. I chalked it up to a sysbench problem because it rebooted within seconds of me pressing enter on the test. I proceeded to hammer on sysbench for days with huge load and it was completely stable.

Then the other day it was just idle with nothing going on, and bam - reboot.

I checked all the logs and found nothing. There is no kernel panic. There is absolutely nothing to go on. This would seem to point to hardware, yet I have pounded on this system for the last couple of weeks and have never had a single problem.

I can't ship this to the datacenter with a random reboot problem.

It has completed more than 5 passes of memtest with no errors. I will let it go to ten, which alone will take another couple of days.

Things I have thought of:
1) Reduce to 1 stick of memory and see if the problem goes away. The trouble is that with the problem showing up only randomly, and only every couple of weeks, narrowing it down to one of the 8 sticks could take months. Plus they all pass memtest.

2) Back up the current system (clonezilla), reinstall with a different kernel or distro, and see if the problem shows up again. Again, this is really not ideal because I want to run Debian 6. I could revert to the 2.6.32 kernel, but there were some nice speed improvements with kernel 3.2.0. And how long do I test on 2.6.32 before I feel assured the problem was with 3.2.0? Weeks? Months? All the while, this server is sitting in my house instead of at the co-lo, costing me money.
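
As a rough illustration of why option 1 is so slow, here is a back-of-the-envelope sketch in Python. It assumes exactly one faulty stick, that the sticks are tested by halving the set each round rather than one at a time, and a made-up rule of waiting three failure-free intervals before declaring a set clean. The function names and the three-interval rule are assumptions for illustration only, not a recommended procedure.

```python
import math

def bisection_rounds(sticks: int) -> int:
    """Number of halving rounds needed to isolate a single bad stick."""
    return math.ceil(math.log2(sticks))

def worst_case_days(sticks: int, days_between_failures: float,
                    quiet_intervals: int = 3) -> float:
    """Upper bound on total test time, assuming every round has to wait
    out the full quiet period (quiet_intervals is a hypothetical rule)."""
    return bisection_rounds(sticks) * days_between_failures * quiet_intervals

if __name__ == "__main__":
    # 8 sticks, roughly one reboot every 14 days (both taken from the post above)
    print(bisection_rounds(8), "rounds,", worst_case_days(8, 14), "days worst case")
```

With 8 sticks and a reboot roughly every two weeks, that comes to 3 rounds and about 126 days in the worst case, which matches the "could take months" estimate.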

Other suggestions? I checked /proc/sys/kernel/panic and it was 0, which means the kernel should not automatically reboot on a panic (plus there is no log anywhere indicating a panic).
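
For anyone wanting to repeat that check, a minimal Python sketch follows. It reads /proc/sys/kernel/panic and prints what the value implies. The helper names (`read_sysctl`, `describe_panic_timeout`) are made up for illustration, and the handling of negative values follows the current kernel sysctl documentation, which may not apply to a kernel as old as 3.2.

```python
from pathlib import Path

def describe_panic_timeout(value: int) -> str:
    """Interpret kernel.panic (seconds to wait before rebooting after a panic)."""
    if value == 0:
        return "no automatic reboot after a panic"
    if value > 0:
        return f"automatic reboot {value} s after a panic"
    return "immediate reboot after a panic"

def read_sysctl(name: str, root: str = "/proc/sys/kernel"):
    """Return the integer value of a sysctl file, or None if it cannot be read."""
    try:
        return int((Path(root) / name).read_text().strip())
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    timeout = read_sysctl("panic")
    if timeout is None:
        print("kernel.panic: not readable on this system")
    else:
        print("kernel.panic =", timeout, "->", describe_panic_timeout(timeout))
```

A value of 0 only rules out a reboot triggered by the panic timeout. It says nothing about resets that never pass through the kernel, such as a hardware watchdog or a power glitch.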
 
Old 04-30-2012, 10:30 PM   #2
ctrlbrk
LQ Newbie
 
Registered: Jun 2011
Posts: 11

Original Poster
Rep: Reputation: Disabled
Bump ?
 
Old 05-01-2012, 08:23 AM   #3
odiseo77
Member
 
Registered: Dec 2004
Location: London, UK
Distribution: Debian Sid, OpenSUSE 13.1
Posts: 992

Rep: Reputation: 294
Sorry I can't be of help here (I don't know much about hardware), but since this is most likely a hardware issue (not directly related to Debian), maybe you can report your own thread and ask a moderator to move it to the Linux - Hardware forum section? It will probably get better attention there.

Regards.
 
Old 05-01-2012, 12:40 PM   #4
edbarx
Member
 
Registered: Sep 2010
Distribution: Used Debian since Sarge. (~2005)
Posts: 340

Rep: Reputation: 18
Run a thorough RAM test. Random reboots are often the result of a failing memory module.
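
Since the modules are ECC, one extra data point besides memtest may be the kernel's EDAC error counters. The sketch below is only a sketch: it assumes an EDAC driver for the memory controller is loaded (which may not be the case on a 3.2 kernel with this CPU) and that the counters are exposed under /sys/devices/system/edac/mc. The function name `read_edac_counts` is made up for illustration.

```python
from pathlib import Path

def read_edac_counts(root: str = "/sys/devices/system/edac/mc") -> dict:
    """Map each memory controller (mc0, mc1, ...) to its corrected (ce_count)
    and uncorrected (ue_count) error counters. Returns {} if EDAC is absent."""
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC driver loaded, or not a Linux system
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts

if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found")
    for mc, entry in result.items():
        print(mc, entry)
```

Non-zero corrected counts that keep growing would point at a specific memory problem. Zero counts do not prove the RAM is good, especially if no EDAC driver is active.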
 
Old 05-01-2012, 11:29 PM   #5
johnbendie
LQ Newbie
 
Registered: Aug 2006
Posts: 1

Rep: Reputation: 0
Suspecting power related bug

ctrlbrk: I'm having a similar problem with a similar 3.2 kernel. I have checked all the log files and found nothing useful. One thing I seem to have noticed is that it may be power related, because when I unplug my laptop I don't get reboots, though I have only just started trying that out. So you can check whether the same applies in your case. In any case, please keep me informed of any progress you make on this issue, as it's annoying not knowing where to start troubleshooting. I may have to resort to building a kernel from the pristine stable source from the Linux kernel archives, or downgrade to a lower version from the Debian sources.
 
  


All times are GMT -5.
