LinuxQuestions.org
Old 08-12-2016, 05:14 PM   #1
zombieno7
Member
 
Registered: Aug 2012
Posts: 49

Rep: Reputation: Disabled
Random Restarts With No Error Messages


Over the past couple of weeks, my Gentoo system has been restarting randomly. There are no error messages displayed, and the restarts can happen days apart. I haven't noticed any cause or similar circumstances surrounding the restarts. I checked reboot, shutdown, and error logs, and the reboots are not logged. After the most recent reboot, I checked dmesg, and I saw that the last thing logged was an error saying that a kernel kworker stopped after reaching the stack limit. I'm not entirely sure if that's related. So, I have a couple of ideas, and I just want some help narrowing it down.

1. The Kernel - I'm running a custom-built 4.7 kernel, and there are those kworker messages.

2. The RAM - it's a common cause, but I ran memtest86+ for several hours with no errors.

3. Somewhat old SSD - My /home folder is on an SSD that's over a year old and has some bad sectors.

4. New HDD - I just cloned a dying 2TB HDD onto a new one, and I think there might have been some corruption because of the problems with the old drive.

That's all I can really think of, so if anyone has any ideas, it would be greatly appreciated. Thank you.
 
Old 08-12-2016, 05:20 PM   #2
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~arch
Posts: 7,231

Rep: Reputation: Disabled
RAM. Memtest can run for days and the RAM can still be bad. The only conclusive result from memtest is when it tells you the RAM is bad; then it really is bad.
Overheating (dust) and an out-of-spec PSU are other common reasons.
 
Old 08-12-2016, 06:00 PM   #3
zombieno7
Member
 
Registered: Aug 2012
Posts: 49

Original Poster
Rep: Reputation: Disabled
It's definitely not overheating. It's water cooled. The PSU is new but refurbished. The wattage is good, though. I just don't understand why the RAM would randomly become a problem, especially since it's less than a year old. Could it be a bad kernel? I would try to test it, but I can't figure out what's triggering the resets.
 
Old 08-13-2016, 06:48 AM   #4
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 20 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2922
Have you fsck'd all your filesystems (including checking your disks for badblocks), and checked the disks' SMART status?
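
For anyone following along, here is a rough sketch of the SMART health check in script form. It assumes smartmontools is installed, that the script runs as root, and that the drives are ATA devices at /dev/sda and /dev/sdb (those device names are guesses, not something from this thread). It only looks for the overall verdict line that `smartctl -H` prints for ATA drives.

```python
import subprocess


def smart_health_command(device):
    """Build the smartctl command that prints the overall health verdict."""
    return ["smartctl", "-H", device]


def parse_smart_health(output):
    """Return True for PASSED, False for any other verdict, None if no verdict line."""
    for line in output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
    return None


if __name__ == "__main__":
    # Device names are assumptions; adjust them to the actual system.
    for dev in ["/dev/sda", "/dev/sdb"]:
        result = subprocess.run(
            smart_health_command(dev), capture_output=True, text=True
        )
        print(dev, parse_smart_health(result.stdout))
```

A PASSED verdict is fairly weak evidence on its own, so the full attribute table from `smartctl -a` and the self-test log from `smartctl -l selftest` are still worth reading by eye.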
 
Old 08-13-2016, 07:46 AM   #5
Shadow_7
Senior Member
 
Registered: Feb 2003
Distribution: debian
Posts: 4,137
Blog Entries: 1

Rep: Reputation: 873
RAM would have been my first guess, but you covered that.

Power would be my 2nd guess. If it's not getting enough power it could cycle.

Heat, but that's not much of a mystery if you're in the same room as the device. But there could be fans that are not working or clogged heat sinks that cancel out any would-be air movement.

Beyond that software. Some sort of resource over usage. Running out of ram and no swap to ease the burden. If you have swap, you might try moving it to another device. It tends to wear out drives so if you're already having issues, putting something fresh, even if it's an SDHC card and a reader could be the answer. You might also have swap, but have swappiness set to 0 so it behaves like it doesn't have swap.
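
A quick way to see whether swap exists and what swappiness is set to, as a minimal sketch. It assumes a Linux system exposing /proc/meminfo and /proc/sys/vm/swappiness; the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> number (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def swap_summary(meminfo_text, swappiness_text):
    """Summarise swap size, swap in use, and the current swappiness value."""
    info = parse_meminfo(meminfo_text)
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    return {
        "has_swap": total > 0,
        "swap_total_kb": total,
        "swap_used_kb": total - free,
        "swappiness": int(swappiness_text.strip()),
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    with open("/proc/sys/vm/swappiness") as f:
        swappiness = f.read()
    print(swap_summary(meminfo, swappiness))
```

If `has_swap` is False, or swappiness is 0, the machine has little or no room to page out when RAM fills up.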

But generally if there's no messages / logs and such it's a hardware issue. Not to say that it isn't triggered by software. Like running a 64 bit OS on a 32 bit machine. Or a kernel compiled with extended CPU features and run on a CPU without said features. But "random" almost always means hardware. Which may not be "your" hardware, if the power blinks or the A/C stops working and such.
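
To check the "kernel built for CPU features the CPU doesn't have" case, something like the sketch below could compare the flags in /proc/cpuinfo against the features the build expects. It assumes an x86 machine (where the line is named "flags"), and the REQUIRED list is purely hypothetical; the real list depends on the compiler flags used for the kernel and the rest of the system.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required):
    """Return the required flags that the CPU does not advertise, sorted by name."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    # Hypothetical feature list; replace it with what the build actually targets.
    REQUIRED = ["sse4_2", "avx2"]
    with open("/proc/cpuinfo") as f:
        print("missing:", missing_flags(f.read(), REQUIRED))
```

An empty list means the CPU advertises every feature in the list, which would make a feature mismatch less likely as the cause.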
 
Old 08-13-2016, 08:04 AM   #6
273
LQ Addict
 
Registered: Dec 2011
Location: UK
Distribution: Debian Sid AMD64, Raspbian Wheezy, various VMs
Posts: 7,585

Rep: Reputation: 2351
Perhaps run another kernel or, even better, a live distribution for a while and see whether it happens? I've had USB fans cause a reboot before, and I am of a mind to think, as above, that PSU or even mains problems may be the cause (I've seen lots of 2-second power cuts too).
 
Old 08-13-2016, 12:11 PM   #7
zombieno7
Member
 
Registered: Aug 2012
Posts: 49

Original Poster
Rep: Reputation: Disabled
Okay, so for an update: I ran SMART short tests on both drives, and they passed. I also ran memtest86+ for 6+ hours through 4 cycles with no errors.

It's not a heat issue with the CPU because it is water cooled, and I have a monitor on the desktop through lm_sensors. It stays in an acceptable range at all times. Could it be the motherboard overheating independently? It just doesn't seem like overheating because it doesn't necessarily happen during peak loads. There is no dust problem either. I keep the machine clean.

The PSU is a Corsair AX 860i, so I seriously doubt that it isn't getting enough power.

Should I run longer SMART tests on the drives? Are there other tests that I can run? The restarts happen so infrequently that it's very hard to test. The computer can run fine for days without it happening.
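
Since the restarts leave nothing in the logs, one option is to record sensor readings to disk every few seconds, so the last line written before a reset shows what the temperatures were. This is only a sketch: it assumes the kernel exposes sensors under /sys/class/hwmon (values there are in millidegrees Celsius), and the log file name and 10-second interval are arbitrary choices.

```python
import glob
import os
import time


def millidegrees_to_celsius(raw):
    """Convert a hwmon temp*_input reading (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_temperatures(pattern="/sys/class/hwmon/hwmon*/temp*_input"):
    """Return {sensor path: degrees C} for every readable temperature input."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                readings[path] = millidegrees_to_celsius(f.read())
        except (OSError, ValueError):
            continue  # some inputs exist but cannot be read; skip them
    return readings


def append_durably(log_path, line):
    """Append one line and force it to disk so it survives a sudden reset."""
    with open(log_path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        temps = " ".join(f"{t:.1f}" for t in read_temperatures().values())
        append_durably("temps.log", f"{stamp} {temps}")
        time.sleep(10)
```

The timestamp of the last line also pins down when each reset happened, which helps when comparing against what was running at the time.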
 
Old 08-13-2016, 12:25 PM   #8
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~arch
Posts: 7,231

Rep: Reputation: Disabled
Yes, motherboard components can overheat, northbridge for instance. Memtest ... you say your computer may run for two days before it reboots ... makes you think memtest needs to run for two days, too? Anyhow, if it reboots then you won't get any errors from memtest, obviously. PSU can be out of specs, the voltages may fluctuate out of allowed range or be out of range permanently. Use a real voltmeter to measure. Re-seating all components (memory modules, PCI cards) won't hurt, either.
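
For the voltmeter readings, a small sketch of the range check. It assumes the usual ATX tolerance of plus or minus 5% on the +3.3 V, +5 V and +12 V rails; check the PSU's own documentation before relying on that figure. The measured values in the example are invented.

```python
def within_tolerance(measured, nominal, tolerance=0.05):
    """True if the measured voltage is within the given fraction of nominal."""
    return abs(measured - nominal) <= abs(nominal) * tolerance


def check_rails(measurements):
    """Map each nominal rail voltage to True (in range) or False (out of range)."""
    return {
        nominal: within_tolerance(measured, nominal)
        for nominal, measured in measurements.items()
    }


if __name__ == "__main__":
    # Invented example readings: nominal rail voltage -> voltmeter reading.
    print(check_rails({3.3: 3.28, 5.0: 5.1, 12.0: 11.2}))
```

A single reading at idle says little; readings taken under load, and repeated over time, are more likely to catch a rail that sags or drifts.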
 
Old 08-13-2016, 12:44 PM   #9
273
LQ Addict
 
Registered: Dec 2011
Location: UK
Distribution: Debian Sid AMD64, Raspbian Wheezy, various VMs
Posts: 7,585

Rep: Reputation: 2351
The PSU or motherboard may have one dry joint. Just start troubleshooting...
 
Old 08-13-2016, 12:51 PM   #10
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 20 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2922
Do you have your system set up to auto reboot after a kernel panic?
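
The setting in question is kernel.panic, readable from /proc/sys/kernel/panic. A minimal sketch of how to read and interpret it (the function name is made up for illustration):

```python
def describe_panic_setting(raw):
    """Explain what a kernel.panic value means for reboot-on-panic behaviour."""
    value = int(raw.strip())
    if value == 0:
        return "no automatic reboot after a panic"
    if value < 0:
        return "reboot immediately after a panic"
    return f"reboot {value} seconds after a panic"


if __name__ == "__main__":
    with open("/proc/sys/kernel/panic") as f:
        print(describe_panic_setting(f.read()))
```

With a value of 0 a panic should leave the machine hung with the panic message on the console, so a silent restart with that setting points away from a plain kernel panic.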
 
Old 08-13-2016, 01:02 PM   #11
zombieno7
Member
 
Registered: Aug 2012
Posts: 49

Original Poster
Rep: Reputation: Disabled
I would love to be able to leave memtest running for two days, but I just don't have that kind of time. This is my only work computer. I might be able to let it go for a long time tonight, but not two days. The PSU is new. It's a Corsair AX860i. I seriously doubt that it's the problem, especially since this didn't start until weeks after it was installed. I could see getting a bad one out of the box, but not having it go bad in a couple of weeks. There are no logs of kernel panics, and the kernel is not set to reboot on a panic.
 
Old 08-13-2016, 01:13 PM   #12
273
LQ Addict
 
Registered: Dec 2011
Location: UK
Distribution: Debian Sid AMD64, Raspbian Wheezy, various VMs
Posts: 7,585

Rep: Reputation: 2351
Then a live distro.
 
Old 08-13-2016, 01:17 PM   #13
zombieno7
Member
 
Registered: Aug 2012
Posts: 49

Original Poster
Rep: Reputation: Disabled
Live distro? Why? Is it at all possible that this is a hard drive problem? I'm running a long SMART test now, and when that finishes, I'll run fsck. The drives are the oldest parts of the system, and dmesg did report bad partitions (not sure how accurate that is).
 
Old 08-13-2016, 01:20 PM   #14
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~arch
Posts: 7,231

Rep: Reputation: Disabled
An HD failure is not likely to reboot the box; you would get some sort of hang or crash instead. A refurbished PSU can go bad at any time. I wouldn't rule it out, and I'd be all over it with a voltmeter.
 
Old 08-13-2016, 01:25 PM   #15
273
LQ Addict
 
Registered: Dec 2011
Location: UK
Distribution: Debian Sid AMD64, Raspbian Wheezy, various VMs
Posts: 7,585

Rep: Reputation: 2351
Quote:
Originally Posted by zombieno7 View Post
Live distro? Why? Is it at all possible that this is a hard drive problem? I'm running a long SMART test now, and when that finishes, I'll run fsck. They're the oldest parts of the system and dmesg did report bad partitions(not sure how accurate that is).
If a live distro doesn't crash, that may rule it in.
Please think.
Edit: and it rules out the kernel etc.

Last edited by 273; 08-13-2016 at 01:26 PM.
 
  

