How to tell if crash/freeze is hardware or software related?
Hello,
I have a Haswell Intel NUC that has been working well for two years, on Linux Mint Xfce 17, without any hiccups. Just the past couple of weeks, I have experienced a number of complete freezes. These are the symptoms:
-Temperatures and CPU load prior to the freeze are normal.
-Always happens when I am doing something, usually file browsing in Thunar or typing an email. It is on 24/7 and never crashes when left unattended. Even heavy encoding with the CPU at 100% for an hour, left unattended, has never crashed the system. Yet it might crash while I am using the computer for simple stuff.
-Crashes are hard crashes. No keyboard/mouse activity. Unresponsive to magic sysrq key. Cannot SSH into the machine from another computer. Only solution is to hold power button down and turn it off.
-Ran a complete memtest and tested hard drives for errors. Everything checks out.
-Nothing is written to any logs at the time of the crash.
Based on the symptoms, what further steps can I take to decide whether this is hardware or software related? Thanks!
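Since nothing is logged at the moment of the freeze, one thing that may still help is pinning down exactly when each freeze happened, by looking for the silence between the last line written before the hang and the first line after the reboot. Below is a minimal sketch of that idea in Python. It assumes traditional syslog-style lines that start with a timestamp such as "Aug 28 10:49:01" (English month names), and the function name `find_gaps`, the 5-minute threshold, and the `year` parameter are all made up for illustration. An idle machine can also produce quiet stretches, so a gap is a hint rather than proof of a freeze.

```python
from datetime import datetime, timedelta


def find_gaps(lines, min_gap=timedelta(minutes=5), year=2016):
    """Return (last_line_before_gap, first_line_after_gap, gap) tuples.

    Assumption: each useful line starts with a 15-character syslog
    timestamp like "Aug 28 10:49:01". Syslog omits the year, so it is
    supplied by the caller (hypothetical default shown here).
    """
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        line = line.rstrip("\n")
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if prev_time is not None and stamp - prev_time >= min_gap:
            gaps.append((prev_line, line, stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps
```

It could be fed an open log file, for example `find_gaps(open("/var/log/syslog"))` (that path is an assumption and varies by distro). The last line before each gap shows what the system was doing shortly before it went silent.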
A is one. Complete is harder to define. I can't consider one pass a "complete memtest", because memtest tests more intensively on successive passes after the first. An overnight run is a much better indicator, and it isn't unusual for even longer to be necessary to discover borderline RAM.
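To illustrate why repeated passes with different data matter, here is a toy sketch in Python. It is not how memtest works internally and it is no substitute for it: a user-space program only touches whatever memory the OS hands it, not all physical RAM. The function names, the buffer size, and the choice of byte patterns are assumptions made for the example.

```python
def pattern_pass(buf, pattern):
    """Fill buf with one byte value, read it back, return mismatching offsets."""
    buf[:] = bytes([pattern]) * len(buf)
    return [i for i, b in enumerate(buf) if b != pattern]


def run_passes(size_bytes=16 * 1024 * 1024, passes=3,
               patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Run several passes, each writing every pattern in turn.

    Returns a list of (pass_number, pattern, offset) for every mismatch.
    An empty list means no error was seen in this buffer, which says
    nothing about memory the buffer did not cover.
    """
    buf = bytearray(size_bytes)
    errors = []
    for n in range(passes):
        for p in patterns:
            for offset in pattern_pass(buf, p):
                errors.append((n, p, offset))
    return errors
```

A marginal cell may hold one bit pattern fine and flip under another, or only fail occasionally, which is why a single clean pass is weak evidence.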
Yeah, I meant one pass. I suppose I can run it overnight. I updated the BIOS and so far, it has been several days without a crash, so perhaps it is software related.
I am wondering if anyone has insight into some of the details of the crash, particularly that it never crashes under heavy CPU load, but only when I am clicking around on the desktop. I would have thought heavy CPU usage would bring about hardware-related crashes more easily.
Load level has nothing directly to do with the effects of problem RAM. The consequences of an error depend on whatever is trying to use the unreliable RAM location(s).
Note that most systems with Intel graphics, your NUC probably included, have only Intel graphics, which share RAM with the OS rather than having memory of their own dedicated to video.
So if RAM is indeed the problem here, the crashes could be tied to video output, such as drawing the mouse pointer wherever it happens to be located, or to the consequences of a click.
It's possible that the motherboard is malfunctioning. Booting a live distribution would be a way you could confirm this.
no.
a live distro is only a way to make sure that it isn't the installed distro that is causing the issue.
if it fails on both, that still doesn't confirm a hardware failure.
Hi...
What would you recommend in this case, if there's still a problem? You're right that it doesn't automatically confirm a hardware failure but it's one way to help diagnose it.
Regards...
Last edited by ardvark71; 08-28-2016 at 10:49 AM.
Reason: Added wordage.
but saying "it fails on distro A, AND it fails on distro B, that confirms hardware failure" is wrong.
You're correct insofar as it doesn't automatically confirm hardware failure in and of itself. That's not what I meant to imply in my original post. I added a sentence to my post to you above to reflect that.
ok.
but i think it's a little fishy to so significantly change a post after someone else already replied to it (and replied specifically to the part that you then changed afterwards).
I'm not sure what you mean here. I was in the process of adding "You're right that it doesn't automatically confirm a hardware failure but it's one way to help diagnose it" to post #8 when you posted #9.
But it's cool though, I think it was just a misunderstanding and I have no ill feelings. Hope your day is going well.