My computer has been freezing occasionally for about three years now. I've tried to diagnose the problem in the past, but without success. It may freeze 1 to 4 times a day. I haven't noticed any pattern to when it freezes, though I don't think it has ever frozen while I've been watching a video. It will also freeze when I'm not really using the computer (e.g. I go do something else for a while, leaving my computer on; or after locking my screen).
What I've tried and/or observed:
I tried both proprietary and open source drivers for my NVIDIA graphics card
I tried removing my graphics card and using the built-in motherboard graphics
Nothing gets written to kern.log or syslog at the time of freeze
No errors in Xorg.0.log
SysRq hotkeys don't do anything
Can't SSH into the box while it is frozen
Mouse/Keyboard in general stop working
Any audio that was playing gets stuck and then goes silent after a couple seconds
No errors are reported when running CPU stress tests
I tried disabling various cores of the CPU
Affects a fresh Ubuntu install as well
Experienced freezes in 15.04 through 16.10 (current system)
I tried disabling the BIOS's automatic overclocking (ASUS core unlocker)
I installed linux-crashdump, but no crashes are written to /var/crash
Based on this, I don't think it's a graphics problem. I don't think it's a software problem, since it affects a fresh install. It probably isn't the CPU, since I'd think it would emit a machine check exception during the stress tests (?). I was pretty sure it must be a kernel panic, but since it's not writing anything to /var/crash, I'm not sure; maybe I installed linux-crashdump incorrectly?
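One way to check whether the crash-dump setup is actually armed is to look at two things the kernel exposes: the crashkernel= parameter in /proc/cmdline and the flag in /sys/kernel/kexec_crash_loaded. The sketch below does only that; the function names are made up for illustration, and it assumes a kexec-based setup such as the one linux-crashdump installs. Even if both checks pass, a hard hardware lockup may never reach the panic path, so an empty /var/crash would not by itself prove a misconfiguration.

```python
import re
from pathlib import Path


def crashkernel_param(cmdline):
    """Return the value of crashkernel= on a kernel command line, or None."""
    match = re.search(r"(?:^|\s)crashkernel=(\S+)", cmdline)
    return match.group(1) if match else None


def kdump_status(cmdline, crash_loaded):
    """Summarise whether memory is reserved and a crash kernel is loaded."""
    reserved = crashkernel_param(cmdline)
    if reserved is None:
        return "no crashkernel= parameter: no memory reserved for a dump kernel"
    if crash_loaded.strip() != "1":
        return f"crashkernel={reserved} is set, but no crash kernel is loaded"
    return f"crashkernel={reserved} is set and a crash kernel is loaded"


def read_text(path, default=""):
    """Read a small /proc or /sys file, falling back to a default if absent."""
    try:
        return Path(path).read_text()
    except OSError:
        return default


if __name__ == "__main__":
    print(kdump_status(read_text("/proc/cmdline"),
                       read_text("/sys/kernel/kexec_crash_loaded", "0")))
```

If this reports that no crash kernel is loaded, the install probably needs a reboot or a bootloader update before dumps can be captured.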
Do you guys have any suggestions? I'm not sure what else to try. I'm willing to replace any faulty components, but I still can't say which component (if any) is faulty.
System setup:
OS = Xubuntu 16.10
CPU = AMD Phenom II x4
GPU = NVIDIA GTX660Ti
MB = ASUS M4A88T-V EVO/USB3
Have you run diagnostics on the memory?
Have you instituted tracking of the CPU temperatures?
Are you running SAR and collecting ongoing data to look for patterns? What are you using for hardware, memory, cpu, and network monitoring?
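For the temperature tracking, a minimal sketch of a logger is below. It assumes the kernel exposes sensors under /sys/class/thermal/thermal_zone*/temp in millidegrees Celsius; on some boards the readings live under /sys/class/hwmon instead, so the path pattern may need changing. The function names and the temps.log file name are made up for illustration. Each sample is flushed and fsynced so that the last line written before a hard freeze has a chance of surviving on disk.

```python
import glob
import os
import time


def read_temps(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {sensor_path: degrees_C} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path] = int(f.read().strip()) / 1000.0  # millidegrees C
        except (OSError, ValueError):
            continue  # skip sensors that cannot be read or parsed
    return temps


def format_sample(timestamp, temps):
    """Format one log line: local time followed by zone=temperature pairs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    readings = " ".join(
        f"{os.path.basename(os.path.dirname(path))}={temp:.1f}C"
        for path, temp in temps.items()
    )
    return f"{stamp} {readings}".rstrip()


def log_forever(log_path="temps.log", interval=5):
    """Append a sample every `interval` seconds, forcing each one to disk."""
    with open(log_path, "a") as log:
        while True:
            log.write(format_sample(time.time(), read_temps()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line past the page cache
            time.sleep(interval)
```

Calling log_forever() starts the loop; after a freeze, the last line in the log shows the final readings and narrows down when the machine stopped.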
I seriously doubt that there is anything actually "random" here; we have just not found the proper indicator yet.
The fact that the problem persists through a reload MAY indicate that it is more likely to be a hardware issue, but does NOT rule out software. To rule out software you would have to load a totally different OS.
Almost no board was ever tested on Linux, so you can never tell whether it will work correctly. Many timing and firmware issues could cause this.
As above, a memtest run or two, for a day or so, may tell you more.
The issue seems to be that no process gets a chance to report anything. Those failures are the most difficult to track down, but they point to a complete failure of some basic process. You can't rule out the CPU here since, again, it can't report.
Some folks would strip the system down to the most basic set of hardware: remove any extra PCI boards (and reseat everything in the system that can be reseated), then test.
I agree with the above, but encourage you to only MAKE ONE CHANGE AT A TIME when it comes to hardware.
If you shotgun this and get a fix, you will never know WHICH change made the difference.
If you really want to nail the cause, make one change at a time and test. Then the next, and test again. When you get a change, look at the very last thing that you changed.
This principle applies to both hardware and software root cause analysis.
Well, I ran memtest, and it didn't give any errors for 2 passes. I suppose I'll run it overnight and see if it finds anything. I wasn't logging any hardware statistics; I just installed sysstat, so I guess I'll look at those logs next time it freezes. I didn't think it would be a temperature / power issue, since it will freeze when the computer isn't doing anything and should be mostly idle.
So, it froze once today. I'm looking at the sysstat log and I'm not seeing any patterns really. It stopped logging at 9:49 pm, so I suppose it froze between 9:49 and 9:50. I've attached a file with the output of various SAR commands, for stats between 9:30 and 9:49. All the values seem normal to me. However, after I rebooted (11:30 pm, I think), the system time was off and SAR was reporting really strange numbers:
Code:
09:49:02 PM all 0.60 0.01 0.47 0.24 0.00 98.69
05:02:46 PM all 0.00 0.00 0.00 0.00 0.00 0.00
05:08:05 PM all 8600.00 17900.00 491000.00 0.00 0.00 36000.00
05:08:51 PM all 0.00 1800.00 587300.00 0.00 0.00 1000.00
05:09:10 PM all 0.00 1900.00 590500.00 0.00 0.00 200.00
05:30:29 PM all 0.00 13700.00 428100.00 0.00 0.00 21200.00
05:39:17 PM all 100.00 10800.00 520500.00 0.00 0.00 8500.00
and the numbers were also really big for other outputs (-R, -b, etc). But after midnight, once the log rotated, it went back to normal. I'm guessing it's because there was some garbage at the end of the file when it froze, and that made all subsequent data entries malformed.
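To separate trustworthy samples from the corrupted ones automatically, something like the sketch below could be run over the SAR text output. It assumes the six numeric columns are CPU percentages (as in the default `sar -u` report), so each value should lie between 0 and 100 and a row should sum to roughly 100. The function names and the tolerance are made up for illustration; the sample rows are the ones quoted above.

```python
import re

# Time stamp, CPU label, then the numeric columns.
ROW = re.compile(r"^(\d{2}:\d{2}:\d{2} [AP]M)\s+(\S+)\s+(.*)$")

SAMPLE = [
    "09:49:02 PM all 0.60 0.01 0.47 0.24 0.00 98.69",
    "05:02:46 PM all 0.00 0.00 0.00 0.00 0.00 0.00",
    "05:08:05 PM all 8600.00 17900.00 491000.00 0.00 0.00 36000.00",
    "05:08:51 PM all 0.00 1800.00 587300.00 0.00 0.00 1000.00",
    "05:09:10 PM all 0.00 1900.00 590500.00 0.00 0.00 200.00",
    "05:30:29 PM all 0.00 13700.00 428100.00 0.00 0.00 21200.00",
    "05:39:17 PM all 100.00 10800.00 520500.00 0.00 0.00 8500.00",
]


def parse_row(line):
    """Return (timestamp, cpu, [values]) or None for headers and other lines."""
    match = ROW.match(line.strip())
    if not match:
        return None
    try:
        values = [float(v) for v in match.group(3).split()]
    except ValueError:
        return None  # column header row such as "%user %nice ..."
    return match.group(1), match.group(2), values


def is_plausible(values, tolerance=1.0):
    """CPU percentages must each be in 0..100 and sum to about 100."""
    in_range = all(0.0 <= v <= 100.0 for v in values)
    return in_range and abs(sum(values) - 100.0) <= tolerance


def suspicious_rows(lines):
    """Return the timestamps of data rows whose percentages make no sense."""
    flagged = []
    for line in lines:
        row = parse_row(line)
        if row is not None and not is_plausible(row[2]):
            flagged.append(row[0])
    return flagged


if __name__ == "__main__":
    for stamp in suspicious_rows(SAMPLE):
        print("suspicious sample at", stamp)
```

On the rows quoted above, only the 09:49:02 PM sample passes the check; everything recorded after the reboot is flagged, which fits the idea that the day's data file was damaged at the moment of the freeze.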
I ran memtest again; it completed 4 passes in round-robin SMP mode without any errors. I tried testing in parallel mode (all 4 cores), and it freezes on test #7 (block move). Though, I've been searching the internet, and this is supposedly a bug with memtest's SMP testing mode, and doesn't necessarily indicate any memory problems.
I reset the BIOS settings, as jefro suggested. I also checked the RAM timings to make sure they were correct; I switched them from 1N to 2N. It still froze though, so I don't think that's the problem.