LinuxQuestions.org
Ubuntu This forum is for the discussion of Ubuntu Linux.

Old 05-11-2017, 05:02 PM   #1
Azmisov
LQ Newbie
 
Registered: May 2017
Posts: 5

Rep: Reputation: Disabled
Help diagnosing random freezes


My computer has been freezing occasionally for about three years now. I've tried to diagnose the problem in the past, but have been unsuccessful. It may freeze 1 to 4 times a day. I haven't noticed any pattern to when it freezes, though I don't think it has ever frozen while I've been watching a video. It will also freeze when I'm not really using the computer (e.g. I go do something else for a while, leaving my computer up; or after locking my screen).

What I've tried and/or observed:
  • I tried both proprietary and open source drivers for my NVIDIA graphics card
  • I tried removing my graphics card and using the built-in motherboard graphics
  • Nothing gets written to kern.log or syslog at the time of freeze
  • No errors in Xorg.0.log
  • SysRq hotkeys don't do anything
  • Can't SSH into the box while it is frozen
  • Mouse/Keyboard in general stop working
  • Any audio that was playing gets stuck and then goes silent after a couple seconds
  • No errors are reported when running CPU stress tests
  • I tried disabling various cores of the CPU
  • Affects a fresh Ubuntu install as well
  • Experienced freezes in 15.04 through 16.10 (current system)
  • I tried disabling the BIOS's automatic overclocking (ASUS core unlocker)
  • I installed linux-crashdump, but no crashes are written to /var/crash

Based on this, I don't think it's a graphics problem. I don't think it's a software problem, since it affects a fresh install. It probably isn't the CPU, since I'd think it would emit a machine check exception during the stress tests (?). I was pretty sure it must be a kernel panic, but since it's not writing anything to /var/crash, I'm not sure; maybe I installed linux-crashdump incorrectly?
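
One way to check whether the crash dump setup is even active: kdump-style crash dumps only work if memory was reserved at boot through the `crashkernel=` kernel parameter and a crash kernel has actually been loaded. Below is a minimal Python sketch of that check. The function names are made up for illustration, and it assumes the usual `/proc/cmdline` and `/sys/kernel/kexec_crash_loaded` files are present on the system. Note that a hard hang where even SysRq is dead may never reach the panic path, so an empty /var/crash does not by itself prove a misconfiguration.

```python
from pathlib import Path


def crashkernel_args(cmdline):
    """Return every crashkernel=... value found on a kernel command line."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("crashkernel=")
    ]


def crash_kernel_loaded(flag_text):
    """Interpret the contents of /sys/kernel/kexec_crash_loaded ("1" = loaded)."""
    return flag_text.strip() == "1"


def report(cmdline_path="/proc/cmdline",
           loaded_path="/sys/kernel/kexec_crash_loaded"):
    """Build a one-line summary; unreadable files are reported, not raised."""
    try:
        args = crashkernel_args(Path(cmdline_path).read_text())
        loaded = crash_kernel_loaded(Path(loaded_path).read_text())
    except OSError as exc:
        return f"could not read kdump state: {exc}"
    return f"crashkernel={args or 'not set'}, crash kernel loaded={loaded}"


if __name__ == "__main__":
    print(report())
```

If this reports that `crashkernel` is not set or that no crash kernel is loaded, the dump mechanism was never armed, and that would be worth fixing before drawing conclusions from the empty /var/crash.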

Do you guys have any suggestions? I'm not sure what else to try. I'm willing to replace any faulty components, but I still can't say which component (if any) is faulty.

System setup:
OS = Xubuntu 16.10
CPU = AMD Phenom II x4
GPU = NVIDIA GTX660Ti
MB = ASUS M4A88T-V EVO/USB3

Last edited by Azmisov; 05-12-2017 at 07:09 PM.
 
Old 05-12-2017, 11:51 AM   #2
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,615

Rep: Reputation: 2695
Have you run diagnostics on the memory?
Have you instituted tracking of the CPU temperatures?
Are you running SAR and collecting ongoing data to look for patterns? What are you using for hardware, memory, cpu, and network monitoring?
I seriously doubt that there is anything actually "random" here; we have just not found the proper indicator yet.

The fact that the problem persists through a reload MAY indicate that it is more likely to be a hardware issue, but does NOT rule out software. To rule out software you would have to load a totally different OS.
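
As a rough illustration of the temperature tracking suggested above, here is a minimal Python sketch that samples the kernel's thermal zones and appends them to a log, forcing each line to disk so the last reading before a freeze is not lost. It assumes sensors are exposed under `/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius; which sensors appear depends on the hardware and loaded drivers, and on some systems the readings live under `/sys/class/hwmon` instead. The function names and the log file name are hypothetical.

```python
import glob
import os
import time


def read_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                # sysfs reports millidegrees Celsius
                temps[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor unreadable right now; skip it
    return temps


def log_sample(log_path, temps, now=None):
    """Append one timestamped line and force it to disk before returning."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(f"{zone}={value:.1f}" for zone, value in sorted(temps.items()))
    with open(log_path, "a") as f:
        f.write(f"{stamp} {fields}\n")
        f.flush()
        os.fsync(f.fileno())  # so the last line survives a hard freeze
```

Calling `log_sample("temps.log", read_temps())` every few seconds from a loop or a timer gives a finer-grained record than the default sysstat interval, and the timestamp of the final line narrows down when the freeze happened.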

Last edited by wpeckham; 05-12-2017 at 11:53 AM.
 
Old 05-12-2017, 05:11 PM   #3
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,976

Rep: Reputation: 3623
Almost no board was ever tested on linux so you can never tell if it would ever work correctly. Many timings and firmware issues could cause it.

As above, a memtest run or two over a day or so may tell more.

The issue seems to be that no process has a chance to report. Those are the most difficult but point to full failure of some basic process. You can't rule out CPU in this as again it can't report.

Some folks might run the most basic set of hardware. Remove any extra pci board (reseat all that is reseatable in system too) and test.

I guess you could monitor power on board.

Set bios settings to failsafe or default maybe.
 
Old 05-12-2017, 05:27 PM   #4
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,615

Rep: Reputation: 2695
I agree with the above, but encourage you to only MAKE ONE CHANGE AT A TIME when it comes to hardware.
If you shotgun this and get a fix, you will never know WHICH change made the difference.
If you really want to nail the cause, make one change at a time and test. Then the next, and test again. When you get a change, look at the very last thing that you changed.

This principle applies to both hardware and software root cause analysis.
 
Old 05-12-2017, 07:08 PM   #5
Azmisov
LQ Newbie
 
Registered: May 2017
Posts: 5

Original Poster
Rep: Reputation: Disabled
Well, I ran memtest, and it didn't give any errors for 2 passes. I suppose I'll run it overnight and see if it finds anything. I wasn't logging any hardware statistics; I just installed linux sysstat, so I guess I'll look at those logs next time it freezes. I didn't think it would be a temperature / power issue, since it will freeze when the computer isn't doing anything and should be mostly idle.
 
Old 05-12-2017, 07:13 PM   #6
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,615

Rep: Reputation: 2695
We will hope that there will be a trend of some kind visible in the final part of the log just prior to a freeze that will give us a clue.
 
Old 05-13-2017, 01:30 AM   #7
Azmisov
LQ Newbie
 
Registered: May 2017
Posts: 5

Original Poster
Rep: Reputation: Disabled
So, it froze once today. I'm looking at the sysstat log and I'm not seeing any patterns really. It stopped logging at 9:49 pm, so I suppose it froze between 9:49 and 9:50. I've attached a file with the output of various SAR commands, for stats between 9:30 and 9:49. All the values seem normal to me. However, after I rebooted (11:30 pm, I think), the system time was off and SAR was reporting really strange numbers:
Code:
09:49:02 PM     all      0.60      0.01      0.47      0.24      0.00     98.69
05:02:46 PM     all      0.00      0.00      0.00      0.00      0.00      0.00
05:08:05 PM     all   8600.00  17900.00 491000.00      0.00      0.00  36000.00
05:08:51 PM     all      0.00   1800.00 587300.00      0.00      0.00   1000.00
05:09:10 PM     all      0.00   1900.00 590500.00      0.00      0.00    200.00
05:30:29 PM     all      0.00  13700.00 428100.00      0.00      0.00  21200.00
05:39:17 PM     all    100.00  10800.00 520500.00      0.00      0.00   8500.00
and the numbers were also really big for other outputs (-R, -b, etc). But after midnight, once the log rotated, it went back to normal. I'm guessing it's because there was some garbage at the end of the file when it froze, and that made all subsequent data entries ill-formatted.
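
To separate rows like these from real data automatically, a small filter can help. The Python sketch below flags rows whose values cannot be genuine percentages. It assumes the six numeric columns follow the usual `sar -u` layout (%user, %nice, %system, %iowait, %steal, %idle), which the excerpt above does not show, so treat that as an assumption; the function names are hypothetical. The backwards-timestamp check would also trigger on a normal midnight rollover, so it is only a hint.

```python
from datetime import datetime


def parse_row(line):
    """Split one row into (time, label, [six floats]); return None if it doesn't fit."""
    parts = line.split()
    if len(parts) != 9:
        return None
    try:
        stamp = datetime.strptime(" ".join(parts[:2]), "%I:%M:%S %p").time()
        values = [float(p) for p in parts[3:]]
    except ValueError:
        return None
    return stamp, parts[2], values


def suspicious_rows(lines, tolerance=1.0):
    """Yield (line, reason) for rows whose percentages cannot be real."""
    previous = None
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue  # header, blank line, or anything else that is not a data row
        stamp, _, values = row
        if any(v < 0 or v > 100 for v in values):
            yield line, "value outside 0-100"
        elif abs(sum(values) - 100.0) > tolerance:
            yield line, "columns do not sum to 100"
        elif previous is not None and stamp < previous:
            yield line, "timestamp went backwards"
        previous = stamp
```

Run over the seven rows above, this leaves the 09:49:02 PM row alone and flags the other six, which fits the guess that everything written after the freeze was decoded from a damaged data file.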
Attached Files
File Type: log sar_outputs.log (26.5 KB, 5 views)
 
Old 05-13-2017, 11:43 AM   #8
Azmisov
LQ Newbie
 
Registered: May 2017
Posts: 5

Original Poster
Rep: Reputation: Disabled
I ran memtest again; it completed 4 passes in round-robin SMP mode without any errors. I tried testing in parallel mode (all 4 cores), and it freezes on test #7 (block move). Though, I've been searching the internet, and this is supposedly a bug with memtest's SMP testing mode, and doesn't necessarily indicate any memory problems.
 
Old 05-13-2017, 02:38 PM   #9
Azmisov
LQ Newbie
 
Registered: May 2017
Posts: 5

Original Poster
Rep: Reputation: Disabled
I reset the BIOS settings, as jefro suggested. I also checked the RAM timings to make sure they were correct; I switched them from 1N to 2N. It still froze though, so I don't think that's the problem.
 
  



All times are GMT -5. The time now is 10:36 AM.
