LinuxQuestions.org
Old 04-25-2020, 04:25 PM   #1
skriptmonkey
LQ Newbie
 
Registered: Jun 2017
Location: Wasilla, Alaska
Distribution: Fedora, Linux Mint
Posts: 6

Rep: Reputation: Disabled
LinuxMint 19.3 Sudden Power-Off Under High GPU Usage


Hello folks,
I'm looking for some suggestions on where to look for troubleshooting.

Problem:
My desktop will power off (as if the power cable were pulled) when under high graphical load (gaming), with no warning or errors that I can find. The power-off is sudden: one moment I'm fine, then in the blink of an eye the screens are black and the PC is shut down. To turn the PC back on I need to toggle the switch on the PSU off, then on again. Then I can hit the power button.

No issues under normal usage, this only happens while gaming or stress-testing the GPU.

What have I tried to fix it?
  • RMA'd the graphics card and they sent back a new replacement.
  • RMA'd the Motherboard and they sent back a new replacement.
  • Upgraded my PSU from a 500 W unit to a new 600 W PSU.
  • Checked all the cables to ensure they're plugged in properly and snug.
  • Compiled the Unreal Engine (which takes just under an hour on my machine) with no problems. I'm assuming this was a good CPU stress test.
  • Run additional stress tests via Phoronix Test Suite. No problems with CPU tests but the problem persists with Graphics tests.
  • I've checked
    Code:
    journalctl -b -1
    but I'm not seeing anything that would indicate a problem
  • I've also run
    Code:
    journalctl -f > ~/journalctl.log
    while replicating the problem, but again no signs of a problem.
  • Ran memtest86 to test the RAM. No problems.
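
For anyone repeating the journal checks above, here is a rough sketch of two narrower queries: the last lines written before the previous boot ended, and kernel messages of priority warning or worse from that boot. It assumes the journal is stored persistently, otherwise `-b -1` has nothing to show. `JOURNALCTL` and the function names are made up for illustration; the override exists only so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch only. JOURNALCTL is a hypothetical override (not a journalctl feature).
JOURNALCTL="${JOURNALCTL:-journalctl}"

# Last N lines of the previous boot (default 50); shows what was logged
# right before the machine went down.
prev_boot_tail() {
    "${JOURNALCTL}" -b -1 -n "${1:-50}" --no-pager
}

# Kernel messages of priority warning or worse from the previous boot.
prev_boot_kernel_warnings() {
    "${JOURNALCTL}" -k -b -1 -p warning --no-pager
}

# Example:
# prev_boot_tail 100
# prev_boot_kernel_warnings
```

If the previous boot simply stops mid-stream with no shutdown messages, that is consistent with power being cut rather than the kernel shutting down on its own.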

My PC Specs:
Code:
PCPartPicker Part List

CPU: AMD Ryzen 7 2700X 3.7 GHz 8-Core Processor  ($293.99 @ Amazon) 
Motherboard: Asus PRIME B450M-A/CSM Micro ATX AM4 Motherboard  ($82.99 @ B&H) 
Memory: G.Skill Ripjaws V Series 16 GB (2 x 8 GB) DDR4-2666 Memory  ($75.99 @ Newegg) 
Storage: Samsung 970 Evo 250 GB M.2-2280 NVME Solid State Drive  ($109.00 @ B&H) 
Video Card: XFX Radeon RX 580 8 GB GTS Black Video Card  ($179.99 @ Best Buy) 
Case: Thermaltake Core V21 MicroATX Mini Tower Case  (Purchased For $0.00) 
Power Supply: EVGA 600 W 80+ Certified ATX Power Supply  ($44.99 @ Best Buy) 
Monitor: Asus VH238H 23.0" 1920x1080 Monitor  ($249.00 @ Amazon) 
Monitor: Asus VH238H 23.0" 1920x1080 Monitor  ($249.00 @ Amazon) 
Total: $1284.95
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2020-04-25 17:11 EDT-0400
Any suggestions on where to start looking for clues to where the problem is would be very helpful. I don't think it's an overheating issue: I've watched the sensors output while testing and it never reaches a critical temperature. I'm pretty much at a loss as to where else to look for a problem.

Last edited by skriptmonkey; 04-25-2020 at 11:24 PM.
 
Old 04-26-2020, 08:47 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,613

Rep: Reputation: 2618
Gaming is not only high GPU stress, it's high CPU stress. The APU hitting critical temperature is an automatic trigger for power off. Things usually shut down elegantly, but depending on the BIOS it could be kneejerk.
 
Old 04-26-2020, 08:58 AM   #3
uteck
Senior Member
 
Registered: Oct 2003
Location: Elgin,IL,USA
Distribution: KDE Neon
Posts: 1,272

Rep: Reputation: 525
What video driver are you using, the default Radeon open source driver, or the proprietary driver from AMD?
I assume that all the cooling fans are working and that the case has good air flow. Is the fancontrol program seeing the fans and temp settings?
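
One way to answer the driver question is to read the "Kernel driver in use" line that `lspci -k` prints for the graphics device. The sketch below filters that output; the function name `gpu_driver_in_use` is made up for illustration. It reports only the kernel module, so it may not by itself tell the open source stack apart from AMD's packaged driver.

```shell
#!/bin/sh
# Sketch: print the kernel driver bound to each graphics device.
# Reads `lspci -k` style text on stdin.
gpu_driver_in_use() {
    awk '
        # A device header line for a graphics device starts a GPU section.
        /VGA compatible controller|3D controller|Display controller/ { gpu = 1; next }
        # Any other device header line (starts with a bus address) ends it.
        /^[0-9a-f]/ { gpu = 0 }
        # Inside a GPU section, report the bound kernel driver.
        gpu && /Kernel driver in use:/ { print $NF }
    '
}

# Example:
# lspci -k | gpu_driver_in_use
```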
 
Old 04-26-2020, 05:26 PM   #4
skriptmonkey
LQ Newbie
 
Registered: Jun 2017
Location: Wasilla, Alaska
Distribution: Fedora, Linux Mint
Posts: 6

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by uteck View Post
What video driver are you using, the default Radeon open source driver, or the proprietary driver from AMD?
I assume that all the cooling fans are working and that the case has good air flow. Is the fancontrol program seeing the fans and temp settings?
  • I am using the default open source amdgpu driver.
  • Cooling fans are all working and there is plenty of airflow into the case.
  • For fan control, the only thing I know to look at is the output from sensors. I do have the fans set to "performance" mode in the BIOS.
    Code:
    This is under idle usage. My understanding is that "k10temp-pci-00c3" is the CPU fan.
    I don't see anything for the fan at the front of the case.
    
    $ sensors
    asus-isa-0000
    Adapter: ISA adapter
    cpu_fan:        0 RPM
    
    amdgpu-pci-0900
    Adapter: PCI adapter
    vddgfx:       +0.95 V  
    fan1:         772 RPM  (min =    0 RPM, max = 3500 RPM)
    edge:         +40.0°C  (crit = +94.0°C, hyst = -273.1°C)
    power1:       29.03 W  (cap = 145.00 W)
    
    k10temp-pci-00c3
    Adapter: PCI adapter
    Tdie:         +31.5°C  (high = +70.0°C)
    Tctl:         +41.5°C
 
Old 04-26-2020, 05:55 PM   #5
uteck
Senior Member
 
Registered: Oct 2003
Location: Elgin,IL,USA
Distribution: KDE Neon
Posts: 1,272

Rep: Reputation: 525
Since it only happens when under high GPU load, it seems to be hardware or driver related.
You could try tailing /var/log/faillog, /var/log/kern.log, and /var/log/syslog, with the -f option to follow the log as it updates like so
Code:
tail -f /var/log/syslog
that way you may be able to see the error if it does not get written to disk.
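
If the screens go black at the same instant, a followed log on screen is lost too. A possible variant, sketched here under that assumption, is to append each line to a file and call `sync` after every write so that as much as possible reaches the disk before the power drops. The function name `log_synced` and the log path are made up for illustration, and syncing on every line is slow, so this is only meant for a short test run.

```shell
#!/bin/sh
# Sketch: append each line from stdin to LOGFILE and flush to disk right away.
log_synced() {
    logfile="$1"
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$logfile"
        sync    # flush cached writes so the line survives a power cut
    done
}

# Example:
# tail -f /var/log/syslog | log_synced "$HOME/syslog-follow.log"
```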

You could try the other AMD driver and see if that makes a difference in your stress test. Might be a bug in the open driver?
 
Old 04-27-2020, 03:18 AM   #6
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,613

Rep: Reputation: 2618
@skriptmonkey: Those temperatures you showed us were at tickover, not gaming.

Let's see some temperatures from high cpu/gpu loads - 5 minutes and 10 minutes into a game. You might seriously need to upgrade the fans and cpu heatsinks. If they're not black, painting them matt black improves conductivity. That's been laboratory tested. In fact there used to be a paint sold in motorbike shops called 'cylinder head black' for the 2 strokes which were always on the edge of heat problems, the way they were driven here anyway.
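
To capture readings under load without switching away from the game, a timed logger along these lines could run in a terminal in the background. This is a minimal sketch: `log_sensors`, `SENSORS_CMD` and the log path are made-up names, and it assumes the `sensors` command from lm-sensors that is already used in this thread. Each sample is synced to disk, so the last one before a power-off should survive.

```shell
#!/bin/sh
# Sketch only. SENSORS_CMD is a hypothetical override, handy for dry runs.
SENSORS_CMD="${SENSORS_CMD:-sensors}"

# log_sensors INTERVAL_SECONDS SAMPLES LOGFILE
log_sensors() {
    interval="$1"
    samples="$2"
    logfile="$3"
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Timestamp first, then the full sensors readout.
        { date '+%F %T'; "${SENSORS_CMD}"; } >> "$logfile"
        sync    # push this sample to disk before taking the next one
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Example: one sample every 10 seconds for 10 minutes.
# log_sensors 10 60 "$HOME/sensors-under-load.log"
```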
 
Old 04-30-2020, 07:50 PM   #7
skriptmonkey
LQ Newbie
 
Registered: Jun 2017
Location: Wasilla, Alaska
Distribution: Fedora, Linux Mint
Posts: 6

Original Poster
Rep: Reputation: Disabled
So I've adjusted the fans. Instead of just selecting "Performance" mode in the BIOS I went ahead and manually set the fans all to turbo. I haven't had a chance to really test if this change made a large difference or not. I played a game for about 20 minutes today mainly just to capture the sensors output after running the game a while.

Code:
$ sensors
asus-isa-0000
Adapter: ISA adapter
cpu_fan:        0 RPM

amdgpu-pci-0900
Adapter: PCI adapter
vddgfx:       +1.20 V  
fan1:         771 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +70.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:       45.16 W  (cap = 145.00 W)

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +53.9°C  (high = +70.0°C)
Tctl:         +63.9°C
I'll do more testing when I get a chance to sit down for a while.
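
To read a `sensors` dump like the one above at a glance, the margin between each reading and its reported limit can be computed with a small filter. This is only a sketch: `temp_headroom` is a made-up name, and it assumes the line layout shown in this thread (label, reading, then a `crit` or `high` limit in parentheses).

```shell
#!/bin/sh
# Sketch: for each sensors line that reports a crit or high limit,
# print the label and the remaining margin in degrees C.
temp_headroom() {
    awk '
        /\((crit|high) = / {
            label = $1
            sub(/:$/, "", label)
            # $2 is the current reading, $5 the limit; adding 0 keeps only
            # the leading number and drops the unit suffix.
            printf "%s %.1f\n", label, ($5 + 0) - ($2 + 0)
        }
    '
}

# Example:
# sensors | temp_headroom
```

For the readings posted above this works out to 24.0 degrees of margin on the GPU edge sensor and 16.1 on Tdie, so neither was at its limit after 20 minutes of play.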
 
  

