LinuxQuestions.org
Old 06-07-2020, 12:48 PM   #1
annacmr
LQ Newbie
 
Registered: Jun 2020
Location: São Paulo/SP, Brazil
Distribution: Ubuntu 20.04 LTS
Posts: 8

Rep: Reputation: Disabled
Ubuntu 18.04 and RTX 2080 SUPER systematically freezing


Dear all,
I'm trying to solve an insistent problem: my Ubuntu 18.04.4 LTS is randomly freezing when (apparently) using my graphics card (RTX 2080 SUPER). Kernel logs don't show anything useful (syslog, kern, and xorg attached).

When the freeze occurs, neither the mouse nor the keyboard responds. Pressing any key also makes the Num Lock LED go off, which makes it impossible to reboot safely with the Alt + SysRq method. Disconnecting and reconnecting the USB cables does not help.

I've noticed the freeze occurs under two circumstances: (i) while training a deep learning model using TensorFlow 1.14 and CUDA, and (ii) while playing DoTA2 (the DoTA2 freeze is fairly new and does not occur every time).

I have already tried the following "possible solutions", but to no avail:
1. Setting my fans to full speed/performance mode, thinking it could be an overheating problem (although it was unlikely, given that my PC is new);
2. Blacklisting the nouveau driver in /etc/modprobe.d/blacklist.conf;
3. Setting "Suspend to RAM" to disabled in the BIOS;
4. Switching from gdm to lightdm (as recommended in this post);
5. Switching from nvidia-driver-440 to nvidia-driver-435 (both proprietary drivers);
6. Formatting my PC and reinstalling Ubuntu 18.04.4 LTS;
7. Formatting my PC and installing Ubuntu 19.10;
8. Updating my BIOS from version 4.00 to 4.30.

I don't know if it's useful, but I also have Windows 10 in dual boot with Ubuntu (19.10 now). I formatted my computer yesterday (again) thinking it would solve my problem, so I'm up for anything.

Any help would be appreciated.


---
Hardware & Other Settings:
OS: Ubuntu 18.04.4 LTS (Dual Boot: Windows 10)
Kernel: Linux 5.3.0-53-generic
Processor: Intel Core i9-9900KF 3.60GHz (5.0GHz Turbo)
Graphics: Asus Rog Strix GeForce RTX 2080 SUPER/PCIe/SSE2 8GB GDDR6 256Bit
GL Version: 4.5.0 NVIDIA 440.59
Motherboard: ASRock Z390 Extreme 4 Chipset Z390 Intel LGA 1151 ATX DDR4
Memory: DDR4 Corsair Vengeance RGB Pro (4x8GB) 3600MHz
Water Cooler: Corsair H115i Pro RGB 280mm
Power Supply: XFX 650W XTR Series ATX/EPS Full Modular 80PLUS GOLD, P1-650B-BEFX
Storage: SSD Corsair Force MP510 960GB M.2 2280 NVMe and HD Seagate Barracuda 1TB (only used as extra storage space, mounted on /mnt/data)
---
 
Old 06-07-2020, 03:16 PM   #2
beachboy2
Senior Member
 
Registered: Jan 2007
Location: Wild West Wales, UK
Distribution: Linux Mint 20.2 MATE, MX-19.3, antiX, EndeavourOS
Posts: 3,349
Blog Entries: 17

Rep: Reputation: 1265
annacmr,

Welcome to LQ forums.

Before delving further, I would try installing Ubuntu 20.04 and see whether that improves things.

What sort of CPU temperature are you getting when giving the system a good thrashing?

Last edited by beachboy2; 06-07-2020 at 03:19 PM.
 
Old 06-07-2020, 04:38 PM   #3
annacmr
LQ Newbie
 
Registered: Jun 2020
Location: São Paulo/SP, Brazil
Distribution: Ubuntu 20.04 LTS
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hello @beachboy,

Thank you for the considerations.

After today's reinstall (from 18.04.4 LTS to 19.10) and BIOS update (4.00 to 4.30), some settings I had been using were restored to their defaults. I performed steps 1-5 (list above) again just to be sure. After the BIOS update, I played a DoTA2 match and after about 30-40 minutes my PC froze again. In complete despair, I turned my PC off, opened the case, and reseated my GPU. Since then I've been running one instance of my deep learning model and the freeze has not happened (but I think it's a happy coincidence). I'll leave it running today to see if anything has changed, and if not I'll install Ubuntu 20.04. Since that version has a newer kernel, I think the chances of improving my situation are higher. When reinstalling I'm keeping my /home folder. Do you think that can be a problem, since many configuration files are being kept?

About the temperatures, I'm using psensor to measure them. Right now (running one instance of my code) I'm seeing around 50-65°C. When running two or three instances the temperature hovers around 80-95°C. When only playing, the temperature is almost the same as when running one instance of my code. What would be the best way to keep track of it?
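One possible way to keep track of it, sketched here under assumptions and not taken from the thread, is to poll `nvidia-smi` and append each reading to a file, so the last values before a freeze survive a hard reset. The query fields (`timestamp`, `temperature.gpu`, `clocks.gr`, `power.draw`) are standard `nvidia-smi --query-gpu` properties; the function names, the log path, and the 2-second interval are hypothetical choices.

```python
import os
import subprocess
import time

QUERY_FIELDS = ["timestamp", "temperature.gpu", "clocks.gr", "power.draw"]


def build_query_command(fields=QUERY_FIELDS):
    """Return the nvidia-smi command line that prints one CSV row per GPU."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]


def parse_row(line, fields=QUERY_FIELDS):
    """Split one CSV row from nvidia-smi into a {field: value} dict of strings."""
    values = [value.strip() for value in line.split(",")]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
    return dict(zip(fields, values))


def log_forever(path="gpu_log.csv", interval_s=2.0):
    """Append one row per GPU to `path` every `interval_s` seconds (never returns)."""
    with open(path, "a") as log:
        while True:
            result = subprocess.run(
                build_query_command(), capture_output=True, text=True, check=True
            )
            for line in result.stdout.splitlines():
                if line.strip():
                    log.write(line.strip() + "\n")
            # Push the data to disk so it is not lost if the machine hangs.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)
```

Calling `log_forever()` from a terminal before starting a training run or a match leaves a file whose last rows show the temperature, graphics clock, and power draw just before the hang.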

Kind Regards,
Anna
 
Old 06-07-2020, 05:48 PM   #4
beachboy2
Senior Member
 
Registered: Jan 2007
Location: Wild West Wales, UK
Distribution: Linux Mint 20.2 MATE, MX-19.3, antiX, EndeavourOS
Posts: 3,349
Blog Entries: 17

Rep: Reputation: 1265
Anna,

There are 3 more temperature monitoring tools for Linux here:
https://www.tecmint.com/monitor-cpu-...ure-in-ubuntu/

Let us know if 20.04 makes any difference.
 
Old 06-07-2020, 07:00 PM   #5
annacmr
LQ Newbie
 
Registered: Jun 2020
Location: São Paulo/SP, Brazil
Distribution: Ubuntu 20.04 LTS
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hello @beachboy,

Unfortunately, it was indeed a happy coincidence with 19.10. My PC froze again after I played for 40 minutes. I'm on Ubuntu 20.04 LTS now (just reinstalled) and I'll let you know if the new kernel is enough to solve my problem. I kept my /home folder during the OS installation. Do you think that can be a problem?

Thank you one more time,
Anna
 
Old 06-08-2020, 03:47 AM   #6
beachboy2
Senior Member
 
Registered: Jan 2007
Location: Wild West Wales, UK
Distribution: Linux Mint 20.2 MATE, MX-19.3, antiX, EndeavourOS
Posts: 3,349
Blog Entries: 17

Rep: Reputation: 1265
Anna,

I really doubt whether the retention of your /home folder has anything at all to do with the freezing problem.

Fingers crossed!
 
Old 06-09-2020, 10:27 PM   #7
annacmr
LQ Newbie
 
Registered: Jun 2020
Location: São Paulo/SP, Brazil
Distribution: Ubuntu 20.04 LTS
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hey there, @beachboy!

I’m here just to post an update. Unfortunately, Ubuntu 20.04 LTS alone wasn't able to solve my problem. I have, however, been trying other "possible solutions".

I tried, for example, to limit my GPU clock (sudo nvidia-smi -lgc 300,1650), but that didn’t stop the freezes. Setting the upper limit even lower (1100) also didn’t help, which made me rule out problems with my PSU. I came across this post yesterday and it caught my attention: someone was reporting a problem with NVIDIA’s adaptive mode. I tried the solution and today I didn’t experience any freeze. I also disabled GNOME’s notifications, but I don’t know if that has anything to do with this problem.
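The clock-limiting experiment can be scripted so that each attempt is reproducible. This is only a minimal sketch: `-lgc` is the flag used in the post and `-rgc` is the `nvidia-smi` flag that resets the GPU clocks to their defaults; the helper names are hypothetical, and both commands need root.

```python
import subprocess


def lock_gpu_clocks_cmd(min_mhz, max_mhz):
    """Build the command that locks the GPU clock range to min_mhz..max_mhz."""
    if not (0 < min_mhz <= max_mhz):
        raise ValueError("need 0 < min_mhz <= max_mhz")
    return ["sudo", "nvidia-smi", "-lgc", f"{min_mhz},{max_mhz}"]


def reset_gpu_clocks_cmd():
    """Build the command that removes the clock lock again."""
    return ["sudo", "nvidia-smi", "-rgc"]


def run(cmd):
    """Run a command built by the helpers above and raise if it fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(lock_gpu_clocks_cmd(300, 1100))` would apply the lower cap mentioned above, and `run(reset_gpu_clocks_cmd())` would undo it.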

In case anyone finds this post in the future, what seems to have solved the problem for me was adding the following command to my startup applications: nvidia-settings -a "[gpu:0]/GpuPowerMizerMode=1" (it switches the GPU's PowerMizer mode from adaptive to maximum performance):

[Attached screenshot: Screenshot from 2020-06-09 22-55-17.png]

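For reference, GNOME's Startup Applications tool stores each entry as a `.desktop` file under `~/.config/autostart/`. The sketch below writes an equivalent entry from Python. The file name and entry name are hypothetical; the `Exec` line is the command from the post (no shell quoting is needed there, because `Exec` is not passed through a shell).

```python
from pathlib import Path

# Command from the post; [gpu:0] needs no quoting inside a .desktop Exec line.
POWERMIZER_CMD = "nvidia-settings -a [gpu:0]/GpuPowerMizerMode=1"


def render_autostart_entry(name, command):
    """Return the text of an XDG autostart .desktop entry."""
    return "\n".join([
        "[Desktop Entry]",
        "Type=Application",
        f"Name={name}",
        f"Exec={command}",
        "X-GNOME-Autostart-enabled=true",
        "",
    ])


def install_autostart_entry(filename, name, command, autostart_dir=None):
    """Write the entry into the autostart directory and return its path."""
    if autostart_dir is None:
        target_dir = Path.home() / ".config" / "autostart"
    else:
        target_dir = Path(autostart_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / filename
    path.write_text(render_autostart_entry(name, command))
    return path
```

Calling `install_autostart_entry("nvidia-powermizer.desktop", "NVIDIA max performance", POWERMIZER_CMD)` would create the entry for the current user, and deleting the file removes it again.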

I also forgot to tell you how this problem started. I bought my new PC in January, and when it arrived I installed Ubuntu 18.04 and Windows 10 in a dual-boot setup. Everything was fine until one day I booted into Windows. Right after that, I started experiencing the same freezes I have today. I don’t recall how I solved the problem, but it somehow stopped happening. Last week or so I booted into Windows after a long time, and the problem came back. I don’t know if it updated something, but Windows certainly changed some of my BIOS settings (I had to set my DRAM frequency again, for example). Is it a coincidence, or can Windows cause a problem like that?

I will come back here in a week or so to report whether everything is still working. If so, I will mark my problem as solved.

Thank you one more time!
 
Old 06-10-2020, 03:37 AM   #8
beachboy2
Senior Member
 
Registered: Jan 2007
Location: Wild West Wales, UK
Distribution: Linux Mint 20.2 MATE, MX-19.3, antiX, EndeavourOS
Posts: 3,349
Blog Entries: 17

Rep: Reputation: 1265
Anna,

I hope this change to nvidia settings solves your problem.

I have heard of motherboard software (e.g. MSI Dragon Centre) changing RAM speed and other settings in Windows.
 
Old 06-10-2020, 07:37 AM   #9
gotzl
LQ Newbie
 
Registered: Jun 2020
Posts: 6

Rep: Reputation: Disabled
Hi,
maybe you suffer from this https://forums.developer.nvidia.com/...k-up/79731/228
 
Old 06-10-2020, 10:24 AM   #10
annacmr
LQ Newbie
 
Registered: Jun 2020
Location: São Paulo/SP, Brazil
Distribution: Ubuntu 20.04 LTS
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hello @gotzl,

That's unlikely, because I don't see any XID 61 or XID 62 errors in my log entries. The problem mentioned there also seems related to Ryzen 3rd/7th generation systems, and the hang can occur while the system is idle (which is not my case). One of the replies to the original post, however, is very insightful: they recommend connecting via SSH over the LAN from another computer when the freeze occurs. I'll try that if the problem happens again.
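Checking the logs for Xid events can be automated. The NVIDIA kernel driver reports them in lines containing `NVRM: Xid`, followed by the PCI address and the event number; the sketch below pulls those numbers out of log text. The function name is hypothetical, and nothing here is taken from the attached logs.

```python
import re
from collections import Counter

# Matches e.g. "NVRM: Xid (PCI:0000:01:00): 61, ..." and captures the number.
XID_PATTERN = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def count_xid_events(log_text):
    """Return a Counter mapping each Xid number to its number of occurrences."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts
```

Running `count_xid_events(open("/var/log/kern.log").read())` and getting an empty result would confirm that no Xid events were logged, although a hard freeze can also prevent the last messages from reaching the disk.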

Kind Regards,
Anna

Last edited by annacmr; 06-10-2020 at 11:08 AM.
 
Old 06-10-2020, 10:34 AM   #11
annacmr
LQ Newbie
 
Registered: Jun 2020
Location: São Paulo/SP, Brazil
Distribution: Ubuntu 20.04 LTS
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hey there, @beachboy,

I remember installing software there to play with my DRAM colors, but I don't recall changing anything more drastic. I also don't know if it can be related to 'Fast Startup'. Anyway, if the problem is solved I won't be logging into Windows any time soon, hahaha.

Kind Regards,
Anna
 
Old 06-10-2020, 11:04 AM   #12
beachboy2
Senior Member
 
Registered: Jan 2007
Location: Wild West Wales, UK
Distribution: Linux Mint 20.2 MATE, MX-19.3, antiX, EndeavourOS
Posts: 3,349
Blog Entries: 17

Rep: Reputation: 1265
Anna,

Fingers crossed!
 
Old 06-10-2020, 07:22 PM   #13
annacmr
LQ Newbie
 
Registered: Jun 2020
Location: São Paulo/SP, Brazil
Distribution: Ubuntu 20.04 LTS
Posts: 8

Original Poster
Rep: Reputation: Disabled
The error persists, and it is also happening on Windows 10 (it behaves the same way as described in the first post). In Ubuntu I can't even SSH in from another machine to see what's happening or to restart my PC. I'm running out of options and am contacting the motherboard vendor right now to see if they can shed some light on this situation.

Edit: this post reveals interesting information regarding PSU problems. It seems almost exactly like what I described. How can I know if my PSU is damaged?

Last edited by annacmr; 06-10-2020 at 08:15 PM.
 
Old 06-11-2020, 02:29 AM   #14
beachboy2
Senior Member
 
Registered: Jan 2007
Location: Wild West Wales, UK
Distribution: Linux Mint 20.2 MATE, MX-19.3, antiX, EndeavourOS
Posts: 3,349
Blog Entries: 17

Rep: Reputation: 1265
Anna,

Since all XFX PSUs are made by Seasonic, it is unlikely that your PSU is faulty, especially after such a short time.
 
Old 06-15-2020, 09:10 AM   #15
annacmr
LQ Newbie
 
Registered: Jun 2020
Location: São Paulo/SP, Brazil
Distribution: Ubuntu 20.04 LTS
Posts: 8

Original Poster
Rep: Reputation: Disabled
Some updates: I’ve been running my setup without problems or interruptions for the past three days. The only thing I changed was my DRAM operating frequency: from 3600 MHz (maximum) to 3000 MHz. After I saw that the system was stable, I also used "sudo nvidia-smi -lgc 300,2115" to return to my old GPU frequency configuration, and it’s still OK. Do you have any insight into why this is happening? Is there any way I can check for DRAM problems?
 
  



All times are GMT -5. The time now is 08:43 PM.
