LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

Old 04-14-2020, 10:59 AM   #61
bassmadrigal
LQ Guru
 
Registered: Nov 2003
Location: West Jordan, UT, USA
Distribution: Slackware
Posts: 8,792

Rep: Reputation: 6656

Quote:
Originally Posted by asheshambasta View Post
I had several crashes this weekend.
After the first crash, I disabled the C-states in the BIOS. The system had trouble being put to suspend, but it would ultimately work. It seemed stable. I then added the rcu kernel flags, which only have effect if the kernel is built with related options. That also didn't help.

Even with the C-states set to off in my BIOS, I had another crash yesterday.

I went tinkering with the BIOS again. I set the idle control to "Typical current idle", since it had been mysteriously set to "auto" after my last BIOS update. It seems to me that these settings are not persisted across BIOS updates.

The strange thing is that this piece of crap also crashed once while tinkering with the BIOS. The mouse would move but clicking had no effect, until the whole BIOS interface just locked up.

---

At this point, I've given up on this machine and I'm done with all AMD related experimentation in my life.
I just want a system that is more powerful than an average laptop and one that works and is reliable. I've been burned bad by AMD here.

I've also written to them asking for a return, which they have refused, no surprises there.
They've offered a warranty replacement CPU, but I'm not sure if that is going to pay off. I live in the EU and the whole shipping back and forth will probably take weeks if not months: that will be a huge downtime and it will impact my work.
I'd like some advice on whether AMD's return offer is even worth trying.

If not, I'll need to get rid of the motherboard, processor, heatsink, fan and replace it with the equivalent intel gear. This will cost me in the thousands.
You seem stuck on the idea that this is definitely an AMD problem. That hasn't been determined yet. What has been determined is that this is not the idle bug that many early adopters of the Ryzen systems saw.

This sounds like you have some bad component in your system, but at this point, it's impossible to determine which one is bad. It could be the CPU, motherboard, RAM, PSU, etc. If you look deep enough, you'll find issues with both CPU makers, each motherboard manufacturer, all RAM companies, and every PSU producer. I understand the desire to write off a company after having a single bad experience, even though there are millions of people who haven't had that experience. But make sure it is actually the AMD CPU before you decide to jump ship (even though I do feel AMD provides some of the best bang for the buck). One of my MSI motherboards arrived DOA for my HTPC. I didn't know what the problem was until I did some hardware swapping with my desktop. Once I determined it was the motherboard, I did a quick swap through Amazon and the computer has been running fine ever since.
 
1 members found this post helpful.
Old 05-03-2020, 06:14 AM   #62
asheshambasta
LQ Newbie
 
Registered: Feb 2020
Posts: 21

Rep: Reputation: Disabled
Quote:
Originally Posted by bassmadrigal View Post
You seem stuck on the idea that this is definitely an AMD problem. That hasn't been determined yet. What has been determined is that this is not the idle bug that many early adopters of the Ryzen systems saw.

This sounds like you have some bad component in your system, but at this point, it's impossible to determine which one is bad. It could be the CPU, motherboard, RAM, PSU, etc. If you look deep enough, you'll find issues with both CPU makers, each motherboard manufacturer, all RAM companies, and every PSU producer. I understand the desire to write off a company after having a single bad experience, even though there are millions of people who haven't had that experience. But make sure it is actually the AMD CPU before you decide to jump ship (even though I do feel AMD provides some of the best bang for the buck). One of my MSI motherboards arrived DOA for my HTPC. I didn't know what the problem was until I did some hardware swapping with my desktop. Once I determined it was the motherboard, I did a quick swap through Amazon and the computer has been running fine ever since.
Yes, I'm quite sure this is an AMD problem at this stage. This comes from repeatedly experiencing these crashes with hardware I've now tested in other machines: the GPU, memory, and SSD all seem to be fine.
And even before I tested this, I could see that the frequency of the issues varied with the power saving settings in the BIOS. By any measure, this is definitely related to the processor.

Needless to say, I've owned something like 8 systems in the past. I used Linux on each of them, and none of the Intel-based systems I've owned have had issues of this nature.

My system stayed stable for weeks and it randomly crashed a while ago: again, while idling. This happened after weeks, and without any changes I made to my BIOS. It just happened out of the blue.

Also, I've made a rather large investment in this machine. I do think being frustrated with how it's working, and with how downright unhelpful AMD's responses have been, is somewhat justified.

Last edited by asheshambasta; 05-03-2020 at 06:15 AM.
 
Old 05-03-2020, 03:04 PM   #63
bassmadrigal
LQ Guru
 
Registered: Nov 2003
Location: West Jordan, UT, USA
Distribution: Slackware
Posts: 8,792

Rep: Reputation: 6656
Quote:
Originally Posted by asheshambasta View Post
Also, I've made a rather large investment in this machine. I do think being frustrated with how it's working, and with how downright unhelpful AMD's responses have been, is somewhat justified.
I never said it wasn't justified, but you didn't mention that you had tested all the other components separately. Nor did you mention that you had lockups weeks later with the system idling. I did mention that the bug I have has never led to an immediate lockup, so if you're still experiencing that, it is likely unrelated to the bug that the rcu-nocbs option fixes.

All I know is my 1800x system has been rock solid stable for years since I added the rcu-nocbs option and I've never needed to add that option to my 2200G htpc system. It seems like I've only seen people with the 1st gen 1000 series processors experience the idle bug.

No matter what brand you use, you're going to find duds. You search on the internet and people have had issues with Intel and AMD. Ford and Chevy. Samsung and Apple. You might've been "lucky" and got a dud of a processor. Is anything I say going to counter your bad experience? No. Of course not. All I know is I got the best bang for the buck with the CPU I have in my system compared to the others available at the time. I have had some frustrating experiences with it (the joys of trying out new hardware on Linux), but I have no complaints now.
 
2 members found this post helpful.
Old 12-19-2020, 09:42 AM   #64
bifferos
Member
 
Registered: Jul 2009
Posts: 401

Original Poster
Rep: Reputation: 149
New Ryzen system

I recently built a new Ryzen system for my home server:

AMD Ryzen™ 3 3200G w/ RADEON™ RX VEGA 8 Graphics, AM4, Zen+, Quad Core, 4 Thread, 3.6GHz, 4.0GHz Turbo, 4MB, 65W, CPU
ASRock B450M Pro4 AMD Socket AM4 Motherboard
Corsair Vengeance LPX 32GB (2x 16GB) 2666MHz DDR4
1TB Samsung 970 EVO, M.2 (2280) PCIe 3.0 (x4) NVMe SSD, Phoenix, MLC V-NAND, 3400MB/s Read, 2500MB/s Write, 500k/450k
400W Seasonic Platinum 400 Fanless, Full Modular, 80PLUS Platinum, Single Rail, 33A, Fanless, ATX PSU
Phanteks Enthoo Pro Gaming Case - Black

I can't tell you how pleased I am with this combination. It's drawing half what the old Phenom II system was, about 24W and no lock-ups. Some people might like this setup because neither the AMD (Malaysia) nor ASRock (Vietnam) are Chinese made. Whatever.

I don't run Slack directly on this; it's under Proxmox, whether or not that makes any difference, and I've seen no lock-ups so far after a few months of running.

So, turning back to my Ryzen 7 1700 system, I decided I'd swap stuff out until the lock-ups subsided. I switched over to MX Linux for my desktop, but the lock-ups persisted (well, I hardly believed it was Slackware doing this anyway, but that's just FYI). I changed the Gigabyte motherboard for the ASRock one above and played around using different RAM sticks, but still got the problem. I ran a few RAM checks, which turned up nothing. I also bought a new graphics card, and still had the problem. All I've got left to swap out is the PSU and the CPU. So now I want to buy another one of the 3200G CPUs, as I'm thinking the CPU *must* now be the problem. Unfortunately, the world is out of stock of AMD 3200Gs, and I'm going off the idea of discrete graphics due to both expense and power draw, and I don't play games.

So the next step of the experiment has to wait until after Christmas, but I'm not going to give up; hopefully soon I will confirm I have a duff 1700.
 
2 members found this post helpful.
Old 12-19-2020, 11:20 AM   #65
Timothy Miller
Moderator
 
Registered: Feb 2003
Location: Arizona, USA
Distribution: Debian, EndeavourOS, OpenSUSE, KDE Neon
Posts: 4,005
Blog Entries: 26

Rep: Reputation: 1521
Quote:
Originally Posted by bifferos View Post
I recently built a new Ryzen system for my home server:

AMD Ryzen™ 3 3200G w/ RADEON™ RX VEGA 8 Graphics, AM4, Zen+, Quad Core, 4 Thread, 3.6GHz, 4.0GHz Turbo, 4MB, 65W, CPU
ASRock B450M Pro4 AMD Socket AM4 Motherboard
Corsair Vengeance LPX 32GB (2x 16GB) 2666MHz DDR4
1TB Samsung 970 EVO, M.2 (2280) PCIe 3.0 (x4) NVMe SSD, Phoenix, MLC V-NAND, 3400MB/s Read, 2500MB/s Write, 500k/450k
400W Seasonic Platinum 400 Fanless, Full Modular, 80PLUS Platinum, Single Rail, 33A, Fanless, ATX PSU
Phanteks Enthoo Pro Gaming Case - Black

I can't tell you how pleased I am with this combination. It's drawing half what the old Phenom II system was, about 24W and no lock-ups. Some people might like this setup because neither the AMD (Malaysia) nor ASRock (Vietnam) are Chinese made. Whatever.

I don't run Slack directly on this; it's under Proxmox, whether or not that makes any difference, and I've seen no lock-ups so far after a few months of running.

So, turning back to my Ryzen 7 1700 system, I decided I'd swap stuff out until the lock-ups subsided. I switched over to MX Linux for my desktop, but the lock-ups persisted (well, I hardly believed it was Slackware doing this anyway, but that's just FYI). I changed the Gigabyte motherboard for the ASRock one above and played around using different RAM sticks, but still got the problem. I ran a few RAM checks, which turned up nothing. I also bought a new graphics card, and still had the problem. All I've got left to swap out is the PSU and the CPU. So now I want to buy another one of the 3200G CPUs, as I'm thinking the CPU *must* now be the problem. Unfortunately, the world is out of stock of AMD 3200Gs, and I'm going off the idea of discrete graphics due to both expense and power draw, and I don't play games.

So the next step of the experiment has to wait until after Christmas, but I'm not going to give up; hopefully soon I will confirm I have a duff 1700.

Wouldn't be at all surprised; the first-gen Zen cores had a fairly high defect rate. I know my 1700x was my issue: as soon as I replaced it with a 3700x, the problem went away, and I've never had a lockup or crash since.
 
1 members found this post helpful.
Old 12-19-2020, 12:02 PM   #66
bassmadrigal
LQ Guru
 
Registered: Nov 2003
Location: West Jordan, UT, USA
Distribution: Slackware
Posts: 8,792

Rep: Reputation: 6656
My 1800x has still been rock solid since I passed the rcu-nocbs=0-15 option to the kernel.
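
For anyone who wants to confirm the option actually reached the running kernel, here is a minimal sketch that parses /proc/cmdline. The function name parse_cmdline is made up for illustration. The kernel documentation spells the parameter rcu_nocbs, and as far as I know the kernel treats dashes and underscores in parameter names as interchangeable, so the sketch folds them together; treat that as an assumption. Note also that rcu_nocbs reportedly only has an effect if the kernel was built with CONFIG_RCU_NOCB_CPU.

```python
#!/usr/bin/env python3
"""Sketch: check whether given parameters are on the running kernel's command line."""
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' map to None. Dashes in names are folded to
    underscores (assumption: the kernel treats them as equivalent).
    Quoted values containing spaces are not handled in this sketch.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name.replace("-", "_")] = value if sep else None
    return params


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")  # only present on Linux
    if cmdline_file.exists():
        params = parse_cmdline(cmdline_file.read_text())
        for wanted in ("rcu_nocbs", "processor.max_cstate"):
            if wanted in params:
                print(f"{wanted} = {params[wanted]}")
            else:
                print(f"{wanted} is not set")
```

If rcu_nocbs shows as not set after a reboot, the bootloader configuration probably was not updated or reinstalled.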
 
Old 12-19-2020, 01:01 PM   #67
bsd1101
Member
 
Registered: Jul 2010
Location: Brooklyn NY
Distribution: Slackware 64
Posts: 31

Rep: Reputation: Disabled
GA-AB350N-Gaming with Ryzen 5 1600 and DDR4 3200. I had lockups until I disabled C-states in the BIOS, but some versions of the BIOS would not correctly disable C-states; dmesg would display C-state errors on startup. That's why I'm currently running BIOS version F42d when there are two newer versions.
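
One way to see what the kernel thinks about idle states after a BIOS change is to read the cpuidle entries in sysfs. This is only a sketch under assumptions: the function name list_idle_states is hypothetical, it looks at cpu0 only, and the cpuidle directory may be missing entirely if no cpuidle driver is loaded. I would expect fewer states to be listed when the BIOS really disables the deeper C-states, but I have not verified that on this board.

```python
#!/usr/bin/env python3
"""Sketch: list the idle states the kernel exposes for one CPU via sysfs."""
from pathlib import Path


def list_idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a list of (directory, state name, disabled?) tuples.

    Returns an empty list when the directory does not exist
    (for example, no cpuidle driver or not running on Linux).
    """
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []
    states = []
    # Sort numerically so that state10 comes after state9.
    for state_dir in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        # A non-zero 'disable' value means the state is disabled from the OS side.
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((state_dir.name, name, disabled))
    return states


if __name__ == "__main__":
    for directory, name, disabled in list_idle_states():
        print(f"{directory}: {name} ({'disabled' if disabled else 'enabled'})")
```

Comparing this listing before and after a BIOS update would show whether the C-state setting survived the update.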
 
Old 12-19-2020, 01:07 PM   #68
garpu
Senior Member
 
Registered: Oct 2009
Distribution: Slackware
Posts: 1,537

Rep: Reputation: 899
Quote:
Originally Posted by bsd1101 View Post
GA-AB350N-Gaming with Ryzen 5 1600 and DDR4 3200. I had lockups until I disabled C-states in the BIOS, but some versions of the BIOS would not correctly disable C-states; dmesg would display C-state errors on startup. That's why I'm currently running BIOS version F42d when there are two newer versions.
https://github.com/r4m0n/ZenStates-Linux Have you tried this? You have to run it every time you reboot, though.
 
Old 12-19-2020, 01:10 PM   #69
bsd1101
Member
 
Registered: Jul 2010
Location: Brooklyn NY
Distribution: Slackware 64
Posts: 31

Rep: Reputation: Disabled
Quote:
Originally Posted by garpu View Post
https://github.com/r4m0n/ZenStates-Linux Have you tried this? You have to run it every time you reboot, though.
No haven't tried it. Don't even want to fiddle with it at this point, since everything's been running well for a long time.
 
Old 12-19-2020, 02:15 PM   #70
bassmadrigal
LQ Guru
 
Registered: Nov 2003
Location: West Jordan, UT, USA
Distribution: Slackware
Posts: 8,792

Rep: Reputation: 6656
Quote:
Originally Posted by bsd1101 View Post
No haven't tried it. Don't even want to fiddle with it at this point, since everything's been running well for a long time.
That's exactly where I'm at. I don't know if the problem was resolved in newer kernels or firmware updates, but having rcu-nocbs passed to the kernel prevented my issue from occurring and I don't want to remove it just to see if the instability comes back.
 
Old 12-19-2020, 03:37 PM   #71
Daedra
Senior Member
 
Registered: Dec 2005
Location: Springfield, MO
Distribution: Slackware64-15.0
Posts: 2,683

Rep: Reputation: 1375
Quote:
Originally Posted by bassmadrigal View Post
My 1800x has still been rock solid since I passed the rcu-nocbs=0-15 option to the kernel.
I also had to add this to my 1600x to make the system stable. On my new 5900x system I haven't had to add anything to keep it stable. I did add "mitigations=off" to the kernel command line to squeeze out a little extra performance. Yeah, I know I leave myself vulnerable, but I am not worried about it on my desktop machine.
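
To see what "mitigations=off" actually changes on a given machine, the kernel reports the status of each known CPU vulnerability under sysfs. A minimal sketch follows; the function name read_vulnerabilities is made up here, and the set of files present depends on the kernel version and CPU.

```python
#!/usr/bin/env python3
"""Sketch: print the kernel's reported status for known CPU vulnerabilities."""
from pathlib import Path


def read_vulnerabilities(vuln_dir="/sys/devices/system/cpu/vulnerabilities"):
    """Return {vulnerability name: status string} read from sysfs.

    Returns an empty dict if the directory does not exist
    (older kernel or not running on Linux).
    """
    base = Path(vuln_dir)
    if not base.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(base.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    for name, status in read_vulnerabilities().items():
        print(f"{name}: {status}")
```

Running it with and without the option on the command line gives a before/after picture of which mitigations were dropped.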
 
Old 12-19-2020, 06:08 PM   #72
bifferos
Member
 
Registered: Jul 2009
Posts: 401

Original Poster
Rep: Reputation: 149
That option was no use for me, but then since the MO of my bug was different it's unsurprising. I just got shitty silicon it seems. And over time the lock-ups have slowly increased in frequency. They jumped from once per week up to approximately once per night when I changed the motherboard. In theory I think the Ryzen boxed CPUs are warrantied for 3 years. If I prove the 1700 is duff with a swap I'm definitely going to try to send it back to AMD as I ordered it in 2018, and never overclocked. I'll let you know how I get on, of course.
 
Old 12-19-2020, 07:03 PM   #73
willysr
Senior Member
 
Registered: Jul 2004
Location: Jogja, Indonesia
Distribution: Slackware-Current
Posts: 4,661

Rep: Reputation: 1784
I have been using all the options, like rcu and disabling C-states, and it has worked well in my case.
 
1 members found this post helpful.
Old 12-21-2020, 07:00 AM   #74
asheshambasta
LQ Newbie
 
Registered: Feb 2020
Posts: 21

Rep: Reputation: Disabled
Just to add an update from my side on this: the only thing that worked for me was the RMA. My PC retailer was generous enough to send me a replacement CPU so that I could swap out the CPUs with no downtime. Lo and behold, the lockups stopped. The CPU was from the Feb-2020 batch, as indicated on the casing, in contrast to the previous CPU, which IIRC was from 2018.
I've not had a single crash since then; neither do I have many hardware issues (I do have the occasional PCIe link lost for my Ethernet, but that is due to the Motherboard).
I cannot complain. It seems to me that with these new chips, AMD needed some time to sort out some fabrication issues. The one thing I have learnt, however, is to keep away from cutting edge hardware (it was cutting edge when I bought it).
 
Old 12-22-2020, 02:51 AM   #75
cycojesus
Member
 
Registered: Dec 2005
Location: Lyon, France
Distribution: Slackware-current
Posts: 116

Rep: Reputation: 79

I missed this thread before. I built this PC 2 months ago, and for some weeks now it has been unstable.
Firefox crashes daily, per tab or completely.
I haven't been able to compile a kernel since 5.10-rc1.

PCPartPicker Part List
CPU: AMD Ryzen 5 3600 3.6 GHz 6-Core Processor
Motherboard: MSI B550-A PRO ATX AM4 Motherboard
Memory: Patriot Viper Steel 64 GB (2 x 32 GB) DDR4-3600 CL18 Memory
Storage: Samsung 970 Evo 1 TB M.2-2280 NVME Solid State Drive
Storage: Western Digital SN750 1 TB M.2-2280 NVME Solid State Drive
Video Card: Sapphire Radeon RX 5500 XT 8 GB PULSE Video Card
Power Supply: SHARKOON WPM Gold Zero 550 W 80+ Gold Certified Semi-modular ATX Power Supply

Reading the last posts here I added
Code:
rcu_nocbs=0-15 processor.max_cstate=5
to the kernel cmdline. In the "Bios" I disabled c-state, disabled SMT and set Power Supply Idle Control to Typical...

But the problems persist. Is RMA my only option left?

EDIT: it just dawned on me that rcu_nocbs should be 0-11 in my case if I enable SMT again, or 0-5 without SMT... I'll try that (0-11 + SMT)

EDIT2: yep, nope. Tried to compile a kernel and
Code:
internal compiler error: Segmentation fault
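
On the rcu_nocbs range from the first EDIT: the value is just the list of logical CPU numbers, so it depends on core count and SMT. A small sketch of that arithmetic, assuming logical CPUs are numbered contiguously from 0 and two threads per core with SMT on (the function name rcu_nocbs_range is hypothetical):

```python
#!/usr/bin/env python3
"""Sketch: derive the rcu_nocbs CPU range from core count and SMT state."""
import os


def rcu_nocbs_range(cores, smt_enabled=True, threads_per_core=2):
    """Return the CPU list string covering every logical CPU.

    Assumes logical CPUs are numbered 0..N-1 without gaps.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    logical = cores * (threads_per_core if smt_enabled else 1)
    return f"0-{logical - 1}" if logical > 1 else "0"


if __name__ == "__main__":
    # Ryzen 5 3600: 6 cores, with and without SMT
    print("rcu_nocbs=" + rcu_nocbs_range(6, smt_enabled=True))
    print("rcu_nocbs=" + rcu_nocbs_range(6, smt_enabled=False))
    # os.cpu_count() reports the logical CPUs the OS currently sees
    count = os.cpu_count()
    if count:
        print(f"this machine: rcu_nocbs={rcu_nocbs_range(count, smt_enabled=False)}")
```

For an 8-core part with SMT, the same calculation gives the 0-15 range used earlier in the thread.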

Last edited by cycojesus; 12-22-2020 at 03:38 AM. Reason: facepalm and another test failing
 
  

