Slackware: This forum is for the discussion of Slackware Linux.
I had several crashes this weekend.
After the first crash, I disabled C-states in the BIOS. The system then had trouble suspending, though it would eventually go to sleep, and it seemed stable. I then added the rcu kernel flags, which only take effect if the kernel is built with the related options. That didn't help either.
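Since the rcu flags only do anything when the kernel was built with the matching option, it may be worth confirming both halves before concluding they didn't help. Below is a minimal Python sketch, not a tool from this thread. It assumes the relevant build option is `CONFIG_RCU_NOCB_CPU` and the boot parameter is spelled `rcu_nocbs`, as in the kernel's parameter documentation. It also assumes the running kernel's config is available either as `/proc/config.gz` or as a `/boot/config*<release>` file (Slackware names these along the lines of `config-generic-<version>`). The helper names are hypothetical.

```python
import glob
import gzip
import os
import platform
from typing import Optional


def config_has_option(config_text: str, option: str) -> bool:
    """Return True if `option` is set to y or m in kernel .config text."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set" and won't match.
        if line.startswith(prefix):
            return line[len(prefix):] in ("y", "m")
    return False


def cmdline_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` on a kernel command line, or None if absent.

    Dashes and underscores in the name are normalised so that both the
    "rcu-nocbs" and "rcu_nocbs" spellings used in this thread are found.
    """
    wanted = name.replace("-", "_")
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key.replace("-", "_") == wanted:
            return value
    return None


def read_running_config() -> Optional[str]:
    """Best-effort read of the running kernel's build config (assumed locations)."""
    try:
        if os.path.exists("/proc/config.gz"):  # only present with CONFIG_IKCONFIG_PROC
            with gzip.open("/proc/config.gz", "rt") as f:
                return f.read()
        # Assumption: a /boot/config*<release> file matches the running kernel.
        candidates = sorted(glob.glob(f"/boot/config*{platform.release()}"))
        if candidates:
            with open(candidates[0]) as f:
                return f.read()
    except OSError:
        pass  # unreadable file: treat as "not found"
    return None


def main() -> None:
    config = read_running_config()
    if config is None:
        print("Kernel config not found; can't tell whether CONFIG_RCU_NOCB_CPU is set")
    else:
        print("CONFIG_RCU_NOCB_CPU enabled:", config_has_option(config, "CONFIG_RCU_NOCB_CPU"))
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("rcu_nocbs on command line:", cmdline_param(f.read(), "rcu_nocbs"))


if __name__ == "__main__":
    main()
```

If the config check prints False, passing `rcu_nocbs=` on that kernel is unlikely to have any effect, which would explain it not helping.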
Even with the C-states set to off in my BIOS, I had another crash yesterday.
I went tinkering with the BIOS again. I set the idle control to "Typical current idle", since it had mysteriously been set to "auto" after my last BIOS update. It seems these settings are not preserved across BIOS updates.
The strange thing is that this piece of crap also crashed once while I was tinkering with the BIOS. The mouse would move but clicking had no effect, until the whole BIOS interface just locked up.
---
At this point, I've given up on this machine and I'm done with all AMD related experimentation in my life.
I just want a system that is more powerful than an average laptop and that works reliably. I've been burned badly by AMD here.
I've also written to them asking for a return, which they have refused, no surprises there.
They've offered a warranty replacement CPU, but I'm not sure that is going to pay off. I live in the EU, and shipping back and forth will probably take weeks, if not months. That means a lot of downtime, and it will impact my work.
I'd like some advice on whether AMD's replacement offer is even worth pursuing.
If not, I'll need to get rid of the motherboard, processor, heatsink and fan and replace them with equivalent Intel gear. This will cost me thousands.
You seem convinced that this is definitely an AMD problem. That hasn't been determined yet. What has been determined is that this is not the idle bug that many early adopters of Ryzen systems saw.
This sounds like you have a bad component in your system, but at this point it's impossible to determine which one. It could be the CPU, motherboard, RAM, PSU, etc. If you look deep enough, you'll find issues with both CPU makers, every motherboard manufacturer, all RAM companies, and every PSU producer. I understand the desire to write off a company after a single bad experience, even though millions of people haven't had that experience. But make sure it is actually the AMD CPU before you decide to jump ship (even though I do feel AMD provides some of the best bang for the buck). One of my MSI motherboards arrived DOA for my HTPC. I didn't know what the problem was until I did some hardware swapping with my desktop. Once I determined it was the motherboard, I did a quick swap through Amazon, and the computer has been running fine ever since.
Yes, I'm quite sure this is an AMD problem at this stage. That conclusion comes from repeatedly experiencing these crashes after testing the other components (GPU, memory, SSD) in other machines, where they all seem fine.
And even before I tested those, I could see that the frequency of the issues varied with the power-saving settings in the BIOS. That, by all accounts, points to the processor.
For what it's worth, I've owned something like 8 systems in the past. I used Linux on each of them, and none of the Intel-based systems had issues of this nature.
My system stayed stable for weeks, and then it randomly crashed a while ago: again, while idling. This happened after weeks of stability, without any BIOS changes on my part. It just happened out of the blue.
Also, I've made a rather large investment in this machine. I do think being frustrated with how it's working, and with how downright unhelpful AMD's responses have been, is somewhat justified.
Last edited by asheshambasta; 05-03-2020 at 06:15 AM.
I never said it wasn't justified, but you didn't mention that you had tested all the other components separately. Nor did you mention that you had lock-ups weeks later with the system idling. I did mention that the bug I have has never led to an immediate lock-up, so if you're still experiencing that, it is likely unrelated to the bug that the rcu-nocbs option fixes.
All I know is my 1800X system has been rock-solid for years since I added the rcu-nocbs option, and I've never needed to add that option to my 2200G HTPC. It seems I've only seen people with first-gen 1000-series processors experience the idle bug.
No matter what brand you use, you're going to find duds. Search the internet and you'll find people who have had issues with Intel and AMD. Ford and Chevy. Samsung and Apple. You might have been "lucky" and gotten a dud processor. Is anything I say going to counter your bad experience? No, of course not. All I know is I got the best bang for the buck with the CPU in my system compared to the others available at the time. I have had some frustrating experiences with it (the joys of trying out new hardware on Linux), but I have no complaints now.
I can't tell you how pleased I am with this combination. It draws about half the power of the old Phenom II system, around 24W, with no lock-ups. Some people might like this setup because neither the AMD part (Malaysia) nor the ASRock board (Vietnam) is made in China. Whatever.
I don't run Slackware directly on this; it runs under Proxmox. Whether or not that makes a difference, I've seen no lock-ups in a few months of running.
So, turning back to my Ryzen 7 1700 system, I decided to swap parts out until the lock-ups stopped. I switched my desktop over to MX Linux, but the lock-ups persisted (I hardly believed Slackware was causing this anyway, but that's just FYI). I replaced the Gigabyte motherboard with the ASRock one above and tried different RAM sticks: still the problem. I ran a few RAM checks, which turned up nothing. I also bought a new graphics card: still the problem. All I've got left to swap out is the PSU and the CPU. So now I want to buy another 3200G, as I'm thinking the CPU *must* be the problem. Unfortunately, the world is out of stock of AMD 3200Gs, and I'm going off the idea of discrete graphics because of both expense and power draw, and I don't play games.
So the next step of the experiment has to wait until after Christmas, but I'm not going to give up. Hopefully I'll soon confirm that I have a duff 1700.
Wouldn't be at all surprised; the first-gen Zen cores had a fairly high defect rate. I know my 1700X was my issue: as soon as I replaced it with a 3700X, I've never had a lock-up or crash since.
GA-AB350N-Gaming with a Ryzen 5 1600 and DDR4-3200. I had lock-ups until I disabled C-states in the BIOS, but some BIOS versions would not correctly disable C-states; dmesg would display C-state errors on startup. That's why I'm currently running BIOS version F42d even though there are two newer versions.
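Because some BIOS versions apparently failed to disable C-states, it can help to look at which idle states the kernel actually ends up using, alongside the dmesg output. A rough Python sketch, assuming the standard cpuidle sysfs layout (`/sys/devices/system/cpu/cpuN/cpuidle/stateN/` with `name`, `disable` and `usage` files). It does not parse dmesg, since the exact error text isn't quoted in the post, and the function names are hypothetical. Interpreting the result is a judgment call: if a state deeper than C1 is listed and its usage counter keeps climbing after C-states were switched off in the BIOS, the setting may not have taken effect.

```python
from pathlib import Path
from typing import List, Tuple


def idle_states(cpu_dir: Path) -> List[Tuple[str, bool, int]]:
    """Return (name, disabled, usage) for each cpuidle state of one CPU.

    Reads <cpu_dir>/cpuidle/stateN/{name,disable,usage}; returns [] if the
    CPU exposes no cpuidle directory (e.g. no idle driver loaded).
    """
    cpuidle = cpu_dir / "cpuidle"
    if not cpuidle.is_dir():
        return []
    states = []
    # Sort numerically so state10 comes after state9, not after state1.
    for state in sorted(cpuidle.glob("state*"), key=lambda p: int(p.name[len("state"):])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        # "disable" is the per-state runtime switch; 1 means the state is disabled.
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        usage = int((state / "usage").read_text().strip())
        states.append((name, disabled, usage))
    return states


def main() -> None:
    states = idle_states(Path("/sys/devices/system/cpu/cpu0"))
    if not states:
        print("No cpuidle states exposed for cpu0")
    for name, disabled, usage in states:
        print(f"{name:>6}  disabled={disabled}  entered {usage} times")


if __name__ == "__main__":
    main()
```

Running it twice a few minutes apart and comparing the usage counts shows whether a given state is still being entered.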
No, I haven't tried it. I don't even want to fiddle with it at this point, since everything's been running well for a long time.
That's exactly where I'm at. I don't know if the problem was resolved in newer kernels or firmware updates, but having rcu-nocbs passed to the kernel prevented my issue from occurring and I don't want to remove it just to see if the instability comes back.
My 1800x has still been rock solid since I passed the rcu-nocbs=0-15 option to the kernel.
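The `0-15` in that option is the full range of logical CPU numbers: the Ryzen 7 1800X has 8 cores with SMT, so Linux sees 16 logical CPUs, numbered 0 to 15. A 6-core/12-thread part such as the 1600X would need `0-11` instead. A minimal Python sketch of that arithmetic, using the documented `rcu_nocbs` spelling (the function name is hypothetical; `os.cpu_count()` reports logical CPUs, including SMT threads):

```python
import os


def rcu_nocbs_arg(logical_cpus: int) -> str:
    """Build an rcu_nocbs= argument covering logical CPUs 0..logical_cpus-1."""
    if logical_cpus < 1:
        raise ValueError("need at least one CPU")
    if logical_cpus == 1:
        return "rcu_nocbs=0"
    # CPU numbering starts at 0, so the last CPU is logical_cpus - 1.
    return f"rcu_nocbs=0-{logical_cpus - 1}"


if __name__ == "__main__":
    # cpu_count() can return None on exotic platforms; fall back to 1.
    print(rcu_nocbs_arg(os.cpu_count() or 1))
```

The resulting string then goes on the kernel command line, e.g. in the `append=` line of `/etc/lilo.conf` on a LILO-based Slackware install (re-run `lilo` afterwards), or in `GRUB_CMDLINE_LINUX_DEFAULT` for GRUB.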
I also had to add this on my 1600X to make the system stable. On my new 5900X system I haven't had to add anything to keep it stable. I did add "mitigations=off" to the kernel command line to squeeze out a little extra performance. Yeah, I know that leaves me vulnerable, but I'm not worried about it on my desktop machine.
That option was no use for me, but since the MO of my bug was different, that's unsurprising. I just got shitty silicon, it seems. And over time the lock-ups have slowly increased in frequency: they jumped from about once per week to roughly once per night when I changed the motherboard. In theory, I think Ryzen boxed CPUs are warrantied for 3 years. If I prove the 1700 is duff with a swap, I'm definitely going to try to send it back to AMD, as I ordered it in 2018 and never overclocked it. I'll let you know how I get on, of course.
Just to add an update from my side: the only thing that worked for me was the RMA. My PC retailer was generous enough to send me a replacement CPU so that I could swap the CPUs with no downtime. Lo and behold, the lock-ups stopped. The new CPU was from the Feb-2020 batch, as indicated on the casing, whereas the previous CPU was from 2018, IIRC.
I've not had a single crash since then, nor do I have many other hardware issues (I do get the occasional lost PCIe link for my Ethernet, but that is due to the motherboard).
I cannot complain. It seems to me that with these new chips, AMD needed some time to sort out some fabrication issues. The one thing I have learnt, however, is to keep away from cutting-edge hardware (it was cutting-edge when I bought it).
I missed this thread before. I built this PC 2 months ago, and for some weeks now it has been unstable.
Firefox crashes daily, per tab or completely.
I haven't been able to compile a kernel since 5.10-rc1