12-31-2021, 01:58 AM | #1
Member
Registered: Apr 2005
Location: Cambridge, UK
Distribution: KDE Neon, Proxmox
Posts: 37
Ubuntu 21.10: Western Digital & Samsung NVMe unreliability on the H310 chipset
Hardware: Intel “Coffee Lake” (8th gen) i7-8700 @ 3.2 GHz
ASUSTeK Prime H310M-A R2.0 motherboard
Corsair 32 GB DDR4 2166 MHz RAM
WD Black 1 TB 750 (firmware 111130WD) & 850 (firmware 614900WD) NVMe drives
Samsung 1 TB SSD 980 NVMe drive (firmware 1B4QFX07)
Samsung Evo 1 TB SATA drive
Software: Kubuntu (Ubuntu 21.10 with the KDE Plasma desktop), Linux 5.13.0
ZFS filesystem for root
I have been chasing down the unreliability of my Kubuntu install for almost a month. My NVMe drives are all unreliable (PCIe I/O errors are reported), while an attached SATA SSD is reliable, and at least one of the NVMe drives is solidly reliable running Windows 10.
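For anyone who wants to check their own logs for the same symptom, something like the following should surface the relevant kernel messages (the grep pattern is just a suggestion; your exact error strings may differ):
Code:
# show kernel messages from the current boot that mention NVMe or PCIe/AER errors
journalctl -k -b | grep -iE 'nvme|pcie|aer'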
Things I tried, without resolving the issue:
1. Setting nvme_core.default_ps_max_latency_us to 0, 6000, and 12000 on the kernel command line (see the sketch after this list)
2. Setting pcie_aspm=off on the kernel command line (also shown in the sketch)
3. Updating the BIOS to the latest version
4. Updating the WD drives to the latest firmware version
5. Running ext4 rather than ZFS on root
6. Trying the Zorin 16 (Ubuntu 20.04-based) and Debian 11 (Bullseye) distros
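For reference, here is roughly how items 1 and 2 look on a stock Ubuntu GRUB setup (the 0 shown for the latency parameter is just one of the values I swept):
Code:
# /etc/default/grub -- add the two parameters to the default kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"

# then regenerate the GRUB configuration and reboot for the change to take effect
sudo update-grub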
I couldn't find similar reports online specific to the H310 chipset, but I suspect it is somehow the cause, or perhaps my motherboard is faulty. A replacement motherboard with a better chipset, courtesy of AliExpress, is on its way to me; I'll post an update when it arrives.
I wanted to keep this post short, so I've left out much of the detail of what I did and what I observed; you can read more in my related blog post https://www.linuxquestions.org/quest...chipset-38727/.
Greetings from unreliable Cambridge, UK
Adrian
12-31-2021, 06:09 PM | #2
LQ Guru
Registered: Aug 2016
Location: SE USA
Distribution: openSUSE 24/7; Debian, Knoppix, Mageia, Fedora, OS/2, others
Posts: 6,425
I wouldn't hastily blame the H310 before ruling out Asus' BIOS. I have its B560M-A and an 11th-gen i5 that has never booted past the loading of the i915 graphics driver when more than one display is connected. If I want to use more than one display, I must either limit performance by disabling KMS, or boot first and then attach the other display cables. Simply powering the displays down before attempting to boot does not help. I reported a bug on gitlab.freedesktop.org, which drew a comment blaming the Asus BIOS. Meanwhile I've been going back and forth with Asus' least competent support staff, who keep asking for information I gave them originally and telling me to try things I've already tried. I'm betting the BIOS needs fixing.
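For anyone wanting to try the same workaround, disabling KMS is just a kernel command-line parameter; be warned that it is exactly what costs the performance I mentioned, since you lose accelerated graphics:
Code:
# disable kernel modesetting for the Intel i915 driver only
i915.modeset=0
# or disable KMS globally for all drivers
nomodeset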
01-01-2022, 01:19 AM | #3
Member
Registered: Apr 2005
Location: Cambridge, UK
Distribution: KDE Neon, Proxmox
Posts: 37
Original Poster
Quote:
Originally Posted by mrmazda
I wouldn't hastily blame the H310 before ruling out Asus' BIOS. [...] I'm betting the BIOS needs fixing.
Thank you, Mr Mazda, for your reply. Yes, it could be a problem in the BIOS, but I can't imagine what kind of problem would let Windows 10 run fine yet single out Linux. Then again, the board was developed to run Windows 10, so you'd expect it to work reliably there; perhaps Linux is using some hardware feature that Windows 10 is not.
I have no real way of isolating the issue except by swapping out components.
Regards,
Adrian
01-02-2022, 01:27 AM | #4
LQ Guru
Registered: Aug 2016
Location: SE USA
Distribution: openSUSE 24/7; Debian, Knoppix, Mageia, Fedora, OS/2, others
Posts: 6,425
Manufacturers work with M$ to have the required drivers in place when a product is released. A choice may be made to "solve" a problem the BIOS causes with a driver change: a kludge instead of the best fix.
01-02-2022, 03:11 AM | #5
Member
Registered: Apr 2005
Location: Cambridge, UK
Distribution: KDE Neon, Proxmox
Posts: 37
Original Poster
At the moment, a combination of nvme_core.default_ps_max_latency_us=0 and pcie_aspm=off on the kernel command line appears to have resolved the issue. I don't know why it works now; I'm fairly sure I tried this combination some time ago without success.
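To confirm that both parameters actually took effect after a reboot, you can read back the running kernel's command line:
Code:
# verify the parameters made it onto the running kernel's command line
cat /proc/cmdline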
I did place an order for a motherboard with a better chipset, so hopefully I can drop the workaround when it arrives. I'll post an update with my findings.
And, Mr Mazda, I certainly agree that hardware/BIOS bugs may be worked around in the driver. I used to write firmware for the 802.11 chips our company designed, and you would not believe some of the ugly kludges/fixes/workarounds we had to use to avoid re-spinning the hardware.
01-02-2022, 01:37 PM | #6
Member
Registered: Jun 2020
Posts: 610
Just a thought/comment: are you aware that Samsung SSDs have known, documented, and pervasive flaws in their proprietary controllers, flaws that Samsung has consistently failed (refused?)* to fix, and which lead to incompatibilities and instabilities on both Linux and macOS? Specifically, TRIM functionality is more or less broken, and the presence of a Samsung controller can also lead to I/O stalls. This has been reported as far back as the -40 series drives, was still an issue with the -70 series, and I'd imagine it hasn't improved with the -80 series. These issues do not seem to be reported on Windows (or at least not as widely), and my own admittedly quick-and-dirty testing supports that: the two Samsung -70 series drives I have work just fine in Windows 7 x64, but both cause system-breaking instability in various Linux distros. (I haven't felt like testing this on the Mac, but the OpenCore documentation points to similar issues with Samsung drives as have been documented for Linux, and macOS blacklists TRIM for Samsung drives.)
If you remove the Samsung drives, does everything tidy up? If so, I'd blame them as the known problem child and replace them with something that isn't; more 'generic' devices (often built on Phison or Silicon Motion (SMI) controllers) tend to have no problems.
* Why 'refused'? Early on, with the initial reports on the 840, Samsung seemed open to treating it as a firmware problem, but as the issues recurred in later generations they appear to have changed tack to simply declaring Linux broken.
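As a quick first check on any of these drives, you can ask the kernel whether it is exposing TRIM/discard at all; zeroes in the DISC-GRAN and DISC-MAX columns mean discard is disabled for that device:
Code:
# show per-device discard (TRIM) support as seen by the kernel
lsblk --discard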
EDIT
Here's an example article I found from a quick search: https://www.neowin.net/news/linux-pa...d-amd-systems/
I know there are some Bugzilla threads about this from over the years, and I've both experienced it and seen it documented/discussed on the NVMe (900 series) models as well, but this was what I could find with a quick search. From first-hand experience: I spent the better part of three months chasing random lockups, hangs, file-system gremlins, and so forth across multiple distros, motherboards, CPUs, memory sets, etc. (suffice to say, 'lots of hardware' was involved) before finally turning my suspicion on the supposedly 'gold standard' Samsung SSDs. Tear those out, and everything went back to working... they work just dandy in a Windows box, though.
Last edited by obobskivich; 01-02-2022 at 01:43 PM.
01-03-2022, 12:31 AM | #7
Member
Registered: Apr 2005
Location: Cambridge, UK
Distribution: KDE Neon, Proxmox
Posts: 37
Original Poster
Quote:
Originally Posted by obobskivich
If you remove the Samsung drives, does everything tidy up?
Thank you for your comment.
No. I saw the fault first on the WD Black NVMe drives; I tried two different models and two sizes. I bought a Samsung to compare against because it was at the top of a random list of "NVMe drives recommended for Linux" that I found; the WD Blacks were third on that list.
I do have a workaround, as I noted yesterday.
01-03-2022, 02:05 PM | #8
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,428
What I felt was a sense of déjà vu. Back a few decades, I had a Via chipset with the infamous "hardware fault."
In fact, it wasn't a hardware fault. There's a whole pile of settings the motherboard supplies to the BIOS for correct operation, and Via had tweaked these so it could use the Creative SoundBlaster piece of junk in m$ windoze. Hence the problem, which was fixable using a utility and instructions from Via to tweak them back: "this setting reads 60; adjust it to 40," that sort of thing.
Too late for Via, of course; they bit the dust or were sold for small money. It could well be something similar in the drive, the firmware, or anywhere else. Why do you think I got out of servicing electronics?
Last edited by business_kid; 01-03-2022 at 02:07 PM.
01-03-2022, 02:18 PM | #9
Moderator
Registered: Feb 2003
Location: Arizona, USA
Distribution: Debian, EndeavourOS, OpenSUSE, KDE Neon
Posts: 4,028
Quote:
Originally Posted by business_kid
What I felt was a sense of déjà vu. Back a few decades, I had a Via chipset with the infamous "hardware fault." [...] Too late for Via, of course; they bit the dust or were sold for small money.
Not related to the OP, but relevant to your post: Via is actually still around. Just a couple of months ago they completed a deal that sold their Centaur Technology (CPU design) employees to Intel. They have also sold off all their US production capability (also to Intel, if I recall correctly, but I'm not sure), but they are still actively developing and producing silicon in conjunction with Zhaoxin in China.
01-04-2022, 06:21 AM | #10
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,428
Quote:
Originally Posted by Timothy Miller
Not related to the OP, but relevant to your post: Via is actually still around.
Interesting. My last dealing with them was a USB issue generating massive log spam. The EHCI maintainer knew of it but had nothing to go on, so I found some Via forum and went completely over the top sounding off, looking for notice. In typical Chinese fashion, my post was instantly moved, and I was assigned to some programmer dweeb on a private forum. He wrote, and I tested, a patch to spit certain registers out to the log; I had to patch and compile the bleeding-edge kernel and debug his code. Finally I tested it, extracted about a megabyte of log snippets from around the points where I had inserted and removed devices, and sent that off over a dialup modem to the ehci_hcd maintainer. At last he had sight of his fault and patched appropriately. We later found that Via's programmer dweeb and I had the same chipset; the two ports paying no heed to the registers had been found by Via, who had disabled them without informing him!
I've since discovered that a lot of these buyouts are for the technical staff (designers and programmers) and to get the buyer's own people up to speed on the tech they have bought. It must have been the chipset division Via sold.