I recently built myself a new router. I'm using a 60GB Drevo X1 SSD for the OS (Arch). /var, /home and everything else that needs to be written to regularly is on a btrfs RAID1 array of regular HDDs. The weird problem is that after a week or two, the SSD stops working. After I reboot the router, I can see I/O errors (print_req_error: I/O error, dev sda) in the journal. It seems the system can't access the SSD anymore for some reason. If I push the reset button, the BIOS no longer finds the SSD. But when I power it off completely and restart it, the SSD is found again and everything works as expected for another week or two. There doesn't seem to be anything wrong in the smartctl -x output. The HDDs in the btrfs RAID1 array don't have this problem. I'm not sure what I should try before ordering a new SATA cable.
I can't do that. The OS is on this drive. At that point, when it locks up, anything that isn't in memory or on the RAID array can't be read anymore. I can't even log into it from anywhere since that would mean running /bin/login.
Then your three-fingered salute is probably the way to go until you have replacement hardware ready, which I presume you will get.
On a side note, I'm probably old-fashioned but I like to have essential modules for booting (hard disk, motherboard chipset, and filesystem) compiled into the kernel, so an initrd is not needed. It's just one more set of hoops you don't have to jump through.
I'm definitely not ready to get any replacement hardware yet, as all the hardware is brand new. If nothing else works, I'm probably going to get some more SATA cables and try plugging the SSD into another SATA port to see how that goes. I guess you're thinking that a kernel module (libata and ahci here) dies after a while and isn't reloaded for some reason? But I don't see how that wouldn't also kill access to the btrfs RAID1 array. I know the array is still working because systemd-journald is still happily writing to /var on it and I can still access nginx and the NFS shares that serve stuff from it. So I highly doubt that's the problem here. At first I thought the drive might have a problem with fstrim or discard. But fstrim is set up to run every week and I went more than a week without the problem, so I don't think it's that. I also tried removing the discard flag in fstab for the UEFI vfat /boot partition, but that didn't help either. What I'd like to know is how to get more debugging info sent to systemd-journald, so I'll have more to go on the next time it locks up.
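Not an answer to the "more debug output" question as such, but since the journal lives on the RAID array and survives the lock-up, one thing that might help is pulling the kernel messages that led up to the first I/O error out of the previous boot. Below is a rough Python sketch of that idea. The function names and the regular expression are my own assumptions (the only message taken from this thread is print_req_error: I/O error, dev sda; the ataN: prefix is what libata usually puts on its lines), and it assumes the journal is persistent so that journalctl -k -b -1 has something to show.

```python
import re
import subprocess
import sys
from collections import deque

# Assumed patterns: the message quoted in the thread, plus the "ataN:" /
# "ataN.MM:" prefixes that libata normally uses in kernel log lines.
PATTERN = re.compile(r"print_req_error|I/O error|\bata\d+(\.\d+)?:")


def ata_events(lines, context=3):
    """Return matching lines plus up to `context` lines before each match."""
    recent = deque(maxlen=context)  # rolling window of non-matching lines
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if PATTERN.search(line):
            out.extend(recent)      # emit the lines leading up to the match
            recent.clear()
            out.append(line)
        else:
            recent.append(line)
    return out


def previous_boot_kernel_log():
    """Kernel messages (-k) from the previous boot (-b -1) via journalctl."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager", "-o", "short-precise"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    try:
        for event in ata_events(previous_boot_kernel_log()):
            print(event)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not read the previous boot's journal: {exc}", file=sys.stderr)
```

Run it after the power cycle that brings the SSD back. The lines just before the first print_req_error are usually the interesting ones, for example whether libata logged a link reset or a timeout on the SSD's port before the drive disappeared.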