Why would a failed drive in RAID 1 cause entire system to halt?
I'm a little stumped. I have a Dell PowerEdge R200, running Ubuntu 10.04 (ext4), and an LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08) HBA. The server only has 2 drives, and the RAID controller can only do RAID 0 or RAID 1; we use RAID 1.
A couple of days ago the server went into read-only mode. A reboot and an fsck from Knoppix and it came back up. Three days later, same thing. This time I knew enough to check dmesg, and I found a number of these errors:
I installed lsiutil and checked the RAID controller and confirmed 1 RAID volume, 2 physical drives, 1 gone. It was PhysDisk 1, which in the above output is the one that reported the RAID status change.
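Is this the right kind of cross-check from the OS side? I'm assuming something like the following, though with the 1068E in RAID mode the OS may only see the logical volume rather than the individual members, the device name is a guess, and smartctl may need a different -d type behind this controller:

Code:
dmesg | grep -iE 'mptbase|mptsas|i/o error'   # controller and disk errors the kernel logged
smartctl -H /dev/sda                          # overall SMART health verdict
smartctl -a -d scsi /dev/sda                  # full SMART output if the default device type fails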
My confusion is this: it's my understanding that RAID 1 is supposed to protect against this kind of all-out failure. PhysDisk 1 fails, PhysDisk 2 has all the same info and takes over, allowing you to replace the failed disk and rebuild with no downtime. Maybe a little slowdown in performance at worst. Why would a failed/failing disk cause the system to go read-only, and continue to revert to read-only after some time? I can only guess that either both drives happened to be going bad at the same time, or that the controller card itself was the issue. Is there any utility that gives you the health of the controller itself? Would lsiutil have reported an issue with the controller?
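One thing I plan to check is whether the read-only behavior is simply ext4 reacting to write errors: if I remember right, Ubuntu mounts the root filesystem with errors=remount-ro by default, so any I/O error would flip it read-only regardless of what the RAID is doing. Something like this should show it (the partition name is just a guess for my setup):

Code:
tune2fs -l /dev/sda1 | grep -i 'errors behavior'   # continue / remount-ro / panic
grep ' / ' /proc/mounts                            # current flags on the root mount; look for ro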
It depends, I suppose, on how the RAID is set up to begin with.
Perhaps the system is dropping into read-only to prevent the drives from getting too far out of sync?
However, I would back up your data, hold your breath, replace the failing drive, and hopefully the working drive gets mirrored successfully onto the new one.
Remember, mirrored RAIDs are designed to provide redundancy and fault tolerance, not a substitute for regular backups.
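If it were me, before swapping anything I'd copy the data off to separate storage; something along these lines, where the backup disk and mount point are only examples:

Code:
mount /dev/sdb1 /mnt/backup       # a separate backup disk, example name only
rsync -aAXH --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp --exclude=/mnt / /mnt/backup/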