Dual drive failure in RAID 5 (also, RAID 1, and LVM)
Linux - ServerThis forum is for the discussion of Linux Software used in a server related context.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
Dual drive failure in RAID 5 (also, RAID 1, and LVM)
I *had* a server with 6 SATA2 drives with CentOS 5.3 on it (I've upgraded over time from 5.1). I had set up (software) RAID1 on /boot for sda1 and sdb1 with sdc1, sdd1, sde1, and sdf1 as hot backups. I created LVM (over RAID5) for /, /var, and /home. I had a drive fail last year (sda).
After a fashion, I was able to get it working again with sda removed. Since I had two hot spares on my RAID5/LVM deal, I never replaced sda. Of course, on reboot, what was sdb became sda, sdc became sdb, etc.
So, recently, the new sdc died. The hot spare took over, and I was humming along. A week later (before I had a chance to replace the spares, another died (sdb).
Now, I have 3 good drives, my array has degraded, but it's been running (until I just shut it down to tr y.
I now only have one replacement drive (it will take a week or two to get the others).
My questions/problems are:
I went to linux rescue from the CentOS 5.2 DVD and changed sda1 to a Linux (as opposed to Linux RAID) partition. I need to change my fstab to look for /dev/sda1 as boot, but I can't even mount sda1 as /boot. What do I need to do next? If I try to reboot without the disk, I get insmod: error inserting '/lib/raid456.ko': -1 File exists
Also, my md1 and md2 fail because there are not enough discs (it says 2/4 failed). I *believe* that this is because sda, sdb, sdc, sdd, and sde WERE the drives on the raid before, and I removed sdb and sdc, but now, I do not have sde (because I only have 4 drives) and sdd is the new drive. Do I need to label these drives and try again? Suggestions? (I suspect I should have done this BEFORE failure).
Do I need to rebuild the RAIDs somehow? What about LVM?
Update:
I still have this issue with a kernel panic after the insmode error (insmod: error inserting '/lib/raid456.ko': -1 File exists)--I get this line twice.
I've been able to find some instructions stating that I need to rebuild the initrd with mkinitrd, but when I boot through the rescue disk, I can't seem to mount any of my hard drives (I am sure I am doing something wrong).
You can make a new initrd without using the boot disk.
Login as root in and use xterm.
Anyway, you'll probably find /etc/modprobe.conf has multiple entries for that driver. You need to edit that file as root.
I can't login, my raid crashed, and I can't bring the computer up. I *can* bring up md0 (/boot, raid1) under linux rescue, but I can't seem to modify anything, as I don't have mkinitrd available to me.
The only things I can think of doing at this point are:
1) start over and install CentOS 5.3 (I will only lose some settings, as my /home is backed up)
2) copy the /boot directory to a usb drive, modify it on another machine and then copy it back, and see if that works (any chance? Can I even get the correct initrd on another machine if it's not also 64-bit?)
Also, I cannot seem to bring up md1 (/) and md2 (/home), both raid5. Whenever I try, it says that I have 2 drives with 1 spare, so it can't bring up the array! If I can get this working (hints, anyone?), can I just reinstall CentOS to /boot and get it all working again?
I think that if I can't get this working tomorrow, I'll have to go with starting from scratch, as I need this server back up (right now, we are all logging into an offsite server, which had been replicated using unison).
Use the
linux rescue
mode to start the box, mount the root system (will probably do that for you) then check that file (/etc/modprobe.conf).
I don't think I can do that. I cannot mount the root filesystem, as it is in md1 (a raid5 array that failed). I cannot seem to get md1 back up. When I try, using:
mdadm --examine --scan /dev/sda >>/etc/mdadm.conf
then add DEVICES partitions to the top and devices=/dev/sda1,/dev/sdb1,/dev/sdc1, missing
and run mdadm -A -s
I get:
mdadm: /dev/md1 assembled from 2 drives and 1 spare - not enough to start the array.
RAID 5 requires a min of 3 active disks, 0 or more spares.
So, you need to set all the disks as active, no spares. That should get you back up and running ... speaking of which, do a backup asap afterwards.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.