LinuxQuestions.org - Dual drive failure in RAID 5 (also, RAID 1, and LVM)

I *had* a server with 6 SATA2 drives with CentOS 5.3 on it (I've upgraded over time from 5.1). I had set up (software) RAID1 on /boot for sda1 and sdb1 with sdc1, sdd1, sde1, and sdf1 as hot spares. I created LVM (over RAID5) for /, /var, and /home. I had a drive fail last year (sda).

After a fashion, I was able to get it working again with sda removed. Since I had two hot spares on my RAID5/LVM deal, I never replaced sda. Of course, on reboot, what was sdb became sda, sdc became sdb, etc.

So, recently, the new sdc died. The hot spare took over, and I was humming along. A week later (before I had a chance to replace the spares), another died (sdb).

Now, I have 3 good drives, my array has degraded, but it's been running (until I just shut it down to try to fix it).

I now only have one replacement drive (it will take a week or two to get the others).

My questions/problems are:
I booted into linux rescue from the CentOS 5.2 DVD and changed sda1 to a Linux (as opposed to Linux RAID) partition. I need to change my fstab to look for /dev/sda1 as /boot, but I can't even mount sda1 as /boot. What do I need to do next? If I try to reboot without the disk, I get:
insmod: error inserting '/lib/raid456.ko': -1 File exists

Also, my md1 and md2 fail because there are not enough disks (it says 2/4 failed). I *believe* that this is because sda, sdb, sdc, sdd, and sde WERE the drives on the raid before, and I removed sdb and sdc, but now, I do not have sde (because I only have 4 drives) and sdd is the new drive. Do I need to label these drives and try again? Suggestions? (I suspect I should have done this BEFORE failure).

Do I need to rebuild the RAIDs somehow? What about LVM?

Any suggestions welcome.

Thank you!

Update:
I still have this issue with a kernel panic after the insmod error (insmod: error inserting '/lib/raid456.ko': -1 File exists). I get this line twice.

I've been able to find some instructions stating that I need to rebuild the initrd with mkinitrd, but when I boot through the rescue disk, I can't seem to mount any of my hard drives (I am sure I am doing something wrong).

Can anyone help walk me through this?

Thanks!

You can make a new initrd without using the boot disk.
Log in as root and use an xterm.
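
For reference, here is a rough sketch of what the rebuild looks like on CentOS 5 once you have a root shell on the installed system (logged in normally, or after `chroot /mnt/sysimage` from rescue mode). The helper name is made up for illustration and the kernel version in the usage line is a placeholder, not taken from this thread. It only prints the mkinitrd command so you can check it before running it.

```shell
# Print (do not run) the command that would rebuild the initrd for one kernel.
# $1 is a kernel version exactly as it appears under /lib/modules on the
# installed system. The function name is hypothetical, for illustration only.
initrd_rebuild_command() {
    kver="$1"
    if [ -z "$kver" ]; then
        echo "usage: initrd_rebuild_command KERNEL_VERSION" >&2
        return 1
    fi
    # -f overwrites an existing image of the same name.
    echo "mkinitrd -f /boot/initrd-${kver}.img ${kver}"
}
```

Typical use would be `ls /lib/modules` to find the installed kernel version, then `initrd_rebuild_command <that version>`. Note that `uname -r` inside rescue mode reports the rescue kernel, not necessarily the installed one.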

Anyway, you'll probably find /etc/modprobe.conf has multiple entries for that driver. You need to edit that file as root.
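
A quick way to spot repeated entries, as a minimal sketch: the function below lists any non-comment line that appears more than once in a modprobe.conf-style file. The function name is my own, and whether your file really contains duplicates for raid456 is a guess until you look.

```shell
# List lines that occur more than once in a modprobe.conf-style file.
# Comments and blank lines are skipped, and runs of whitespace are collapsed
# so that "alias  foo   bar" and "alias foo bar" count as the same entry.
# The function name is hypothetical, for illustration only.
find_duplicate_entries() {
    conf="${1:-/etc/modprobe.conf}"
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$conf" \
        | sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//' \
        | sort | uniq -d
}
```

From rescue mode the file would normally be at /mnt/sysimage/etc/modprobe.conf, so you would call `find_duplicate_entries /mnt/sysimage/etc/modprobe.conf`. No output means no exact duplicates were found.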

Quote:

Originally Posted by chrism01 (Post 3553803)

I can't login, my raid crashed, and I can't bring the computer up. I *can* bring up md0 (/boot, raid1) under linux rescue, but I can't seem to modify anything, as I don't have mkinitrd available to me.

The only things I can think of doing at this point are:
1) start over and install CentOS 5.3 (I will only lose some settings, as my /home is backed up)
2) copy the /boot directory to a usb drive, modify it on another machine and then copy it back, and see if that works (any chance? Can I even get the correct initrd on another machine if it's not also 64-bit?)

Also, I cannot seem to bring up md1 (/) and md2 (/home), both raid5. Whenever I try, it says that I have 2 drives with 1 spare, so it can't bring up the array! If I can get this working (hints, anyone?), can I just reinstall CentOS to /boot and get it all working again?

I think that if I can't get this working tomorrow, I'll have to go with starting from scratch, as I need this server back up (right now, we are all logging into an offsite server, which had been replicated using unison).

Thanks for any help.

Use the linux rescue mode to start the box, mount the root system (will probably do that for you) then check that file (/etc/modprobe.conf).

Quote:

Originally Posted by chrism01 (Post 3553903)

Use the linux rescue mode to start the box, mount the root system (will probably do that for you) then check that file (/etc/modprobe.conf).

I don't think I can do that. I cannot mount the root filesystem, as it is in md1 (a raid5 array that failed). I cannot seem to get md1 back up. When I try, using:
mdadm --examine --scan /dev/sda >>/etc/mdadm.conf
then add DEVICES partitions to the top and devices=/dev/sda1,/dev/sdb1,/dev/sdc1, missing

and run mdadm -A -s
I get:
mdadm: /dev/md1 assembled from 2 drives and 1 spare - not enough to start the array.
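
Before forcing anything, it usually helps to see what each member's superblock says. The sketch below filters `mdadm --examine` output down to the fields that show which members are current. The field names are what I would expect from the 0.90 superblock format that CentOS 5 uses by default, so treat them as an assumption, and the function name is made up.

```shell
# Keep only the superblock fields that matter when comparing array members.
# Reads `mdadm --examine` output on stdin. Field names assume the 0.90
# superblock layout; the function name is hypothetical.
md_member_summary() {
    grep -E '^[[:space:]]*(Raid Devices|Active Devices|Spare Devices|Update Time|State|Events) :'
}
```

Run it once per candidate partition, for example `mdadm --examine /dev/sdX2 | md_member_summary` (the partition name is a placeholder). Members whose Events counts match, or nearly match, are the ones most likely to hold consistent data.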

Am I dead in the water, then?

RAID 5 requires a min of 3 active disks, 0 or more spares.
So, you need to set all the disks as active, no spares. That should get you back up and running ... speaking of which, do a backup asap afterwards.
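
To spell out the arithmetic: a RAID 5 array can run with one member missing, so it needs one fewer active member than its "Raid Devices" count, and spares do not count because they hold no data yet. A minimal sketch follows; the function names are made up, and the figure of 4 raid devices is inferred from the "2/4 failed" message earlier in the thread.

```shell
# Minimum number of active (data-bearing) members a RAID 5 array needs:
# one fewer than its "Raid Devices" count. Function names are hypothetical.
raid5_min_active() {
    echo $(( $1 - 1 ))
}

# Exit status 0 if an array with $1 raid devices can start with $2 active members.
raid5_can_start() {
    [ "$2" -ge "$(raid5_min_active "$1")" ]
}

# The situation reported above: 4 raid devices (inferred), 2 active, 1 spare.
if raid5_can_start 4 2; then
    echo "can start"
else
    echo "cannot start: need $(raid5_min_active 4) active, have 2"
fi
```

This prints "cannot start: need 3 active, have 2", which matches the mdadm message. One commonly used way to try getting the third member counted as active is a forced assembly, along the lines of `mdadm --assemble --force /dev/md1 <member partitions>`. That is my assumption about how to apply the advice, not something confirmed in this thread, and it only helps if the third member's data really is nearly up to date.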