Dual drive failure in RAID 5 (also, RAID 1, and LVM)

ABL · 05-22-2009, 04:45 PM

I *had* a server with 6 SATA2 drives with CentOS 5.3 on it (I've upgraded over time from 5.1). I had set up (software) RAID1 on /boot for sda1 and sdb1 with sdc1, sdd1, sde1, and sdf1 as hot backups. I created LVM (over RAID5) for /, /var, and /home. I had a drive fail last year (sda).

After a fashion, I was able to get it working again with sda removed. Since I had two hot spares on my RAID5/LVM deal, I never replaced sda. Of course, on reboot, what was sdb became sda, sdc became sdb, etc.

So, recently, the new sdc died. The hot spare took over, and I was humming along. A week later (before I had a chance to replace the spares), another died (sdb).

Now, I have 3 good drives, my array has degraded, but it's been running (until I just shut it down to try a repair).

I now only have one replacement drive (it will take a week or two to get the others).

My questions/problems are:
I went to linux rescue from the CentOS 5.2 DVD and changed sda1 to a Linux (as opposed to Linux RAID) partition. I need to change my fstab to look for /dev/sda1 as boot, but I can't even mount sda1 as /boot. What do I need to do next? If I try to reboot without the disk, I get insmod: error inserting '/lib/raid456.ko': -1 File exists

Also, my md1 and md2 fail because there are not enough discs (it says 2/4 failed). I *believe* that this is because sda, sdb, sdc, sdd, and sde WERE the drives on the raid before, and I removed sdb and sdc, but now, I do not have sde (because I only have 4 drives) and sdd is the new drive. Do I need to label these drives and try again? Suggestions? (I suspect I should have done this BEFORE failure).

Do I need to rebuild the RAIDs somehow? What about LVM?

Any suggestions welcome.

Thank you!

ABL · 05-26-2009, 12:52 PM

Update:
I still have this issue with a kernel panic after the insmod error (insmod: error inserting '/lib/raid456.ko': -1 File exists). I get this line twice.

I've been able to find some instructions stating that I need to rebuild the initrd with mkinitrd, but when I boot through the rescue disk, I can't seem to mount any of my hard drives (I am sure I am doing something wrong).
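For reference, a minimal sketch of the mkinitrd step, usable once the installed system can be reached from the rescue environment. It assumes rescue mode has mounted the installed root at /mnt/sysimage and that the installed kernel versions are the directory names under lib/modules. The helper name print_mkinitrd_cmds is made up for illustration, and it only prints the commands so they can be checked before anything is run. The kernel version has to be named explicitly because uname -r inside the rescue environment reports the rescue kernel, not the installed one.

```shell
# print_mkinitrd_cmds SYSROOT  (hypothetical helper, prints commands only)
# Emits one mkinitrd command per kernel found under SYSROOT/lib/modules.
print_mkinitrd_cmds() {
    sysroot=$1
    for dir in "$sysroot"/lib/modules/*; do
        [ -d "$dir" ] || continue          # skip if the glob matched nothing
        kver=$(basename "$dir")
        echo "chroot $sysroot mkinitrd -f /boot/initrd-$kver.img $kver"
    done
}

# Demo against a throwaway directory tree; the kernel version is an example.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/lib/modules/2.6.18-128.el5"
print_mkinitrd_cmds "$demo_root"
rm -rf "$demo_root"
```

On the real machine the call would be print_mkinitrd_cmds /mnt/sysimage, which requires / and /boot to be mounted under that path first.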

Can anyone help walk me through this?

Thanks!

chrism01 · 05-26-2009, 08:49 PM

You can make a new initrd without using the boot disk.
Log in as root and use an xterm.

Anyway, you'll probably find /etc/modprobe.conf has multiple entries for that driver. You need to edit that file as root.
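A rough way to spot such duplicates, as a sketch only: the sample contents below are hypothetical stand-ins for /etc/modprobe.conf, and duplicate_modules is a made-up helper name. It lists every module that is the target of more than one alias line. As far as I know, mkinitrd on CentOS 5 picks up the scsi_hostadapter aliases from that file, so a module listed twice there can end up being inserted twice at boot.

```shell
# Hypothetical sample; on the real system read /etc/modprobe.conf instead.
sample_conf='alias eth0 e1000
alias scsi_hostadapter ahci
alias scsi_hostadapter1 raid456
alias scsi_hostadapter2 raid456'

# duplicate_modules [FILE]  (hypothetical helper)
# Prints each module name that appears as the target of more than one
# "alias" line. Reads standard input when no file is given.
duplicate_modules() {
    awk '$1 == "alias" { print $3 }' "$@" | sort | uniq -d
}

printf '%s\n' "$sample_conf" | duplicate_modules
```

Against the real file this would be duplicate_modules /etc/modprobe.conf; any name it prints is a candidate for having one of its alias lines removed.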

ABL · 05-26-2009, 09:24 PM

Quote:

Originally Posted by chrism01

You can make a new initrd without using the boot disk.
Log in as root and use an xterm.

Anyway, you'll probably find /etc/modprobe.conf has multiple entries for that driver. You need to edit that file as root.

I can't log in: my raid crashed, and I can't bring the computer up. I *can* bring up md0 (/boot, raid1) under linux rescue, but I can't seem to modify anything, as I don't have mkinitrd available to me.

The only things I can think of doing at this point are:
1) start over and install CentOS 5.3 (I will only lose some settings, as my /home is backed up)
2) copy the /boot directory to a usb drive, modify it on another machine and then copy it back, and see if that works (any chance? Can I even get the correct initrd on another machine if it's not also 64-bit?)

Also, I cannot seem to bring up md1 (/) and md2 (/home), both raid5. Whenever I try, it says that I have 2 drives with 1 spare, so it can't bring up the array! If I can get this working (hints, anyone?), can I just reinstall CentOS to /boot and get it all working again?

I think that if I can't get this working tomorrow, I'll have to go with starting from scratch, as I need this server back up (right now, we are all logging into an offsite server, which had been replicated using unison).

Thanks for any help.

chrism01 · 05-26-2009, 11:41 PM

Use the linux rescue mode to start the box, mount the root system (it will probably do that for you), then check that file (/etc/modprobe.conf).

ABL · 05-27-2009, 09:21 AM

Quote:

Originally Posted by chrism01

Use the linux rescue mode to start the box, mount the root system (it will probably do that for you), then check that file (/etc/modprobe.conf).

I don't think I can do that. I cannot mount the root filesystem, as it is in md1 (a raid5 array that failed). I cannot seem to get md1 back up. When I try, using:
mdadm --examine --scan /dev/sda >>/etc/mdadm.conf
then add DEVICES partitions to the top and devices=/dev/sda1,/dev/sdb1,/dev/sdc1, missing

and run mdadm -A -s
I get:
mdadm: /dev/md1 assembled from 2 drives and 1 spare - not enough to start the array.
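Before changing anything, it can help to read what each member's RAID superblock records. The sketch below only prints read-only mdadm --examine commands, one per member partition. The partition names are an assumption (the thread does not say which partitions back md1), and print_examine_cmds is a made-up helper. Note that the superblocks live on the member partitions, so those are what --examine should be pointed at rather than a whole disk such as /dev/sda. In the resulting output, comparing the Events and State lines across members is a common way to see which ones are still in sync with each other.

```shell
# Member partitions of md1: an ASSUMPTION, substitute the real ones.
members="/dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2"

# print_examine_cmds DEVICE...  (hypothetical helper, prints commands only)
print_examine_cmds() {
    for part in "$@"; do
        echo "mdadm --examine $part"
    done
}

# $members is left unquoted on purpose so it splits into separate arguments.
print_examine_cmds $members
```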

Am I dead in the water, then?

chrism01 · 05-27-2009, 08:01 PM

RAID 5 requires a minimum of 3 active disks, plus 0 or more spares.
So, you need to set all the disks as active, no spares. That should get you back up and running ... speaking of which, do a backup asap afterwards.
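To put numbers on that: a RAID 5 array built from N member devices can run with one member missing, so it needs N - 1 in-sync members. A spare holds no data until a rebuild onto it has finished, so it does not count toward that total. The sketch below assumes md1 was created with 4 members (taken from the "2/4 failed" message earlier in the thread); raid5_can_start is a made-up helper that does the arithmetic and does not touch any disk.

```shell
# raid5_can_start TOTAL ACTIVE  (hypothetical helper, arithmetic only)
# TOTAL  = number of member devices the array was created with
# ACTIVE = members currently in sync (spares are not counted)
raid5_can_start() {
    total=$1
    active=$2
    needed=$((total - 1))      # RAID 5 tolerates exactly one missing member
    if [ "$active" -ge "$needed" ]; then
        echo "can start: $active of $total active, $needed needed"
    else
        echo "cannot start: $active of $total active, $needed needed"
    fi
}

raid5_can_start 4 2   # the state mdadm reported for md1 (2 drives + 1 spare)
raid5_can_start 4 3   # what the array would need to come up degraded
```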