Dual drive failure in RAID 5 (also RAID 1 and LVM)
I *had* a server with six SATA2 drives running CentOS 5.3 (upgraded over time from 5.1). I had set up software RAID1 on /boot using sda1 and sdb1, with sdc1, sdd1, sde1, and sdf1 as hot spares. I created LVM (over RAID5) for /, /var, and /home. A drive (sda) failed last year.
After a fashion, I was able to get it working again with sda removed. Since I had two hot spares on my RAID5/LVM setup, I never replaced sda. Of course, on reboot, what was sdb became sda, sdc became sdb, and so on.

Recently, the new sdc died. The hot spare took over, and I was humming along. A week later (before I had a chance to replace the spares), another drive died (sdb). Now I have three good drives; my array is degraded, but it kept running until I just shut it down to try to fix it. I have only one replacement drive on hand (it will take a week or two to get the others).

My questions/problems are:
- I booted linux rescue from the CentOS 5.2 DVD and changed sda1 to a Linux (as opposed to Linux RAID) partition. I need to change my fstab to look for /dev/sda1 as /boot, but I can't even mount sda1 as /boot. What do I need to do next?
- If I try to reboot without the disk, I get: insmod: error inserting '/lib/raid456.ko': -1 File exists
- My md1 and md2 also fail because there are not enough disks (it says 2/4 failed). I *believe* this is because sda, sdb, sdc, sdd, and sde WERE the drives in the RAID before; I removed sdb and sdc, but now I do not have sde (because I only have four drives) and sdd is the new drive. Do I need to label these drives and try again? (I suspect I should have done this BEFORE the failure.) Do I need to rebuild the RAIDs somehow? What about LVM?

Any suggestions welcome. Thank you!
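A minimal first-diagnostics sketch from the rescue shell, before changing anything else; the partition name below is an assumption based on the setup described above:

    fdisk -l                    # confirm how the surviving drives enumerated on this boot
    cat /proc/mdstat            # see what, if anything, the kernel has assembled
    mdadm --examine /dev/sda1   # read the RAID superblock: array UUID, device role, event count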
Update:
I still have this issue with a kernel panic after the insmod error (insmod: error inserting '/lib/raid456.ko': -1 File exists); I get this line twice. I've found some instructions saying that I need to rebuild the initrd with mkinitrd, but when I boot from the rescue disk, I can't seem to mount any of my hard drives (I am sure I am doing something wrong). Can anyone walk me through this? Thanks!
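If the rescue environment does not auto-detect the installation, the arrays and LVM volumes can usually be brought up by hand. A sketch, assuming the CentOS-default VolGroup00/LogVol00 names (the actual VG/LV names on this box are not given in the thread):

    mdadm --examine --scan > /etc/mdadm.conf       # build a config from the on-disk superblocks
    mdadm -A -s                                    # assemble every array listed there
    lvm vgscan && lvm vgchange -ay                 # find and activate the volume groups on top
    mkdir -p /mnt/sysimage
    mount /dev/VolGroup00/LogVol00 /mnt/sysimage   # hypothetical root LV name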
You can make a new initrd without using the boot disk. Log in as root and use an xterm. You'll probably find /etc/modprobe.conf has multiple entries for that driver; you need to edit that file as root.
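A minimal sketch of that repair, assuming the box can still reach a root shell; the kernel version string is a placeholder for whatever uname -r reports on the installed system:

    grep -n raid456 /etc/modprobe.conf       # look for duplicate lines loading the driver
    vi /etc/modprobe.conf                    # delete the duplicate(s), keep one entry
    cp /boot/initrd-2.6.18-128.el5.img /boot/initrd-2.6.18-128.el5.img.bak   # keep a fallback copy
    mkinitrd -f /boot/initrd-2.6.18-128.el5.img 2.6.18-128.el5               # rebuild the initrd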
Quote:
The only things I can think of doing at this point are:
1) Start over and install CentOS 5.3 (I will only lose some settings, as my /home is backed up).
2) Copy the /boot directory to a USB drive, modify it on another machine, copy it back, and see if that works. (Any chance? Can I even build the correct initrd on another machine if it's not also 64-bit?)

Also, I cannot seem to bring up md1 (/) and md2 (/home), both RAID5. Whenever I try, it says that I have 2 drives with 1 spare, so it can't bring up the array! If I can get this working (hints, anyone?), can I just reinstall CentOS to /boot and get it all working again?

I think that if I can't get this working tomorrow, I'll have to start from scratch, as I need this server back up (right now, we are all logging into an offsite server, which had been replicated using unison). Thanks for any help.
Use the linux rescue mode to start the box, mount the root system (it will probably do that for you), then check that file (/etc/modprobe.conf).
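From the rescue shell that sequence might look like this (a sketch, assuming the installer mounted the system under /mnt/sysimage):

    chroot /mnt/sysimage                 # switch into the installed system
    grep -n raid456 /etc/modprobe.conf   # find the duplicate driver entries
    # edit out the duplicates, then rebuild the initrd as sketched earlier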
Quote:
I ran
    mdadm --examine --scan /dev/sda >> /etc/mdadm.conf
then added "DEVICE partitions" to the top and devices=/dev/sda1,/dev/sdb1,/dev/sdc1,missing to the ARRAY line, and ran
    mdadm -A -s
I get:
    mdadm: /dev/md1 assembled from 2 drives and 1 spare - not enough to start the array.
Am I dead in the water, then?
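An aside for anyone hitting the same wall: before assuming the array is lost, compare the event counters on the remaining members; if they are close together, a forced assembly (see the reply below) is usually safe. A sketch, with the partition names taken from the quote above:

    mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1 | grep -E 'Events|State'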
RAID 5 requires a minimum of 3 active disks, plus 0 or more spares. So you need to set all the disks as active, with no spares. That should get you back up and running ... speaking of which, do a backup ASAP afterwards.
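A minimal sketch of that forced assembly, using the member partitions from the quote above (still assumptions); --force tells mdadm to mark the freshest failed member active again, which is exactly why the backup should come first afterwards:

    mdadm --stop /dev/md1                                             # release any half-assembled array
    mdadm --assemble --force /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1   # re-mark members active, start degraded
    cat /proc/mdstat                                                  # confirm md1 is running (degraded is OK)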