RAID-5 Recovery problems after drive errors
I'm a bit of a Linux newbie, but I did manage to set up the following RAID-5 system:
1x 500GB system drive on ATA IDE
4x 1TB SATA drives in software RAID
Linux = Fedora 13

So here's what happened. I set up the system to send me an email every time the md status file (/proc/mdstat) changed, so it would send me emails when it periodically ran a self-test. I was away and noticed that the self-test was going incredibly slowly (it usually took 8 hours... it was on course to take 16 weeks!). A colleague decided to just reboot the system. Afterwards, the system would not boot and, with all 5 drives connected, would stop at an endlessly scrolling error message of: Quote:
When attempting to boot a Live-CD version of Fedora (and Ubuntu) with all 4 RAID drives attached, the same error occurs. With only the other 3 drives attached, it boots into the Live-CD Linux just fine. Palimpsest shows the 3 drives as healthy and as parts of a RAID array. However, when I try to start the array through Palimpsest, it says there are not enough disks to start the array... even though there are 3, which is what RAID 5 was supposed to be about. (The drives contain backups of important research data.)

So, what do I do now? Do I need to have a 4th blank working drive in there to recover the data? SHOULD it be able to start the array with just the 3 drives, or does it need a blank new drive to rebuild the array? How do I do that?

Thanks, Ta-mater |
The first thing you can do is edit /etc/fstab. Comment out the line that describes /dev/md0 by placing a # at the start of that line, to prevent the system from trying to mount it at boot.
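For example, the entry might end up looking something like this (the device name, mount point, and options here are only placeholders; keep whatever your own line says and just add the leading #):

Code:
#/dev/md0    /data    ext4    defaults    1 2

The two numbers at the end are the dump and fsck-pass fields; once the line is commented out, both the mount and the automatic filesystem check on it are skipped.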
After that your system should start without difficulty. When the system is up and running you can find out what's up with the drive, and repair your array with mdadm. |
Alright, so I followed the above suggestion and was able to boot into Fedora.
Now, I have sent the defective drive back to WD and received the new one today. I have put the new drive in and the machine boots up (with the /dev/md127 line commented out in fstab).

So, where do I go from here? I've been searching around for how to replace a drive and rebuild parity in Linux software RAID 5, but all I've found is confusion and people's stories that don't match my situation. There should be a simple process for replacing a failed drive and rebuilding the array in this situation. Can anyone point me in the right direction? |
I found this page describing what I need to do: http://wumple.com/blog/2007/01/23/re...-raid-5-array/
Specifically, he lists the steps as: Quote:
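Roughly, it boils down to removing the dead member and adding the replacement so md can rebuild parity onto it (device names below are just examples that I'd have to adapt to my own disks):

Code:
# mark the failed member as faulty and remove it from the array
mdadm /dev/md127 --fail /dev/sde1
mdadm /dev/md127 --remove /dev/sde1

# after swapping in the new disk and partitioning it like the others,
# add it to the array; the rebuild starts automatically
mdadm /dev/md127 --add /dev/sde1

# watch the rebuild progress
cat /proc/mdstat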
For reference, this is the error I get at boot if the /dev/md127 line in fstab is left uncommented:

Checking filesystems...
/dev/md127: The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem, then the superblock is corrupt, and you might try running e2fsck with an alternate superblock
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.

And then the filesystem is read-only. Thing is, the RAID filesystem is ext4, so I'm not sure why it is saying ext2. Do I need to stop this filesystem check from happening automatically? |
Further, when I boot Fedora with /dev/md127 commented out, the system does not "start" the array with the existing 3 drives... it says there are not enough components to start the array... which doesn't sound like what is supposed to happen. Shouldn't the system start it as degraded?
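From the reading I've done, the manual way to bring a degraded array up by hand would be something like this (device names are examples; I'd substitute the three surviving partitions):

Code:
# see what the kernel currently knows about
cat /proc/mdstat

# check the RAID superblocks on the surviving members
mdadm --examine /dev/sdb1 /dev/sdc1 /dev/sdd1

# assemble and start the array degraded, with only 3 of the 4 members
mdadm --assemble --run --force /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1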
I can't add the new disk to the array as stated in the guide above because I can't "start" the array in this state.....bah |