LinuxQuestions.org - mdadm RAID 5 single drive failure

Last night we had an issue where we thought one of the drives in our 3-drive RAID 5, created using mdadm, was bad. Luckily the drive was okay. However, in the meantime we spent a good amount of time trying to figure out how one would recover from a single drive failure in this situation using mdadm. We searched this forum and did not find any cases where someone successfully recovered from a single drive failure on an mdadm RAID 5. My question is: has anyone been successful in recovering from a single drive failure using an mdadm RAID 5? If so, what was the procedure? Unfortunately, this is an issue we never considered when creating our RAID using mdadm. If recovery is not truly possible, we will go with a hardware RAID. Thanks for your time!

-Andrew

Sorry to hear about your experience Andrew!

Clearly, mdadm is not an actual RAID tool, as the RAID acronym contains the word 'redundant' and mdadm is incapable of recovering from an actual failure of a drive. All you can hope for is that the drive(s) that are marked as 'faulty' are not truly faulty ... mdadm only thinks they are. Then the trick is to force mdadm into deciding the disks aren't faulty after all and adding them back into the array.

In the event of an actual hardware failure, I would recommend pulling the faulty drives and replacing the remaining drives as well (they are sure to follow!). Sure, you lose your RAID, but at least you can be confident that your next mdadm RAID will have the best chance at a long, fruitful life before encountering any hiccups that render it completely useless.

Good luck!

Quote:

Then the trick is to force mdadm into deciding the disks aren't faulty after all and adding them back into the array.

So if that happens, should we stop the array, put the drive back, and reassemble?
The last time it happened, I took the drive out and put it back in, but mdadm --add did not pick it up. It still thought that the drive was bad.

Quote:

Originally Posted by ufmale (Post 3162712)

First you should inspect the drive's md superblock using mdadm --examine to make sure that the superblocks match on each of the devices (even the supposedly faulty one).

If everything matches, what has worked for me in the past is using:

mdadm --assemble --force /dev/md0 <device 1> <device 2> ... <device N>

This should force mdadm to remove the faulty flag from the device. If the device is truly bad, then it still won't work, but if it was marked faulty due to power issues or a hardware controller failure then it should bring the RAID back up.
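
For the "make sure the superblocks match" step, one field that is commonly compared is the Events counter printed by mdadm --examine. The sketch below is only an illustration: the helper names events_of and all_match are made up, the sample text is invented rather than real output, and it assumes the counter appears on a line shaped like "Events : 1234" (the exact layout can differ between metadata versions).

```shell
#!/bin/sh
# Hypothetical helper: print the Events counter from `mdadm --examine`
# output read on stdin. Assumes a line shaped like "Events : 1234".
events_of() {
    awk -F' : ' '/^[ \t]*Events[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: succeed only if every line on stdin is identical.
all_match() {
    [ $(sort -u | wc -l) -eq 1 ]
}

# Invented excerpt standing in for real --examine output.
sample='          State : clean
         Events : 4821
     Chunk Size : 256K'

printf '%s\n' "$sample" | events_of

# Invented counters for three members; the third one lags behind.
if printf '4821\n4821\n4819\n' | all_match; then
    echo "event counters match"
else
    echo "event counters differ"
fi

# On a real system (as root) the counters could be collected like this:
#   for d in /dev/sda2 /dev/sdb2 /dev/sdc2; do
#       mdadm --examine "$d" | events_of
#   done | all_match || echo "event counters differ"
```

A member whose counter lags behind the others is the kind of device that --assemble --force is being asked to accept, so it is worth knowing which one it is before forcing anything.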

Quote:

Originally Posted by atarghe1 (Post 3162636)

... has anyone been successful in recovering from a single drive failure using an mdadm RAID 5? If so, what was the procedure?

Hi Andrew,

Recently went through the same thing here: a drive in our RAID 5 array was marked faulty. Further investigation revealed the drive was really quite dead. Although Google wasn't very helpful, the procedure turned out to be rather straightforward in the end.

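1. Use "mdadm --manage /dev/md0 -r /dev/sdd2" to remove the partition that was marked as faulty from the array. (The array members here are partitions, sda2 through sdd2, so the member to remove is /dev/sdd2 rather than the whole disk.)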

2. Power down and replace the drive with a good drive.

3. Power up and set the partition table on the new drive to match that of the other drives in the array. Here we used "sfdisk -d /dev/sda | sfdisk /dev/sdd".

4. Add the proper partition on the new drive into the array, "mdadm --manage /dev/md0 -a /dev/sdd2"

5. Sit back and wait for the recovery to happen, you can "cat /proc/mdstat" to watch its progress; you should see something like:

Personalities : [raid5]
md0 : active raid5 sdd2[4] sdc2[2] sdb2[1] sda2[0]
731985408 blocks level 5, 256k chunk, algorithm 2 [4/3] [UUU_]
[===>.................] recovery = 19.7% (48253056/243995136) finish=59.1min speed=55184K/sec
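
If you would rather poll the rebuild from a script than eyeball it, the text above can be picked apart with standard tools. This is a rough sketch: recovery_pct and is_degraded are made-up helper names, the sample is the output quoted in this post, and other kernels may lay out /proc/mdstat a little differently.

```shell
#!/bin/sh
# Hypothetical helper: print the recovery percentage found in
# /proc/mdstat-style text on stdin (prints nothing if no rebuild is running).
recovery_pct() {
    sed -n 's/.*recovery *= *\([0-9.]*\)%.*/\1/p'
}

# Hypothetical helper: succeed if a member status map such as [UUU_]
# contains an underscore, which marks a missing or rebuilding member.
is_degraded() {
    grep -q '\[[U_]*_[U_]*\]'
}

# The output quoted in the post above, used here as test input.
sample='Personalities : [raid5]
md0 : active raid5 sdd2[4] sdc2[2] sdb2[1] sda2[0]
731985408 blocks level 5, 256k chunk, algorithm 2 [4/3] [UUU_]
[===>.................] recovery = 19.7% (48253056/243995136) finish=59.1min speed=55184K/sec'

printf '%s\n' "$sample" | recovery_pct          # prints 19.7
if printf '%s\n' "$sample" | is_degraded; then
    echo "array is degraded"
fi

# On a live system the same helpers can read the real file:
#   recovery_pct < /proc/mdstat
```

Once the rebuild finishes, the status map should read [UUUU] and is_degraded will stop matching.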

You can get more detailed instructions for Raid 1 here:

www.howtoforge.com/replacing_hard_disks_in_a_raid1_array

The steps there are essentially the same as the RAID 5 steps that worked here.
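
For anyone who wants the five steps in one place, here is a dry-run sketch that only prints the commands and never touches a disk. The variable names and the plan function are made up for illustration, and the device names are simply the ones from this thread, assuming the array members are the sdX2 partitions shown in the /proc/mdstat output. Check every name against your own system before running anything for real.

```shell
#!/bin/sh
# Dry run only: prints the replacement commands, executes none of them.
# All device names below are assumptions taken from this thread.
ARRAY=/dev/md0
FAILED_MEMBER=/dev/sdd2   # partition that mdadm marked as faulty
GOOD_DISK=/dev/sda        # healthy disk whose partition table gets copied
NEW_DISK=/dev/sdd         # replacement disk after the swap
NEW_MEMBER=/dev/sdd2      # partition on the new disk to add to the array

plan() {
    echo "mdadm --manage $ARRAY --remove $FAILED_MEMBER"
    echo "# power down, swap the physical drive, power up"
    echo "sfdisk -d $GOOD_DISK | sfdisk $NEW_DISK"
    echo "mdadm --manage $ARRAY --add $NEW_MEMBER"
    echo "cat /proc/mdstat"
}

plan
```

Printing the plan first makes it easy to spot a wrong device name, which matters because the sfdisk step overwrites the partition table on whatever disk it is pointed at.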

Hope this helps,
Sean

Much appreciated. That was exactly what I was interested in. Thanks for taking the time to post that information. I imagine that will help many others in this situation!

I had a drive in my 4 drive raid5 array fry a couple nights ago. I searched google and came across this helpful information which also worked for me. Thanks for the detailed howto.

Thank you for this answer tux68!

Haven't had a drive "dropout" for ages, so I had forgotten how to do this...
This made my day!