LinuxQuestions.org

roboa1983 07-26-2011 06:55 AM

mdadm raid6 active despite 3 drive failures
 
Hello,
I am currently having problems with my RAID partition. At first, two disks (sde, sdf) were having trouble. Through smartctl I noticed there were some bad blocks, so I first marked the disks as failed and then re-added them so that the RAID array would overwrite those blocks.
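
For reference, the fail and re-add cycle described above might look roughly like the sketch below. The exact commands used were not recorded, so this is an assumption: `readd_plan` is a hypothetical helper that only prints the mdadm manage-mode commands (`--fail`, `--remove`, `--re-add`) instead of running them, and the member names `/dev/sde1` and `/dev/sdf1` are taken from the mdstat output further down.

```shell
# Hypothetical helper: print (do NOT run) the mdadm commands for a
# fail / remove / re-add cycle on each listed member of an array.
# $1 = array device, remaining arguments = member partitions.
readd_plan() {
    array=$1
    shift
    for dev in "$@"; do
        echo "mdadm $array --fail $dev"     # mark the member as faulty
        echo "mdadm $array --remove $dev"   # take it out of the array
        echo "mdadm $array --re-add $dev"   # put it back so it gets resynced
    done
}

# Assumed device names, taken from the mdstat output in this post.
readd_plan /dev/md3 /dev/sde1 /dev/sdf1
```

Printing the plan first gives a chance to double-check the device names before anything is run as root.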

Since that didn't work, I went ahead and replaced the disks. The recovery process was slow, so I left things running overnight. This morning I found out that another disk (sdb) has failed. Strangely enough, the array has not become inactive.

md3 : active raid6 sdf1[15](S) sde1[16](S) sdak1[10] sdj1[8] sdk1[9] sdb1[17](F) sdan1[13] sdd1[2] sdc1[1] sdg1[5] sdi1[7] sdal1[11] sdam1[12] sdao1[14] sdh1[6]
25395655168 blocks level 6, 64k chunk, algorithm 2 [15/12] [_UU__UUUUUUUUUU]
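
To make the status map at the end of that output easier to read, here is a small sketch that lists which slots are empty. In the map, "U" marks a slot that is up and "_" marks one that is missing. `missing_slots` is a hypothetical helper name, not part of mdadm, and it assumes the map is passed in exactly as it appears in /proc/mdstat.

```shell
# Hypothetical helper: list the empty slots in an mdstat status map such as
# "[_UU__UUUUUUUUUU]". Slot numbers are counted from 0, left to right.
missing_slots() {
    map=${1#"["}      # strip the leading "["
    map=${map%"]"}    # strip the trailing "]"
    i=0
    out=""
    while [ -n "$map" ]; do
        rest=${map#?}           # everything after the first character
        c=${map%"$rest"}        # the first character itself
        [ "$c" = "_" ] && out="$out $i"
        map=$rest
        i=$((i + 1))
    done
    echo "${out# }"
}

missing_slots "[_UU__UUUUUUUUUU]"   # prints: 0 3 4
```

Three empty slots agrees with the `[15/12]` counter: 15 configured members, 12 active.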

Does anyone have any recommendations on what steps to take next to recover the array and fix the problem? The array is basically full, so I haven't written anything to it since the problem started.

Thanks!

chrism01 07-26-2011 10:18 PM

Well, there's a good summary of RAID types here: https://secure.wikimedia.org/wikipedia/en/wiki/RAID, but basically it says you only need 4 active disks to keep a RAID 6 running.
You seem to have 15(?) total disks, with 2 Syncing and one Failed; just replace the Failed one and continue.
Obviously, the less you use the RAID, the faster the syncs will complete.

See
Code:

cat /proc/mdstat

mdadm --detail /dev/md3

Re space full:
when it's finished syncing, you need to do at least 1 of

1. purge some space
2. add more disks
3. back up and replace with something else

roboa1983 07-26-2011 10:34 PM

Chris,
I think that RAID6 has a fault tolerance of two partitions, which is why I'm so worried. With two disks turned into 'spares' and another one failed, I don't think there's any tolerance left for further errors.
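
The arithmetic behind that worry, as a rough sketch: `raid6_margin` is a hypothetical helper, and its two arguments are the numbers from the `[15/12]` counter in the mdstat output (configured members and active members).

```shell
# Hypothetical helper: how many more member failures a RAID 6 array could
# absorb. $1 = configured members, $2 = active members.
raid6_margin() {
    missing=$(( $1 - $2 ))
    echo $(( 2 - missing ))   # RAID 6 carries two members' worth of redundancy
}

raid6_margin 15 12   # prints: -1
```

A result below zero means more members are missing than RAID 6 can normally cover, which is why the "active" state looks so strange to me.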

Cheers,

Roberto


All times are GMT -5.