mdadm - re-added disk treated as spare
I recently moved a 4-disk RAID-5 to a new machine. The array had 3 SATA disks and 1 IDE, and as I was planning to replace the IDE disk with a SATA one I just moved the 3 SATA disks and added the new disk later. The array assembled OK and began to rebuild. Unfortunately, during the rebuild one of the original drives had an unrecoverable read error and was kicked out of the array. I checked it with smartctl, and as it showed only 2 errors within the past month I just removed it and re-added it to the array. However, mdadm added it to the array as a spare and did not rebuild. I stopped the array and attempted to reassemble, but was not successful:
Code:
edge ~ # mdadm --assemble /dev/md0 /dev/sd{b,c,d,e}1
sdd was the original device that failed; sde is the new device that had not finished rebuilding when the array went down. As I figured sde was certainly not clean, I zeroed its superblock and attempted to start the array with b, c, and d to pull off the most important data before I tried anything else.
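(For completeness, the superblock wipe was nothing fancy, something along these lines, with the array stopped:)
Code:
edge ~ # mdadm --zero-superblock /dev/sde1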
Code:
edge ~ # mdadm --assemble /dev/md0 /dev/sd{b,c,d}1
Code:
edge ~ # mdadm -E /dev/sd{b,c,d}1
I'm not very experienced with mdadm and so am very hesitant to try any --force or --assume-clean options. Is there any other way to tell it that the sdd drive shouldn't be a spare? |
Try
mdadm --fail /dev/md0 /dev/sdd1
mdadm --remove /dev/md0 /dev/sdd1
mdadm --add /dev/md0 /dev/sdd1
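If the add takes, you can watch the rebuild progress with:
Code:
cat /proc/mdstat
|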
Code:
edge ~ # mdadm --fail /dev/md0 /dev/sdd1
Code:
edge ~ # mdadm --detail /dev/md0
|
Maybe because it's not active. You can still try the remove option.
|
I'm able to remove it, but whenever I --add or --re-add it, sdd still shows up as a spare.
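For completeness, this is roughly the sequence I keep running, with the same result each time:
Code:
edge ~ # mdadm /dev/md0 --remove /dev/sdd1
edge ~ # mdadm /dev/md0 --re-add /dev/sdd1
edge ~ # mdadm --detail /dev/md0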
|
It's possible that this message
Quote:
Then try the --add or --assemble again. |
Code:
mdadm.conf:
I noticed that only sdd sees itself as a spare; the other two working devices see it as faulty removed:
Code:
Number   Major   Minor   RaidDevice   State
At this point I'm thinking about just wiping the superblocks and having mdadm re-create them.
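Before I try anything destructive I'm dumping what the superblocks currently say, so I at least have the device order, chunk size and layout on record. Something like this (the file name is just where I'm keeping notes):
Code:
edge ~ # mdadm -E /dev/sd[bcd]1 > /root/md0-superblocks.txt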
Code:
mdadm --create /dev/md0 --level=5 --num-devices=4 --layout=left-symmetric --chunk-size=64 --assume-clean /dev/sd[bcd]1 missing
|
Any solution?
Hey wingnut64,
I have the exact same problem as you do - one drive unnecessarily dropped by the software, another drive failed during the resync. (However, my problem came from just upgrading the OS, not even moving the array.) Like you, I have valuable data on it, and I believe the disks still contain it, just that mdadm doesn't let me get to it. Did you solve it? How?

I'm also scared about the risk of re-creating the array. I read other people's forum posts, and they lost all their data by trying this. I found one happy outcome, on Gentoo Forums: he purchased a new disk, then re-created the array. Apparently, he was lucky and got the right order. I don't want to take that chance. I don't play the lottery; this is exactly why I run RAID5. |
Wow, it's been 4 years and over 9,000 views...
Unfortunately no, I was not able to rebuild the array. It's been so long I forget exactly what I did, but I didn't get anything off it. Fortunately I was able to recover a decent chunk of the original data from backups and other systems. Most of the rest was replaceable or re-creatable. This actually kind of scared me away from mdadm, and my NAS setup is now using ZFS on Solaris :)

If you have extra disks or free space somewhere, you could try dd'ing the individual raw disks in your array to other physical disks or files and then loop-mounting them (http://www.linuxquestions.org/questi...images-715343/) (disclaimer: I've not tried this; rough sketch at the end of this post). Then you could try potentially dangerous commands on a copy of the raid, possibly on another system.

For the benefit of those who might stumble on this in the future, some miscellaneous thoughts on software RAID:
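Here's roughly what that disk-image approach would look like (untested by me; paths and device names are just examples):
Code:
# copy each member to an image file somewhere with enough space
dd if=/dev/sdb1 of=/mnt/spare/sdb1.img bs=1M conv=noerror,sync
dd if=/dev/sdc1 of=/mnt/spare/sdc1.img bs=1M conv=noerror,sync
dd if=/dev/sdd1 of=/mnt/spare/sdd1.img bs=1M conv=noerror,sync
# attach the copies as loop devices
losetup /dev/loop0 /mnt/spare/sdb1.img
losetup /dev/loop1 /mnt/spare/sdc1.img
losetup /dev/loop2 /mnt/spare/sdd1.img
# experiment on the copies instead of the real disks
mdadm --assemble /dev/md1 /dev/loop0 /dev/loop1 /dev/loop2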
|
I have had the exact same problem.
On a NAS (Linux powered) with a 4-disk RAID5. I had a failed disk (sdc), and another that gave a SMART alert (sdb)... We replaced sdc, and a technician mistakenly removed the sdb disk while the array was reconstructing. That crashed the filesystem and unmounted the volume. He put it back immediately, but the damage was done. If I "add" the removed disk again, it showed up as a spare... md2 was showing as sda[3] sdb[3]S sdd[3] and one missing, so there were not enough disks to run. I solved it by re-creating the raid. Code:
mdadm --stop /dev/md2
Later on, I added the replaced disk and it rebuilt correctly:
Code:
mdadm --add /dev/md2 /dev/sdc3
Hope this helps. |
Quote:
... :( No data after rebuilding. |
Many Thanks!
Quote:
Code:
mdadm --stop /dev/md2
|
Like YzRacer back in 2015, I only signed up to say that this post saved my Synology RAID. The "--assume-clean" option did the trick for me.
My Synology RAID was in "Clean, FAILED" state, and re-building the array using the above command worked like a charm. Now fsck is running and so far no bad blocks. Thanks! |
\o/
You're welcome. |