mdadm cannot remove failed drive, drive name changed.
Hello everyone, I am setting up a software RAID 6 for the first time. To test the array I removed a drive by popping it out of the enclosure. mdadm marked the drive as failed (F) and everything seemed fine. From what I gather, the next step is to remove the drive from the array (mdadm /dev/md0 -r sdf), but when I try this I receive the error:
mdadm: cannot find /dev/sdf: No such file or directory
That is true: when I plugged the drive back in, the machine recognized it as /dev/sdk. My question is, how do I remove this non-existent failed drive from my array? I was able to re-add the drive just fine as /dev/sdk with mdadm /dev/md0 -a /dev/sdk.
Also, is there any way to refer to a drive by ID or something similar, rather than by a device name that can change, to avoid this? Thank you in advance for any help!
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdk sdj sdi sdh sdg sdf(F) sde sdd sdc sdb
13674601472 blocks level 6, 64k chunk, algorithm 2 [9/8] [UUUU_UUUU]
[>....................] recovery = 2.7% (54654548/1953514496) finish=399.9min speed=79132K/sec
mdadm --detail /dev/md0
Version : 00.90
Creation Time : Sat Jan 16 04:54:06 2010
Raid Level : raid6
Array Size : 13674601472 (13041.12 GiB 14002.79 GB)
Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
Raid Devices : 9
Total Devices : 10
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sat Jan 16 22:09:34 2010
State : clean, degraded, recovering
Active Devices : 8
Working Devices : 9
Failed Devices : 1
Spare Devices : 1
Chunk Size : 64K
Rebuild Status : 2% complete
UUID : d3d98db4:55167169:f455fbeb:21592b43 (local to host archive)
Events : 0.10
Number   Major   Minor   RaidDevice   State
   0       8      16        0         active sync        /dev/sdb
   1       8      32        1         active sync        /dev/sdc
   2       8      48        2         active sync        /dev/sdd
   3       8      64        3         active sync        /dev/sde
   9       8     160        4         spare rebuilding   /dev/sdk
   5       8      96        5         active sync        /dev/sdg
   6       8     112        6         active sync        /dev/sdh
   7       8     128        7         active sync        /dev/sdi
   8       8     144        8         active sync        /dev/sdj
  10       8      80        -         faulty spare
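For anyone scripting this check, the (F) marker in the /proc/mdstat output above is enough to spot the failed member. Below is a minimal sketch, not a definitive tool: failed_members is a hypothetical helper name, and it assumes the mdstat layout shown in this post (it also tolerates the usual name[N](F) form).

```shell
#!/bin/sh
# failed_members (hypothetical helper): reads /proc/mdstat-style text on
# stdin and prints "<array> <member>" for every member flagged (F).
failed_members() {
    awk '
        # Array lines look like: md0 : active raid6 sdk sdj ... sdf(F) ...
        $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    m = $i
                    sub(/\[[0-9]+\]/, "", m)   # drop a slot index such as [10], if present
                    sub(/\(F\)$/, "", m)       # drop the failure marker itself
                    print $1, m
                }
            }
        }
    '
}
```

Typical use would be failed_members < /proc/mdstat, which for the array in this post should report md0 and sdf.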
I'd guess the problem was physically removing the drive to "fail" it. If it had failed but still been physically present, you probably could have removed it with mdadm -r, then physically removed it. Probably either a physical reboot of the machine or possibly just stopping the array and re-assembling it will solve the problem.
For Posterity: mdadm /dev/md0 -r detached
This is a common issue, and leaving the solution at "well, don't let your drives disappear before removing them" is not good enough.
To remove failed drives whose device nodes are missing, don't name them; use the keyword detached instead:
mdadm /dev/md0 -r detached
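If the vanished drive has not been marked failed yet, the same keyword can be given to --fail first. Here is a small sketch of that sequence; remove_detached and the MDADM override are names made up for this example, and the override exists only so the commands can be dry-run with echo before touching a real array.

```shell
#!/bin/sh
# MDADM can be overridden (for example MDADM=echo) to print the commands
# instead of running them. This variable is an assumption of the sketch.
MDADM="${MDADM:-mdadm}"

# remove_detached (hypothetical helper): drop every member of the given
# array whose device is no longer connected to the system.
remove_detached() {
    array="$1"
    # Mark disconnected members as faulty; harmless if already marked (F).
    $MDADM "$array" --fail detached
    # Remove every disconnected member from the array.
    $MDADM "$array" --remove detached
}
```

A dry run would be MDADM=echo, then remove_detached /dev/md0; after a real run, cat /proc/mdstat should no longer list the (F) entry.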
Thank you so much for having the CORRECT answer to the problem. I just had a drive die due to total power failure on it and because it is in a hot-swap case, the case thought I pulled it out so it vanished. So this saved me from dismounting the raid and re-assembling it.
THANK YOU SO MUCH!
mdadm remove failed devices
I'm glad it helps. I know it did for me. It took me tons of digging to figure this out.
Then, I posted it in several places so I could find it again when it happened to me again.
Which I did, and read the answer, and was so happy, then saw I posted it!
Also, I found that you can't hot-add a device under the same /dev/sdX name until you free the old one this way.
While a vanished drive is still held by md, the kernel keeps its device name reserved even though the udev device node is gone.
Once you remove it, you can hot-plug a replacement drive and you won't have a gap in drive names.
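On the original question about naming a drive by ID: on udev-based systems the links under /dev/disk/by-id usually stay the same for a given drive even when its sdX name changes, so they can be used in mdadm commands in place of /dev/sdk. As far as I know, md itself finds members by the UUID in the superblock, so a renamed drive should not confuse assembly. A rough sketch for seeing which kernel name each link currently points to (list_by_id is a made-up helper name, and the directory argument is only there so it can be tried on a test directory):

```shell
#!/bin/sh
# list_by_id (hypothetical helper): print each persistent link in a by-id
# directory together with the device it currently resolves to.
list_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}
```

The link for the new drive could then be passed to mdadm /dev/md0 -a in place of the /dev/sdX name.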