LinuxQuestions.org

touser 01-17-2010 05:22 PM

mdadm cannot remove failed drive, drive name changed.
 
Hello everyone, I am setting up a software RAID 6 for the first time. To test the array I removed a drive by popping it out of the enclosure. mdadm marked the drive as failed (F) and everything seemed fine. From what I gather, the next step is to remove the drive from the array (mdadm /dev/md0 -r sdf), but when I try this I receive the error:
mdadm: cannot find /dev/sdf: No such file or directory

That is true: when I plugged the drive back in, the machine recognized it as /dev/sdk. My question is, how do I remove this now non-existent failed drive from my array? I was able to re-add the disk just fine as /dev/sdk with mdadm /dev/md0 -a /dev/sdk.

Also, is there any way to refer to a drive by its ID, or something similar, so that it always gets the same name and this doesn't happen? Thank you in advance for any help!

Code:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[10](F) sde[3] sdd[2] sdc[1] sdb[0]
13674601472 blocks level 6, 64k chunk, algorithm 2 [9/8] [UUUU_UUUU]
[>....................] recovery = 2.7% (54654548/1953514496) finish=399.9min speed=79132K/sec

Code:

mdadm --detail /dev/md0
/dev/md0:
Version : 00.90
Creation Time : Sat Jan 16 04:54:06 2010
Raid Level : raid6
Array Size : 13674601472 (13041.12 GiB 14002.79 GB)
Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
Raid Devices : 9
Total Devices : 10
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Sat Jan 16 22:09:34 2010
State : clean, degraded, recovering
Active Devices : 8
Working Devices : 9
Failed Devices : 1
Spare Devices : 1

Chunk Size : 64K

Rebuild Status : 2% complete

UUID : d3d98db4:55167169:f455fbeb:21592b43 (local to host archive)
Events : 0.10

Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
3 8 64 3 active sync /dev/sde
9 8 160 4 spare rebuilding /dev/sdk
5 8 96 5 active sync /dev/sdg
6 8 112 6 active sync /dev/sdh
7 8 128 7 active sync /dev/sdi
8 8 144 8 active sync /dev/sdj

10 8 80 - faulty spare
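
On the second question in the post above (referring to a drive by something stabler than its sdX name): udev creates persistent symlinks under /dev/disk/by-id/ for every disk, and mdadm accepts those paths anywhere it takes a device node. A minimal sketch, with a made-up ID:

Code:

# udev keeps stable symlinks for every disk; list them to find the one
# that matches the drive in question
ls -l /dev/disk/by-id/

# Add a member by its persistent by-id path instead of the shuffling
# sdX name (the WWN below is made up purely for illustration)
mdadm /dev/md0 -a /dev/disk/by-id/wwn-0x5000c5001234abcd

Note that md itself identifies members by the UUID in their superblocks, so the array reassembles correctly even when the sdX names shuffle; the by-id paths just make the commands unambiguous.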

mostlyharmless 01-24-2010 09:18 AM

I'd guess the problem was physically removing the drive to "fail" it. If it had failed but still been physically present, you probably could have removed it with mdadm -r and then physically pulled it. Either rebooting the machine or simply stopping the array and re-assembling it will probably solve the problem.
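
A minimal sketch of the stop-and-reassemble route, assuming the array and member names from the original post (and that nothing on the array is mounted while it is stopped):

Code:

# Stop the degraded array (unmount any filesystems on it first)
mdadm --stop /dev/md0

# Re-assemble from the superblocks; --scan consults mdadm.conf or
# probes the available block devices for matching members
mdadm --assemble --scan

# Or assemble explicitly, listing only the members that are present
mdadm --assemble /dev/md0 /dev/sd[b-e] /dev/sd[g-k]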

xaminmo 01-04-2011 02:33 PM

For Posterity: mdadm /dev/md0 -r detached
 
This is a common issue, and leaving the solution as "well, don't let your drives disappear before removing them" is unfathomable.

To remove the failed and missing drives, don't specify them by name; use:
mdadm /dev/md0 -r detached

Code:

/bin/bash# mdadm /dev/md2 -r detached
mdadm: hot removed 8:19 from /dev/md2
mdadm: hot removed 8:35 from /dev/md2
[root@ns1:/root]
/bin/bash# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
md2 : active raid6 sdg3[5](S) sdf3[6] sdd3[0] sda3[1] sde3[3]
      5753928192 blocks level 6, 512k chunk, algorithm 2 [5/3] [UU_U_]
      [=========>...........]  recovery = 46.7% (897177416/1917976064) finish=343.0min speed=49593K/sec

md0 : active raid1 sdg1[1] sdf1[0] sde1[4] sdd1[3] sda1[2]
      264960 blocks [5/5] [UUUUU]

md1 : active raid6 sdg2[4] sdf2[3] sda2[0] sde2[2] sdd2[1]
      105810432 blocks level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

This removes the detached devices that are no longer present on the system. Arguably, you might want to do this BEFORE re-adding the devices, since you might then be able to use --re-add and save a full rebuild.
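
A sketch of that ordering, using the device names from the original post; whether --re-add actually avoids a full rebuild depends on the array having a write-intent bitmap and on how far the returned member has drifted:

Code:

# Drop the stale entries for members that are no longer attached
mdadm /dev/md0 -r detached

# Try to re-add the returned disk under its new name; with a bitmap,
# only the blocks written while it was gone need to be resynced
mdadm /dev/md0 --re-add /dev/sdk

# If --re-add is refused, fall back to a plain add (full rebuild)
mdadm /dev/md0 -a /dev/sdk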

schworak 08-25-2012 10:25 PM

Quote:

Originally Posted by xaminmo (Post 4213015)
This is a common issue, and leaving the solution as "well, don't let your drives disappear before removing them" is unfathomable.

To remove the failed and missing drives, don't specify them by name; use:
mdadm /dev/md0 -r detached


YOU ROCK!

Thank you so much for having the CORRECT answer to the problem. I just had a drive die from a total power failure, and because it sits in a hot-swap case, the enclosure thought I had pulled it out, so it vanished. This saved me from having to take the RAID down and re-assemble it.

THANK YOU SO MUCH!

xaminmo 08-26-2012 07:39 PM

mdadm: remove failed devices
 
I'm glad it helps. I know it did for me; it took tons of digging to figure this out.
I then posted it in several places so I could find it again the next time it happened to me.
Which it did, and I read the answer, was delighted, and then noticed I had posted it myself!

Also, I found that you can't hot-add a replacement at the same /dev/sd## name until you free the old entry this way.
When a drive is gone but still held by mdadm, its device node stays in kernel space even though the udev entries are gone.
Once you remove the detached entries, you can hot-plug a replacement drive and you won't end up with a gap in drive names.
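
A rough illustration of that sequence; the SCSI-host rescan line is an assumption (many hot-swap backplanes detect the new disk on their own), and host0 plus the reused /dev/sdf name are illustrative:

Code:

# Free the vanished members so md releases the stale device nodes
mdadm /dev/md0 -r detached

# After plugging in the replacement, trigger a bus rescan if the disk
# does not appear by itself (host number is illustrative)
echo "- - -" > /sys/class/scsi_host/host0/scan

# Confirm the replacement picked up the expected name, then add it
cat /proc/partitions
mdadm /dev/md0 -a /dev/sdf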

