RHEL5.4 software RAID 5 - how do I replace "faulty" drive?
I have a five-disk RAID 5 with one disk used as a hot spare. As a test, we pulled a drive during a 4 GB file write to the array to simulate a drive failure in use. Everything went perfectly: the write wasn't interrupted, and the array rebuilt onto the spare in about 60 minutes. But when we plugged the original drive back in, /proc/mdstat still shows that slot as removed:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[1] sdc1[2] sdb1[0]
430115328 blocks level 5, 256k chunk, algorithm 2 [4/3] [UUU_]
unused devices: <none>
and mdadm --detail /dev/md0 shows:
Version : 0.90
Creation Time : Mon Oct 19 09:36:46 2009
Raid Level : raid5
Array Size : 430115328 (410.19 GiB 440.44 GB)
Used Dev Size : 143371776 (136.73 GiB 146.81 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Oct 21 16:23:23 2009
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
UUID : 7d5fd02d:f021a8f7:8d89e144:3ffab5da
Events : 0.570
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 49 1 active sync /dev/sdd1
2 8 33 2 active sync /dev/sdc1
3 0 0 3 removed
The array was originally set up with /dev/sd[b-e]1, with /dev/sdf as the hot spare. Is there something I need to do when I replace the "faulty" drive in order to get the array to see the new drive? This is what I tried:
[root@name-removed ~]# mdadm /dev/md0 -a /dev/sde1
mdadm: add new device failed for /dev/sde1 as 4: Invalid argument
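One thing I've considered but haven't tried yet is wiping the stale md superblock on the re-inserted partition first, in case the leftover metadata from before the pull is what mdadm is choking on. Sketched below as a dry run that only prints the commands, since I don't want to experiment on the live array; /dev/sde1 is simply the name the re-inserted disk's partition came up as here.

```shell
# Dry run of the re-add sequence I'm considering; this only prints the
# commands instead of executing them.
plan_readd() {
    dev="$1"
    # Wipe the stale md superblock left over from before the pull...
    printf 'mdadm --zero-superblock %s\n' "$dev"
    # ...then add the partition back; it should come up as the new hot
    # spare, since the array already rebuilt onto the old spare.
    printf 'mdadm /dev/md0 --add %s\n' "$dev"
}

plan_readd /dev/sde1
```

If anyone knows whether zeroing the superblock here is safe (or whether --re-add is more appropriate for a disk that was previously a member), I'd appreciate confirmation before I run it for real.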
All the forum threads I found about this message claimed it was caused by the replacement drive not having enough blocks. That is obviously not the case here, since I am re-adding the same drive. Here is the partition info:
#cat /proc/partitions
major minor #blocks name
8 0 35548160 sda
8 1 104391 sda1
8 2 4128705 sda2
8 3 31310685 sda3
8 16 143374000 sdb
8 17 143372061 sdb1
8 32 143374000 sdc
8 33 143372061 sdc1
8 48 143374744 sdd
8 49 143372061 sdd1
8 64 143374000 sde
8 65 143372061 sde1
9 0 430115328 md0
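To double-check the "not enough blocks" theory against the numbers above: the array's Used Dev Size is 143371776 KiB and sde1 is 143372061 blocks of 1 KiB, so the partition should be plenty big. A quick arithmetic check (figures copied straight from the output above):

```shell
# Sizes in 1 KiB blocks, from mdadm --detail and /proc/partitions above.
used_dev_size=143371776   # per-member space the array actually uses
sde1_blocks=143372061     # size of the partition I'm trying to add

if [ "$sde1_blocks" -ge "$used_dev_size" ]; then
    echo "/dev/sde1 is large enough ($((sde1_blocks - used_dev_size)) KiB to spare)"
else
    echo "/dev/sde1 is too small"
fi
```

So sde1 has 285 KiB of headroom over what the array needs, which seems to rule out the size explanation.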
The really frustrating part is that the system won't boot while the array is degraded unless I modify /sys/module/md_mod/parameters/start_dirty_degraded, which I would rather not have to do.
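For reference, this is the workaround I've been using to allow a degraded boot; if I'm reading the md documentation right, the same module parameter can also be set on the kernel command line as md-mod.start_dirty_degraded=1, though either way I'd prefer not to need it at all.

```shell
# Runtime workaround (only lasts until reboot, needs root): allow md to
# start dirty, degraded arrays.
echo 1 > /sys/module/md_mod/parameters/start_dirty_degraded
```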
Is there a step I am missing in replacing a RAID drive?
Thanks