LinuxQuestions.org

RHEL5.4 software RAID 5 - how do I replace "faulty" drive? (https://www.linuxquestions.org/questions/linux-server-73/rhel5-4-software-raid-5-how-do-i-replace-faulty-drive-763774/)

cgande1x 10-22-2009 12:43 PM

RHEL5.4 software RAID 5 - how do I replace "faulty" drive?
 
I have a 5-disk software RAID 5 with one drive used as a hot spare. As a test, we pulled a drive while writing a 4GB file to the RAID to simulate a drive failure during use. Everything went perfectly: the write wasn't interrupted, all seemed well, and the RAID took about 60 minutes to rebuild onto the spare. But when we plugged the original drive back in, the array still shows it as removed in /proc/mdstat:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[1] sdc1[2] sdb1[0]
430115328 blocks level 5, 256k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>

and

Version : 0.90
Creation Time : Mon Oct 19 09:36:46 2009
Raid Level : raid5
Array Size : 430115328 (410.19 GiB 440.44 GB)
Used Dev Size : 143371776 (136.73 GiB 146.81 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Oct 21 16:23:23 2009
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 256K

UUID : 7d5fd02d:f021a8f7:8d89e144:3ffab5da
Events : 0.570

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 49 1 active sync /dev/sdd1
2 8 33 2 active sync /dev/sdc1
3 0 0 3 removed

The RAID was originally set up with /dev/sd[b-e]1, with /dev/sdf as the hot spare. Is there something I need to do when I replace the "faulty" drive in order to get the array to see the new drive? This is what I tried:

[root@name-removed ~]# mdadm /dev/md0 -a /dev/sde1
mdadm: add new device failed for /dev/sde1 as 4: Invalid argument
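
One sequence that is often suggested for this kind of "Invalid argument" failure is to inspect, and if necessary wipe, the stale md superblock on the re-inserted partition before adding it back. This is a generic mdadm sketch (assuming /dev/sde1 is the re-inserted partition and /dev/md0 the array), not something confirmed in this thread:

# see what metadata the re-inserted partition still carries from its old slot
mdadm --examine /dev/sde1

# if it still holds an old superblock, clear it (this only wipes the md
# metadata on that one partition, not the rest of the array)
mdadm --zero-superblock /dev/sde1

# then retry the add
mdadm /dev/md0 --add /dev/sde1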

All the forums I found about this message claimed that it was caused by the replacement drive not having enough blocks. This is obviously not the case since I am using the same drive. Here is the partition info:

#cat /proc/partitions
major minor #blocks name

8 0 35548160 sda
8 1 104391 sda1
8 2 4128705 sda2
8 3 31310685 sda3
8 16 143374000 sdb
8 17 143372061 sdb1
8 32 143374000 sdc
8 33 143372061 sdc1
8 48 143374744 sdd
8 49 143372061 sdd1
8 64 143374000 sde
8 65 143372061 sde1
9 0 430115328 md0
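
To rule out the "not enough blocks" theory directly, the member sizes can be compared with standard tools (a sketch; /dev/sdb1 stands in for a known-good member and /dev/sde1 for the partition being re-added):

# size of each partition in 512-byte sectors
blockdev --getsz /dev/sdb1
blockdev --getsz /dev/sde1

# what the md superblock on a good member records for device/array sizes
mdadm --examine /dev/sdb1 | grep -i size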

The really frustrating part is that the system won't boot while the array is degraded unless I modify /sys/module/md_mod/parameters/start_dirty_degraded, which I don't want to have to do.
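
For completeness, the same knob can also be set from the boot loader instead of by writing to sysfs after boot; on RHEL5 that would mean appending the documented md-mod module parameter to the kernel line in /boot/grub/grub.conf (version and root device below are placeholders, not taken from this system):

# append to the existing kernel line in /boot/grub/grub.conf
kernel /vmlinuz-<version> ro root=<root-device> md-mod.start_dirty_degraded=1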

Is there a step I am missing in replacing a RAID drive?
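
For reference, the textbook sequence for replacing a failed md member looks roughly like this (a sketch assuming /dev/sde is the replacement disk and /dev/sdb a healthy member; in this case the old device had already dropped to "removed", so the fail/remove steps may not apply):

# if the dying device were still listed in the array, fail and remove it first
mdadm /dev/md0 --fail /dev/sde1
mdadm /dev/md0 --remove /dev/sde1

# after swapping the hardware, copy the partition layout from a good member
sfdisk -d /dev/sdb | sfdisk /dev/sde

# add the new partition; md rebuilds onto it or keeps it as a spare
mdadm /dev/md0 --add /dev/sde1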

Thanks

cgande1x 10-23-2009 10:23 AM

I'm not sure what the problem was, and the system needed to go into production three days ago, so I stopped and deleted the array and recreated it with mdadm; now it seems to be working fine. Good thing we hadn't used the array for storage yet. The only difference I can see is that Disk Druid built the RAID 5 with the drives in a different order than mdadm did the second time. It seems to be happy for now.
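
For the record, recreating an array with this geometry from the command line would look something like the following (a sketch using the 256K chunk and 4-active-plus-1-spare layout described earlier; whether the spare is /dev/sdf or a partition /dev/sdf1 depends on how it was set up, and --create destroys whatever is on the listed devices):

mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1 \
      --chunk=256 /dev/sd[b-e]1 /dev/sdf1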

Thanks anyways.

cgande1x 10-23-2009 10:27 AM

Oh, and another difference between the Disk Druid-built and mdadm-built arrays:

Disk Druid RAID 5 size: 378GB
mdadm size (same drives): 404GB
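
That gap is probably mostly a units and overhead question rather than a real capacity difference. A quick check against the mdadm --detail output above (4 active members of 143371776 KiB each), done in the shell:

# RAID 5 data capacity = (members - 1) x per-member size
echo $(( (4 - 1) * 143371776 ))                 # 430115328 KiB, as reported
echo "430115328/1024/1024" | bc -l              # ~410.19 GiB
echo "430115328*1024/1000/1000/1000" | bc -l    # ~440.44 GB

So the same array can legitimately be reported as anything from roughly 410 to 440 "GB" depending on whether a tool counts in GiB or GB, before any filesystem overhead is subtracted; the 378 vs. 404 figures most likely differ for the same reasons rather than because one tool built a smaller array.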

