recovering software raid - disk marked as failed
Hi
I have a 4-disk RAID5 software RAID array that I'm desperately trying to get my data from. Of the 4 disks in the array, one is totally wrecked. Of the other three, mdadm picks them up but refuses to activate the array because one of them is marked as faulty. I can't seem to find anything that tells mdadm to ignore the faulty status of the disk and start the array anyway. The only thing I can see that MIGHT do it is an mdadm --build command, but that seems incredibly risky. Any help would be greatly appreciated. -Richard |
Mark the faulty disk as failed:
mdadm /dev/md0 --fail /dev/sda1
The data in the array should be accessible after that. To remove the failed disk from the RAID:
mdadm /dev/md0 --remove failed
Be sure to use the proper device names. |
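To confirm the member really was dropped, the array can be checked afterwards (assuming the array is /dev/md0, as above):
# mdadm --detail /dev/md0
# cat /proc/mdstat
Neither should list the removed device any more. |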
The faulty disk is no longer in the array. The disk that I'm having problems with is already marked as faulty. I've tried removing it from the array, but this gives me an error:

# mdadm /dev/md0 --remove /dev/sda1
mdadm: hot remove failed for /dev/sda1: No such device
#

even though the device appears in /proc/mdstat and I can see it is definitely there:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive hde1[1] sda1[4](F) sdb1[2]
      488390080 blocks

unused devices: <none>
#

mdadm --run /dev/md0 fails as well:

# mdadm --run /dev/md0
[ 593.455078] raid5: device hde1 operational as raid disk 1
[ 593.455143] raid5: device sdb1 operational as raid disk 2
[ 593.455202] raid5: not enough operational devices for md0 (2/4 failed)
[ 593.455261] RAID5 conf printout:
[ 593.455315]  --- rd:4 wd:2
[ 593.455370]  disk 1, o:1, dev:hde1
[ 593.455425]  disk 2, o:1, dev:sdb1
[ 593.455479] raid5: failed to run raid set md0
[ 593.455535] md: pers->run() failed ...
mdadm: failed to run array /dev/md0: Input/output error
# |
I'm sorry, but RAID5 doesn't work with only 2 of its 4 disks. |
OK, of the 2 disks that are not working, one is completely wrecked. The other is working fine but is marked as failed for some reason. How do I clear the failed state? Is it not just a flag in the superblock? |
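On the superblock question: yes, the failed state is recorded in each member's md superblock (the per-device state and event counter), which is why it survives a reboot. A minimal way to inspect it, assuming /dev/sda1 is the member in question:
# mdadm --examine /dev/sda1
The State and Events lines show what md last recorded for that member. |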
Could you use the --force option with an mdadm assemble command?
If I stop the array and try as you suggest, here is what happens:
# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/hde1
[ 6126.057516] md: md0 stopped.
[ 6126.061879] md: bind<sdb1>
[ 6126.061966] md: bind<sda1>
[ 6126.062378] md: bind<hde1>
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive hde1[1](S) sda1[4](S) sdb1[2](S)
      732584256 blocks

unused devices: <none>

# mdadm --run /dev/md0
[ 6253.994781] raid5: device hde1 operational as raid disk 1
[ 6253.994845] raid5: device sdb1 operational as raid disk 2
[ 6253.994904] raid5: not enough operational devices for md0 (2/4 failed)
[ 6253.994964] RAID5 conf printout:
[ 6253.995018]  --- rd:4 wd:2
[ 6253.995072]  disk 1, o:1, dev:hde1
[ 6253.995127]  disk 2, o:1, dev:sdb1
[ 6253.995182] raid5: failed to run raid set md0
[ 6253.995238] md: pers->run() failed ...
mdadm: failed to run array /dev/md0: Input/output error

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive hde1[1] sda1[4](F) sdb1[2]
      488390080 blocks

unused devices: <none>
# |
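For what it's worth: if a forced assemble can't bring the flagged member back in, the usual last resort is to recreate the array in place with --assume-clean, which rewrites only the superblocks and skips the initial resync. This is a sketch with placeholder parameters. The level, chunk size, and especially the device order must exactly match the original (check each member's slot with mdadm --examine first), and 'missing' stands in for the wrecked disk:
# mdadm --stop /dev/md0
# mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=4 --chunk=64 missing /dev/hde1 /dev/sdb1 /dev/sda1
The device order shown is only a guess. Check the result read-only first (e.g. fsck -n, or mount -o ro) before writing anything; if the order was wrong the filesystem will look like garbage, and you can stop and try another order. |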
All I can think of is running SpinRite on sda1. However, I don't know whether you have a hardware problem or the data on the drive was corrupted. I think the --force option should cause mdadm to attempt to use the faulty member, in case the drive's data isn't actually bad.
Maybe the data really is corrupted. |
OK, will try SpinRite then. Thanks for the advice.
-Richard |