LinuxQuestions.org

rjstephens 06-09-2008 01:10 AM

recovering software raid - disk marked as failed
 
Hi

I have a 4-disk RAID5 software RAID array that I'm desperately trying to get my data back from.

Of the 4 disks in the array, one is totally wrecked. Of the other three, mdadm picks them up but refuses to activate the array because one of them is marked as faulty.

I can't seem to find any way to tell mdadm to ignore the faulty status of the disks and start the array anyway. The only thing I can see that MIGHT do it is an mdadm --build command, but that seems incredibly risky.
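
Before touching anything, the usual first step is just to read back what md itself has recorded on each component. A rough sketch (the member name /dev/sdX1 here is a placeholder for whatever partitions are actually in the array):

# Show the array as the kernel currently sees it.
cat /proc/mdstat
# Dump the md superblock of each member partition; comparing the Events
# counter and Update Time across members shows which one md thinks is stale,
# and the device table at the end shows which members that superblock still
# believes are active.
mdadm --examine /dev/sdX1    # repeat for every member partition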

Any help would be greatly appreciated.

-Richard

Vit77 06-09-2008 01:41 AM

Mark the faulty disk as failed: mdadm /dev/md0 --fail /dev/sda1
The data in the array should be accessible after that.

To remove the disk from RAID: mdadm /dev/md0 --remove failed

Be sure to use proper device names.
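
For reference, the standard manage-mode sequence being described there looks roughly like this (the device names are only examples and have to match the real array):

# Flag the member as failed, drop it from the array, then check what the
# array looks like afterwards.
mdadm /dev/md0 --fail /dev/sda1
mdadm /dev/md0 --remove /dev/sda1
mdadm --detail /dev/md0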

rjstephens 06-09-2008 01:56 AM

Quote:

Originally Posted by Vit77 (Post 3178953)
Mark the faulty disk as failed: mdadm /dev/md0 --fail /dev/sda1
The data in the array should be accessible after that.

To remove the disk from RAID: mdadm /dev/md0 --remove failed

Be sure to use proper device names.

I don't see how that would help.
The completely dead disk is no longer in the array; the disk that I'm having problems with is already marked as faulty. I've tried removing it from the array, but that gives me an error:
# mdadm /dev/md0 --remove /dev/sda1
mdadm: hot remove failed for /dev/sda1: No such device
#


even though the device appears in /proc/mdstat, so I can see it is definitely there:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive hde1[1] sda1[4](F) sdb1[2]
488390080 blocks

unused devices: <none>
#

mdadm --run /dev/md0 fails as well:

# mdadm --run /dev/md0
[ 593.455078] raid5: device hde1 operational as raid disk 1
[ 593.455143] raid5: device sdb1 operational as raid disk 2
[ 593.455202] raid5: not enough operational devices for md0 (2/4 failed)
[ 593.455261] RAID5 conf printout:
[ 593.455315] --- rd:4 wd:2
[ 593.455370] disk 1, o:1, dev:hde1
[ 593.455425] disk 2, o:1, dev:sdb1
[ 593.455479] raid5: failed to run raid set md0
[ 593.455535] md: pers->run() failed ...
mdadm: failed to run array /dev/md0: Input/output error
#
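
Before forcing anything further it is probably worth keeping a copy of what each surviving superblock currently says, so the original event counts and device roles are not lost if a later step rewrites them. A small sketch, using the device names from the mdstat output above:

# Save each member's current superblock dump to a file for reference.
for d in /dev/sda1 /dev/sdb1 /dev/hde1; do
    mdadm --examine "$d" > "examine-$(basename "$d").txt"
done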

Vit77 06-09-2008 02:21 AM

I'm sorry, but
Quote:

Originally Posted by rjstephens (Post 3178969)
2/4 failed
#

You might try to re-add the failed drives one at a time, but I'm afraid it won't succeed.

RAID5 can only tolerate a single missing member, so a 4-disk array won't run with only 2 of its disks.

rjstephens 06-09-2008 02:39 AM

Quote:

Originally Posted by Vit77 (Post 3178981)
I'm sorry, but

You might try to re-add the failed drives one at a time, but I'm afraid it won't succeed.

RAID5 can only tolerate a single missing member, so a 4-disk array won't run with only 2 of its disks.

uhh

OK, of the 2 disks that are not working, one is completely wrecked. The other is working fine but is marked as failed for some reason.

How do I remove it from the failed state? Is it not just a flag in the superblock?
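
As far as I understand the md format, it mostly is just that: each member carries an md superblock recording the array layout, an event counter and per-device state, and that is what mdadm reads when it reports a member as faulty. A rough way to look at those fields directly, reusing the device names from the mdstat output above:

# Pull out the state, event counter and last update time recorded in the
# superblock on each member; mdadm compares these when deciding which
# members it will still trust.
mdadm --examine /dev/sda1 | egrep -i 'state|events|update time'
mdadm --examine /dev/hde1 | egrep -i 'state|events|update time'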

jschiwal 06-09-2008 03:02 AM

Could you use the --force option with an mdadm assemble command?

Quote:

Originally Posted by mdadm manpage
MODES
mdadm has several major modes of operation:

Assemble
Assemble the parts of a previously created array into an active
array. Components can be explicitly given or can be searched
for. mdadm checks that the components do form a bona fide
array, and can, on request, fiddle superblock information so as
to assemble a faulty array.


rjstephens 06-09-2008 03:30 AM

If I stop the array and try what you suggest, here is what happens:


# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/hde1
[ 6126.057516] md: md0 stopped.
[ 6126.061879] md: bind<sdb1>
[ 6126.061966] md: bind<sda1>
[ 6126.062378] md: bind<hde1>
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive hde1[1](S) sda1[4](S) sdb1[2](S)
732584256 blocks

unused devices: <none>
# mdadm --run /dev/md0
[ 6253.994781] raid5: device hde1 operational as raid disk 1
[ 6253.994845] raid5: device sdb1 operational as raid disk 2
[ 6253.994904] raid5: not enough operational devices for md0 (2/4 failed)
[ 6253.994964] RAID5 conf printout:
[ 6253.995018] --- rd:4 wd:2
[ 6253.995072] disk 1, o:1, dev:hde1
[ 6253.995127] disk 2, o:1, dev:sdb1
[ 6253.995182] raid5: failed to run raid set md0
[ 6253.995238] md: pers->run() failed ...
mdadm: failed to run array /dev/md0: Input/output error
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive hde1[1] sda1[4](F) sdb1[2]
488390080 blocks

unused devices: <none>
#

jschiwal 06-10-2008 02:02 AM

All I can think of is running SpinRite on sda1. However, I don't know whether you have a hardware problem or the data on the drive is corrupted. I think the --force option should cause mdadm to attempt to use the faulty member, in case the data on it isn't actually bad.

Maybe the data really is corrupted.
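
One way to help tell those two cases apart, assuming smartmontools is available (smartctl isn't mentioned earlier in the thread; it's just a common tool for this), is to ask the drive itself and the kernel log:

# SMART self-assessment plus reallocated / pending sector counts for the drive.
smartctl -a /dev/sda
# Any recent kernel-level I/O errors mentioning sda.
dmesg | grep -i sda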

rjstephens 06-10-2008 03:29 AM

OK, I'll try SpinRite then. Thanks for the advice.

-Richard

