LinuxQuestions.org: Linux - Server
(https://www.linuxquestions.org/questions/linux-server-73/mdadm-removing-faulty-spare-701607/)

carlmarshall 02-02-2009 06:04 AM

mdadm - removing faulty spare
 
Hi,

I've had a failure of one of my HDDs (/dev/sdc) which makes up a few RAID partitions. The hot spare has now cut in, so all is currently safe, but how do I now remove the faulty spare?

mdadm --detail /dev/md1 gives the following:

Version : 00.90.03
Creation Time : Fri May 23 15:37:20 2008
Raid Level : raid5
Array Size : 945312256 (901.52 GiB 968.00 GB)
Device Size : 472656128 (450.76 GiB 484.00 GB)
Raid Devices : 3
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Mon Feb 2 11:52:32 2009
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 256K

UUID : 10644348:012f6764:70879599:7631693d
Events : 0.4896

Number   Major   Minor   RaidDevice   State
   0       8        5        0        active sync   /dev/sda5
   1       8       21        1        active sync   /dev/sdb5
   2       8       53        2        active sync   /dev/sdd5

   3       8       37        -        faulty spare

How can I mark the now faulty spare for removal? man mdadm gives me the line:

mdadm /dev/md1 --fail /dev/sdc5

This fails, because mdadm can't see /dev/sdc5 any more.

Any ideas?

Carl.

mostlyharmless 02-02-2009 01:37 PM

Quote:

How can I mark the now faulty spare for removal?
You don't have to mark it as failed; mdadm already did that.
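(A quick way to double-check that from the kernel's side, for reference: failed members are tagged with an "(F)" flag in /proc/mdstat.)

cat /proc/mdstat
# a failed member shows an (F) suffix, e.g. "sdc5[3](F)" (illustrative)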

carlmarshall 02-03-2009 03:18 AM

Don't I have to mark it for removal? e.g.

mdadm /dev/md1 --remove /dev/sdc5

before I can physically remove it.

Carl.

mostlyharmless 02-03-2009 01:51 PM

Probably, but I thought your original question was about marking it as failed.

Quote:

For Manage mode:
-a, --add

hot-add listed devices.
--re-add
re-add a device that was recently removed from an array.
-r, --remove
remove listed devices. They must not be active. i.e. they should be failed or spare devices.
-f, --fail
mark listed devices as faulty.
--set-faulty
same as --fail.
Each of these options requires that the first device listed is the array to be acted upon and the remainder are component devices to be added, removed, or marked as faulty. Several different operations can be specified for different devices, e.g.
mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1
Each operation applies to all devices listed until the next operation.
If an array is using a write-intent bitmap, then devices which have been removed can be re-added in a way that avoids a full reconstruction and instead just updates the blocks that have changed since the device was removed. For arrays with persistent metadata (superblocks) this is done automatically. For arrays created with --build, mdadm needs to be told, via --re-add, that the device was removed recently.

Devices can only be removed from an array if they are not in active use, i.e. they must be spares or failed devices. To remove an active device, it must be marked as faulty first.


Though since it's the failed disk, I doubt it would make a difference; after all, it's not being used any more.
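Putting the quoted options together for this thread, the usual two-step sequence would look like the sketch below; note that the --fail half has already happened here, since mdadm marked the device faulty on its own:

mdadm /dev/md1 --fail /dev/sdc5      # already done automatically in this case
mdadm /dev/md1 --remove /dev/sdc5    # only works while /dev/sdc5 still exists

The catch, as Carl noted, is that mdadm can no longer see /dev/sdc5 at all.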

carlmarshall 02-04-2009 09:47 AM

Thanks for that. The one remaining question: the output of

mdadm --detail /dev/md0

gives:

/dev/md0:
Version : 00.90.03
Creation Time : Fri May 23 15:36:56 2008
Raid Level : raid5
Array Size : 8385536 (8.00 GiB 8.59 GB)
Device Size : 4192768 (4.00 GiB 4.29 GB)
Raid Devices : 3
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Mon Nov 3 14:02:47 2008
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 256K

UUID : 5df593b7:b205acc4:57fae03c:ec92ecae
Events : 0.20

Number   Major   Minor   RaidDevice   State
   0       8        3        0        active sync   /dev/sda3
   1       8       19        1        active sync   /dev/sdb3
   2       8       35        2        active sync

   3       8       51        -        spare         /dev/sdd3

How do I deal with the third disk (Number 2), which used to be /dev/sdc3? I can't mark it as failed or remove it, since I can't specify which element has failed.

/dev/sdc no longer shows up as a valid entry in /dev at all.

Any ideas?

Carl.

mostlyharmless 02-04-2009 10:02 AM

That's a good question. I would try stopping the array and reassembling it, possibly with the resync option:
Quote:

mdadm -S /dev/md0
mdadm -A /dev/md0 --update=resync /dev/sda3 /dev/sdb3 /dev/sdd3
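For the specific case where the device node has vanished, newer mdadm releases (an assumption worth checking against the installed version; the keywords arrived around mdadm 2.6) also accept a keyword in place of a device name:

mdadm /dev/md0 --remove failed      # removes every member marked as failed
mdadm /dev/md0 --remove detached    # removes members whose /dev node is gone

The detached form matches this situation exactly, since /dev/sdc3 no longer exists.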

carlmarshall 02-04-2009 10:06 AM

Thanks for that, I'll give it a try.

Carl.

spqrusa 04-28-2009 04:31 PM

Quote:

Originally Posted by carlmarshall (Post 3428974)
Hi,

I've had a failure of one of my HDDs (/dev/sdc) which makes up a few RAID partitions. The hot spare has now cut in, so all is currently safe, but how do I now remove the faulty spare?

Any ideas?

Carl.

Hi Carl,

You can remove any faulty or failed drives with :

sudo mdadm --manage /dev/md0 --remove failed
-- or, for a member whose /dev node has disappeared --
sudo mdadm --manage /dev/md0 --remove detached

This tells mdadm to drop the failed member's slot from the array. When you hot-add a new spare drive it should take over the old /dev/sd<failed> node. After the hot-add you can:

sudo mdadm --manage /dev/md0 --re-add /dev/sd<failed>

-- a real world case --

sudo mdadm --manage /dev/md0 --re-add /dev/sdc1
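One caveat on --re-add: per the man page excerpt quoted earlier, --re-add is meant for a member that was recently removed from the array; for a brand-new replacement disk, plain --add is the usual choice. A sketch of the whole cycle, assuming the replacement shows up as /dev/sdc and has been partitioned to match the old layout:

sudo mdadm --manage /dev/md0 --remove failed    # drop the failed slot
sudo mdadm --manage /dev/md0 --add /dev/sdc1    # new partition joins as a spare
cat /proc/mdstat                                # watch the rebuild progress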

Hope that helps,

SPQR

