Old 03-06-2012, 10:49 AM   #1
stj5353
LQ Newbie
 
Registered: Mar 2012
Posts: 2

Rep: Reputation: Disabled
mdadm is not rebuilding! Seems hung. How to restart??


OK, so I have a hotplug backplane and mdadm managing 3 drives in a RAID 5 array.

It all works fine about 50% of the time. When it fails it needs a reboot, and I don't know why!

If I pull a drive and re-insert it, it gets detected and starts rebuilding, but most times the rebuild gets hung:


root@iomega-array1:/etc# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid5 sdd2[1](F) sde2[2] sdc2[0]
1911562240 blocks super 1.0 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
[>....................] recovery = 1.8% (17464448/955781120) finish=29315.0min speed=533K/sec

md1 : active raid1 sda2[0] sdb2[1]
467405488 blocks super 1.0 [2/2] [UU]

md0 : active raid1 sdd1[3](F) sda1[0] sde1[4] sdc1[2] sdb1[1]
20980816 blocks super 1.0 [5/4] [UUU_U]
resync=DELAYED

unused devices: <none>
root@iomega-array1:/etc#


It will stay at 1.8% until I reboot.
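The 533K/sec figure is also way below md's normal rebuild floor. The per-device throttle is readable through /proc; a minimal check, assuming the standard interface is present:

# cat /proc/sys/dev/raid/speed_limit_min
# cat /proc/sys/dev/raid/speed_limit_max

The usual defaults are 1000 and 200000 KB/sec per device, so a recovery stuck well below speed_limit_min that never advances points at blocked I/O rather than throttling.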

mdadm shows that the array is rebuilding, but it's hung. It will not move.


root@iomega-array1:/etc# mdadm --detail /dev/sdb
mdadm: /dev/sdb does not appear to be an md device
root@iomega-array1:/etc# mdadm --detail /dev/md2
/dev/md2:
Version : 1.00
Creation Time : Mon Mar 5 18:03:59 2012
Raid Level : raid5
Array Size : 1911562240 (1823.01 GiB 1957.44 GB)
Used Dev Size : 955781120 (911.50 GiB 978.72 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent

Update Time : Tue Mar 6 10:04:05 2012
State : active, degraded, recovering
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Rebuild Status : 1% complete

Name : iomega-array1:2 (local to host iomega-array1)
UUID : 9264acfb:af15b0a8:70a55835:3baea6ae
Events : 24

Number Major Minor RaidDevice State
0 8 34 0 active sync /dev/sdc2
1 8 50 1 faulty spare rebuilding /dev/sdd2
2 8 66 2 active sync /dev/sde2
root@iomega-array1:/etc#



I have tried all of these to force a rebuild, but they all failed!

root@iomega-array1:/etc# mdadm --fail /dev/sdd2 /dev/md2
mdadm: error opening /dev/sdd2: No such device or address
root@iomega-array1:/etc# mdadm --remove /dev/md2 /dev/sdd2
mdadm: hot remove failed for /dev/sdd2: Device or resource busy
root@iomega-array1:/etc# mdadm --fail /dev/md2 /dev/sdd2
mdadm: set /dev/sdd2 faulty in /dev/md2
root@iomega-array1:/etc# mdadm --remove /dev/md2 /dev/sdd2
mdadm: hot remove failed for /dev/sdd2: Device or resource busy
root@iomega-array1:/etc# mdadm --stop /dev/md2
mdadm: failed to stop array /dev/md2: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
root@iomega-array1:/etc#



I don't get what gives. How in the world can I initiate whatever normally gets initiated on a reboot? It will recover on reboot; I just don't want to bring down my other arrays when I do a drive swap.

Thanks for any insight.
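For reference, one thing that can sometimes kick a stalled sync without a reboot, on kernels that expose md through sysfs; a rough sketch, assuming the stuck array is /dev/md2 and that interface exists on this firmware:

# cat /sys/block/md2/md/sync_action
# echo idle > /sys/block/md2/md/sync_action
# cat /proc/mdstat

Writing idle should abort the current pass; if the array still needs recovery, md will normally restart it on its own. If even that write hangs, the md or disk layer is probably wedged, and dmesg is the next place to look.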
 
Old 03-06-2012, 01:40 PM   #2
ba.page
Member
 
Registered: Feb 2012
Location: Canada
Distribution: Scientific,Debian
Posts: 35

Rep: Reputation: 7
given that /dev/sdd is also in use in your /dev/md0 array, taking this drive out WILL degrade both arrays.
that said, I would personally not try to have a drive be a member of multiple md arrays.
also, seeing as this happens often and hangs, maybe you just have a bad drive?

moving on, try this:
# mdadm /dev/md2 -f /dev/sdd2
# mdadm /dev/md2 -r /dev/sdd2
# mdadm --zero-superblock /dev/sdd2
# mdadm /dev/md2 -a /dev/sdd2
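since sdd1 is also a member of md0, it may be worth confirming nothing still claims the disk before zeroing anything; a quick check, assuming the standard mdadm and /proc interfaces:

# grep sdd /proc/mdstat
# mdadm --examine /dev/sdd2

--examine reads the superblock on the partition itself, so it shows whether md still thinks sdd2 belongs to an array; --zero-superblock should only be run once nothing does.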
 
Old 03-06-2012, 01:48 PM   #3
stj5353
LQ Newbie
 
Registered: Mar 2012
Posts: 2

Original Poster
Rep: Reputation: Disabled
Oh I'm 100% on board with you WRT multiple partitions in arrays. The problem is that this is a SAN array from iomega, and they have a requirement to create md0 across all drives. I don't know why. It seems crazy to me, but they say it's just for storage and backup of their firmware and should not cause many (if any) IOs during normal operation.

Since I use their GUI to create and manage the arrays, I'm using mdadm on the backend to see what's going on.

mdadm is still mdadm; their GUI is just a wrapper. So I'm debugging why drives get stuck rebuilding, since iomega support is not really primed to work low-level issues out with customers. They would rather you return the box than figure out why this is happening. Actually, they would be happiest if I just rebooted, but that's not something I'm willing to accept...

But... I digress...

I've tested the drive in an Ubuntu box and it seems fine and dandy. I can't really tie this problem to a particular drive, since it pretty much happens on any rebuild, not necessarily on a failure of a given drive.
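If smartmontools happens to be installed on the appliance, a SMART read is another quick data point; a sketch, assuming the suspect disk shows up as /dev/sdd:

# smartctl -H /dev/sdd
# smartctl -l error /dev/sdd

-H reports the drive's overall health assessment and -l error dumps its logged errors, which helps separate a flaky disk from a flaky backplane slot.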

I already tried those suggestions, just using the long switches. Forcing didn't seem to help:
root@iomega-array1:/etc# mdadm --fail /dev/sdd2 /dev/md2
mdadm: error opening /dev/sdd2: No such device or address
root@iomega-array1:/etc# mdadm --remove /dev/md2 /dev/sdd2
mdadm: hot remove failed for /dev/sdd2: Device or resource busy
root@iomega-array1:/etc# mdadm --fail /dev/md2 /dev/sdd2
mdadm: set /dev/sdd2 faulty in /dev/md2
root@iomega-array1:/etc# mdadm --remove /dev/md2 /dev/sdd2
mdadm: hot remove failed for /dev/sdd2: Device or resource busy
root@iomega-array1:/etc# mdadm --stop /dev/md2
mdadm: failed to stop array /dev/md2: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
root@iomega-array1:/etc#
 
Old 03-06-2012, 01:59 PM   #4
ba.page
Member
 
Registered: Feb 2012
Location: Canada
Distribution: Scientific,Debian
Posts: 35

Rep: Reputation: 7
your problem is syntax

"# mdadm --fail /dev/sdd2 /dev/md2" is not a valid command
syntax should be: mdadm <raiddevice> [options] <component-devices>
(see man mdadm for details)

please try my suggestions explicitly:
# mdadm /dev/md2 -f /dev/sdd2
# mdadm /dev/md2 -r /dev/sdd2
# mdadm --zero-superblock /dev/sdd2
# mdadm /dev/md2 -a /dev/sdd2
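after the re-add, a quick check should show the disk come back as a rebuilding member rather than faulty; assuming watch is available (plain cat works too):

# mdadm --detail /dev/md2
# watch cat /proc/mdstat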
 
Old 03-14-2012, 10:58 PM   #5
Atari911
LQ Newbie
 
Registered: Sep 2003
Location: California, USA
Distribution: Slackware 13.1
Posts: 12

Rep: Reputation: 1
sounds like you may have a process that is using the disk when you are attempting to rebuild the degraded array.
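A few ways to see what might still be holding the array or the member disk; a sketch, assuming lsof and fuser are installed and the array in question is /dev/md2:

# fuser -vm /dev/md2
# lsof /dev/md2 /dev/sdd2
# grep md2 /proc/mounts
# ls /sys/block/md2/holders

fuser -vm lists processes using a filesystem mounted from the device, lsof shows anything with the device node itself open, /proc/mounts shows whether it is mounted at all, and the holders directory reveals device-mapper or LVM volumes stacked on top, any of which can produce "Device or resource busy" when trying to stop the array.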
 
  

