Drives dropping out of mdadm RAID10 randomly on boot

davediehose · 04-11-2013, 08:44 AM

Hello fellow Linuxers,

I have a problem with the mdadm RAID10 I am running on a machine with openSUSE 12.3. It appeared today, apparently after a normal reboot.

On boot, I see behavior similar to this:

Code:

[    2.572122] md: md0 stopped.
[    2.588542] md: bind<sdb1>
[    2.603699] md: bind<sdd1>
[    2.624639] md: bind<sde1>
[    2.624665] md: could not open unknown-block(8,33).
[    2.624666] md: md_import_device returned -16
[    2.624692] md: kicking non-fresh sde1 from array!
[    2.624695] md: unbind<sde1>
[    2.635518] md: export_rdev(sde1)
[    2.635542] md: kicking non-fresh sdb1 from array!
[    2.635546] md: unbind<sdb1>
[    2.641204] md: export_rdev(sdb1)
[    2.642475] md: raid10 personality registered for level 10
[    2.642933] md/raid10:md0: not enough operational mirrors.
[    2.642947] md: pers->run() failed ...

I say "similar" because I have seen different drives, and even different numbers of drives, drop from the array. The dropping out seems unnecessary: I can re-add the missing drives to the array, and most of the time it doesn't even do a full rebuild (it did so only once):

Code:

[  304.380667] md: bind<sdb1>
[  304.407601] md: recovery of RAID array md0
[  304.407607] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  304.407609] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  304.407615] md: using 128k window, over a total of 1610611456k.
[  305.313459] md: md0: recovery done.
[  307.552017] md: bind<sde1>
[  307.579897] md: recovery of RAID array md0
[  307.579903] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  307.579905] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  307.579910] md: using 128k window, over a total of 1610611456k.
[  308.509127] md: md0: recovery done.
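A quick back-of-the-envelope check supports the observation that the re-add did not do a full rebuild. The figures below are copied from the log above; treating the 200000 KB/sec ceiling as the actual rate is an assumption, so the full-copy time is only a lower bound. The mdstat output further down shows a write-intent bitmap on md0, and with a bitmap md only has to resync the regions marked dirty when a recently kicked member comes back, which would explain a "recovery" that finishes in under a second.

```python
# Figures taken from the kernel log above. Assumption: recovery runs at the
# speed ceiling, so the full-copy time computed here is a lower bound.
DEVICE_SIZE_KB = 1_610_611_456   # "over a total of 1610611456k"
MAX_SPEED_KB_S = 200_000         # "not more than 200000 KB/sec"

def full_rebuild_seconds(size_kb: int, speed_kb_s: int) -> float:
    """Minimum time to copy the whole per-device size at the given rate."""
    return size_kb / speed_kb_s

def observed_seconds(start: float, done: float) -> float:
    """Elapsed time between two kernel log timestamps (seconds since boot)."""
    return done - start

if __name__ == "__main__":
    full = full_rebuild_seconds(DEVICE_SIZE_KB, MAX_SPEED_KB_S)
    # "recovery of RAID array md0" at 304.407601, "recovery done" at 305.313459
    seen = observed_seconds(304.407601, 305.313459)
    print(f"full rebuild at ceiling: {full:.0f} s (~{full / 3600:.1f} h)")
    print(f"observed recovery:       {seen:.2f} s")
```

A full copy would take at least about 8053 s (roughly 2.2 hours), against roughly 0.9 s observed.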

If one of the two mirrors is dropped completely, I have to stop the array and start it again; it then comes up with two disks (one in each mirror), and I can add the other two drives.
Once it is up and running, the array works, but after a reboot I see the same problem again, every time.
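Why "one disk in each mirror" is enough: in md's RAID10 "near" layout with 2 near copies on 4 devices, every chunk is written to two adjacent RAID roles, so roles 0 and 1 mirror each other, as do roles 2 and 3. Assuming the bracketed numbers in the mdstat output below match the RAID roles (`mdadm --detail /dev/md0` shows the authoritative RaidDevice column), the pairs would be sdb1/sdc1 and sdd1/sde1. That fits the first boot log: sde1 and sdb1 were kicked, and unknown-block(8,33), which is sdc1 under the standard sd numbering, could not be opened, so only sdd1 remained and the sdb1/sdc1 pair had no member left. The sketch below only handles the case where the device count is a multiple of the copy count; the `ROLE` mapping is a hypothetical reading of the mdstat output.

```python
def mirror_groups(raid_disks: int, near_copies: int) -> list[tuple[int, ...]]:
    """Group the RAID roles that hold identical data in an md 'near' layout.

    Only valid when raid_disks is a multiple of near_copies: then each chunk
    lands on `near_copies` consecutive roles, so the roles split into fixed
    groups, e.g. (0, 1), (2, 3) for 4 devices and 2 near copies.
    """
    if raid_disks % near_copies:
        raise ValueError("sketch only handles raid_disks divisible by near_copies")
    return [tuple(range(i, i + near_copies)) for i in range(0, raid_disks, near_copies)]

def can_start(active_roles: set[int], raid_disks: int = 4, near_copies: int = 2) -> bool:
    """A full copy of the data exists if every mirror group keeps one member."""
    groups = mirror_groups(raid_disks, near_copies)
    return all(any(role in active_roles for role in group) for group in groups)

# Hypothetical role mapping, read from the bracketed numbers in /proc/mdstat.
ROLE = {"sdb1": 0, "sdc1": 1, "sdd1": 2, "sde1": 3}

if __name__ == "__main__":
    print(mirror_groups(4, 2))                      # [(0, 1), (2, 3)]
    print(can_start({ROLE["sdc1"], ROLE["sde1"]}))  # True: one member per pair
    print(can_start({ROLE["sdd1"]}))                # False: pair (0, 1) is empty
```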

The RAID10 partition doesn't take up all of the space on the drives; I also run a RAID1 and a RAID0 on them, and both assemble without problems on every boot. This leads me to assume that there isn't an actual drive failure: even the RAID0 works, and it should be the most vulnerable to any hardware problem. When I fix the RAID10 by hand, all arrays look good on paper:

Code:

 cat /proc/mdstat
Personalities : [raid10] [raid0] [raid1]
md0 : active raid10 sde1[3] sdb1[0] sdc1[1] sdd1[2]
      3221222912 blocks super 1.0 256K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/24 pages [0KB], 65536KB chunk

md2 : active raid1 sdb3[0] sde3[3] sdd3[2] sdc3[1]
      157284224 blocks super 1.0 [4/4] [UUUU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

md1 : active raid0 sdb2[0] sde2[3] sdd2[2] sdc2[1]
      524295936 blocks super 1.0 64k chunks
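Since the problem went unnoticed for a while, a check of /proc/mdstat after each boot could flag degraded arrays early. This is a minimal sketch that relies only on the "[configured/working] [flags]" status field shown above; RAID0 arrays have no such field and are never reported. The function name and the degraded sample are illustrative.

```python
import re

# A line like "md0 : active raid10 ..." starts a new array entry.
ARRAY_RE = re.compile(r"^(md\S*)\s*:")
# Status field such as "[4/4] [UUUU]": configured members / working members,
# then one flag per slot ("U" = up, "_" = missing or failed).
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

def degraded_arrays(mdstat: str) -> dict[str, str]:
    """Return {array name: slot flags} for arrays with missing members."""
    result = {}
    current = None
    for line in mdstat.splitlines():
        head = ARRAY_RE.match(line)
        if head:
            current = head.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            configured, working = int(status.group(1)), int(status.group(2))
            if working < configured:
                result[current] = status.group(3)
    return result

if __name__ == "__main__":
    # Hypothetical degraded state: slots 0 and 2 missing.
    sample = (
        "md0 : active raid10 sdc1[1] sde1[3]\n"
        "      3221222912 blocks super 1.0 256K chunks 2 near-copies [4/2] [_U_U]\n"
    )
    print(degraded_arrays(sample))  # {'md0': '_U_U'}
```

On a live system the same function can be fed `open("/proc/mdstat").read()`.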

BTW, I use GPT on all of the disks. My system is a UEFI one, but I boot my OS in compatibility (legacy BIOS) mode with GRUB, not via EFI boot, EFI GRUB or anything like that. I don't think this could cause any problems, or could it? It hasn't before this situation.

For lack of better ideas, I have just started a filesystem bad-block check on the LVM volumes on the RAID10 to find out whether there are actual bad blocks. However, I rather suspect that something is wrong at the mdadm/superblock level, but I am not very experienced there.

As I cannot point to any particular cause of, or even a fix for, this behavior, I would greatly appreciate your help. Whatever additional information you need, I will provide it.

Regards,
dave

smallpond · 04-11-2013, 12:42 PM

The error -16 is -EBUSY. md_import_device is failing when it tries to get exclusive ownership of the drives, because they are in use by some other program. Check for any programs that start before the md errors appear in the logs and may be using the disks.
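Both numbers in the first boot log can be decoded mechanically: kernel functions return negative errno values, and "unknown-block(8,33)" is a major:minor device number. Under the standard static numbering, major 8 covers the first 16 SCSI/SATA disks (sda to sdp) with 16 minors per disk, so 8:33 is sdc1, the drive md could not open exclusively. A minimal sketch (helper names are illustrative; the errno text is what Linux/glibc reports):

```python
import errno
import os

def describe_errno(ret: int) -> str:
    """Map a negative kernel return value such as -16 to its errno name and text."""
    code = -ret
    return f"{errno.errorcode.get(code, '?')}: {os.strerror(code)}"

def sd_name(major: int, minor: int) -> str:
    """Decode a block device number for major 8 only (sda..sdp).

    Each disk owns 16 minors: minor // 16 selects the disk letter and
    minor % 16 the partition (0 means the whole disk).
    """
    if major != 8 or not 0 <= minor < 256:
        raise ValueError("only major 8 (sda..sdp) is handled here")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")

if __name__ == "__main__":
    print(describe_errno(-16))  # EBUSY: Device or resource busy (on Linux)
    print(sd_name(8, 33))       # sdc1
```

The EBUSY text matches the "Device or resource busy" wording that boot.md logs later in this thread.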

davediehose · 04-11-2013, 01:35 PM

Thanks for the response. I looked into the logs and found related entries, some of them from days ago, so the problem had gone unnoticed for some time. Creepy.

Most probably it is some race condition, or a process polling/accessing the drives earlier than it is supposed to. I have lines like these:

Code:

2013-04-01T22:54:06.102933+02:00 davederserver boot.md[356]: Starting MD RAID mdadm: failed to add /dev/sdd1 to /dev/md/0: Device or resource busy
2013-04-01T22:54:06.102938+02:00 davederserver boot.md[356]: mdadm: failed to add /dev/sdb1 to /dev/md/0: Device or resource busy
2013-04-01T22:54:06.102942+02:00 davederserver boot.md[356]: mdadm: /dev/md/0 has been started with 2 drives (out of 4).

Nothing around those lines jumped out at me as something that would keep the drives busy, though. That doesn't mean there isn't anything, of course. Is there an elegant way to delay the md assembly at boot? It happens right in the middle of a lot of seemingly unrelated stuff.

When my fsck finishes, I will next try booting without md assembly at boot (the kernel option raid=noautomount should do this, I assume) to see how it goes when I do everything by hand from the start.

smallpond · 04-11-2013, 01:52 PM

The only thing that might conflict that early would be udev. See whether udev is doing something funny with the disks.

Also check /etc/mdadm.conf and make sure you aren't assembling the same arrays twice.
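One way to do that check is to look for ARRAY lines that name the same md device or the same UUID more than once. This is a simplified sketch, not a full mdadm.conf parser: it joins continuation lines (lines starting with whitespace), strips `#` comments, and only compares `/dev/...` names and `UUID=` values. It won't notice that `/dev/md0` and `/dev/md/0` may refer to the same array, which is why comparing UUIDs matters. The UUIDs in the example are placeholders.

```python
from collections import defaultdict

def duplicate_arrays(conf_text: str) -> dict[str, list[int]]:
    """Report md device paths or UUIDs that appear on more than one ARRAY line.

    Returns {key: [line numbers]}, where key is a /dev path or "UUID=...".
    """
    logical = []  # (first physical line number, joined text)
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        text = raw.split("#", 1)[0]
        if raw[:1] in (" ", "\t") and logical:
            # Continuation of the previous line.
            logical[-1] = (logical[-1][0], logical[-1][1] + " " + text)
        else:
            logical.append((lineno, text))

    seen = defaultdict(list)
    for lineno, text in logical:
        words = text.split()
        if not words or words[0].upper() != "ARRAY":
            continue
        for word in words[1:]:
            if word.upper().startswith("UUID="):
                seen["UUID=" + word.split("=", 1)[1].lower()].append(lineno)
            elif word.startswith("/dev/"):
                seen[word].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}

if __name__ == "__main__":
    example = "\n".join([
        "DEVICE partitions",
        "ARRAY /dev/md0 metadata=1.0 UUID=aaaa:bbbb",
        "ARRAY /dev/md/0 metadata=1.0 UUID=AAAA:BBBB",
    ])
    print(duplicate_arrays(example))  # {'UUID=aaaa:bbbb': [2, 3]}
```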

davediehose · 04-11-2013, 02:23 PM

OK, the situation has changed somewhat. I have now seen three clean assemblies in three reboots, one of them a cold boot. The thing is, I can't figure out why it works now.

The only thing fsck found was two cases of too high directory depth on an inode. That could hardly have been the problem, I guess.

I disabled the startup of some services (nfsserver, libvirtd) through chkconfig; maybe that helped. I also added raid=noautomount to the kernel options. Contrary to what I expected, this didn't disable md, but maybe it changed some significant detail of the boot process that isn't visible to me (?).

Anyway, I now want to assemble the arrays manually later, from a script. That would be a good way to work around these kinds of problems in the future, and I have to mount LUKS manually anyway. Since the kernel option didn't work as expected, I tried disabling boot.md to keep md from assembling my arrays, but they were still assembled. Could you help me find the best way forward here?
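A late-boot assembly script could first wait for udev to finish processing device events (`udevadm settle`) and then assemble each array explicitly with `mdadm --assemble`. The sketch below only builds and prints the commands unless `dry_run=False`; the device lists are a hypothetical layout copied from the mdstat output in this thread. Kernel disk names can change between boots, so identifying arrays by UUID in mdadm.conf is likely more robust than hard-coding sdX names.

```python
import shlex
import subprocess

# Hypothetical layout: array and member names taken from this thread's mdstat.
ARRAYS = {
    "/dev/md0": ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1", "/dev/sde1"],
    "/dev/md1": ["/dev/sdb2", "/dev/sdc2", "/dev/sdd2", "/dev/sde2"],
    "/dev/md2": ["/dev/sdb3", "/dev/sdc3", "/dev/sdd3", "/dev/sde3"],
}

def assembly_commands(arrays: dict[str, list[str]]) -> list[list[str]]:
    """Wait for pending udev events, then assemble each array explicitly."""
    cmds = [["udevadm", "settle"]]
    for md, members in arrays.items():
        cmds.append(["mdadm", "--assemble", md, *members])
    return cmds

def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the script at the first failing command.
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run(assembly_commands(ARRAYS))  # prints the commands only
```

Stopping at the first failure is a deliberate choice here: unlocking LUKS on top of a half-assembled array is exactly what the script is meant to avoid.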

smallpond · 04-11-2013, 04:06 PM

Put in /etc/mdadm.conf:

Code:

AUTO -all

Then it should not assemble any arrays automatically.

davediehose · 04-12-2013, 04:58 AM

Thanks for the info. So the catch-all "all", preceded by a minus and with nothing else in the config, means "do not assemble anything automatically". Now the mdadm.conf manual makes sense to me.
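The AUTO rule can be modelled in a few lines. As the mdadm.conf manual describes it, each token is a metadata type preceded by + or -, "all" matches any type, the first matching token decides, and if nothing matches, auto-assembly is allowed. The manual also notes that disabling auto-assembly this way still lets arrays listed explicitly in mdadm.conf or on the command line be assembled. This is a simplified sketch: the `homehost` keyword and ARRAY lines are not modelled, and the function name is illustrative.

```python
def auto_assembly_allowed(auto_line: str, metadata: str) -> bool:
    """Simplified model of an mdadm.conf AUTO line.

    Tokens are '+name' or '-name'; 'all' matches any metadata type.
    The first matching token decides; no match means auto-assembly is allowed.
    """
    words = auto_line.split()
    if words and words[0].upper() == "AUTO":
        words = words[1:]
    for token in words:
        sign, name = token[0], token[1:]
        if sign not in "+-":
            continue  # e.g. bare 'homehost', not modelled here
        if name == "all" or name == metadata:
            return sign == "+"
    return True

if __name__ == "__main__":
    print(auto_assembly_allowed("AUTO -all", "1.x"))        # False
    print(auto_assembly_allowed("AUTO +1.x -all", "1.x"))   # True
    print(auto_assembly_allowed("AUTO +1.x -all", "0.90"))  # False
```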