[SOLVED] md *does not* continue rebuilding after reboot on Slackware 14.1

gaitos · 09-30-2015, 07:41 AM

Hello,

This happens on Slackware 14.1, Kernel is 3.10.17 stock x86_64 as well as stock i686, mdadm 3.2.6 (tested also with 3.3.4)

(The first issue was noticed on a Xen machine but was reproduced on stock kernel as well as on another machine that has 32bit Slackware installed)

Given a RAID1 array, a device fails and is hot-replaced. The rebuild starts normally. However, if the machine is rebooted before the rebuild is finished, the array no longer appears as degraded,recovering and the data is corrupted (given that one HDD is brand new).

Searching only provided a similar bug from 2012 in Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=817039

The suggestion there was to update mdadm, however mdadm installed in slackware is newer than the one in that bug report. Besides, I don't think the problem to be with mdadm (a user-mode program) but with the md driver in the kernel. However, just to be on the safe side I downloaded and compiled mdadm 3.3.4. The problem persists.

Everything RAID-related was done using only Linux tools (mdadm), i.e. MB BIOS was not configured for RAID. The same problem appears on two different systems (different CPU, MB etc.), so it's unlikely to be a hardware related or compatibility issue.

Is there an option/flag/switch that I am missing, or is a bug somewhere? Besides the kernel and mdadm are there any other components involved in Linux RAID?

Simple steps to reproduce are below. WARNING! /dev/sdb1 and /dev/sdc1 will be ERASED, don't try this unless you know what you are doing!
I used sdb1 and sdc1 as RAID autodetect partitions (0xfd)

Create the array:

Code:

root@nxen:~# mdadm -C /dev/md127 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

After it has finished the initial resync:

Code:

root@nxen:~# mdadm --manage /dev/md127 --fail /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md127
root@nxen:~# mdadm --manage /dev/md127 --remove failed
mdadm: hot removed 8:33 from /dev/md127
root@nxen:~# mdadm --manage /dev/md127 -a /dev/sdc1
mdadm: added /dev/sdc1

check that rebuild begun:

Code:

root@nxen:~# mdadm --detail /dev/md127
/dev/md127:
         Version : 1.2
   Creation Time : Wed Sep 30 14:48:05 2015
      Raid Level : raid1
      Array Size : 20955136 (19.98 GiB 21.46 GB)
   Used Dev Size : 20955136 (19.98 GiB 21.46 GB)
    Raid Devices : 2
   Total Devices : 2
     Persistence : Superblock is persistent

     Update Time : Wed Sep 30 14:55:52 2015
           State : active, degraded, recovering
  Active Devices : 1
 Working Devices : 2
  Failed Devices : 0
   Spare Devices : 1

  Rebuild Status : 1% complete

            Name : nxen:127  (local to host nxen)
            UUID : 7589d8f8:0d8b5716:06e07bfa:28407522
          Events : 23

     Number   Major   Minor   RaidDevice State
        0       8       17        0      active sync   /dev/sdb1
        2       8       33        1      spare rebuilding   /dev/sdc1

reboot the machine and check again:

Code:

root@nxen:~# mdadm --detail /dev/md127
/dev/md127:
         Version : 1.2
   Creation Time : Wed Sep 30 14:48:05 2015
      Raid Level : raid1
      Array Size : 20955136 (19.98 GiB 21.46 GB)
   Used Dev Size : 20955136 (19.98 GiB 21.46 GB)
    Raid Devices : 2
   Total Devices : 2
     Persistence : Superblock is persistent

     Update Time : Wed Sep 30 14:56:35 2015
           State : clean
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0

            Name : nxen:127  (local to host nxen)
            UUID : 7589d8f8:0d8b5716:06e07bfa:28407522
          Events : 26

     Number   Major   Minor   RaidDevice State
        0       8       17        0      active sync   /dev/sdb1
        2       8       33        1      active sync   /dev/sdc1

It shows as clean instead of rebuilding. Also in the first scenario - where a HDD was actually replaced - the data was corrupted, which is to be expected when the rebuild is considered done but isn't.

Any ideas? Thanks in advance!

wildwizard · 10-04-2015, 01:51 AM

1. Don't use 0xfd
2. Does this only occur within your scenario of the old disk been added back into the array? ie an actual new disk works ok?

gaitos · 10-04-2015, 06:05 AM

Thank you for replying. Unfortunately, it took quite long for my post to be approved and the problem was solved in the meantime with help from another list.

Answers to your observations:
1. I only used 0xfd for testing (figuring it's the typical use scenario); the original problem manifested when using whole drives (/dev/sdb and /dev/sdc)

2. No, the problem first manifested when a brand new disk was introduced into the array. After the reboot (that happened before rebuild was finished) the array was shown as healthy and the new disk showed "active sync". Of course, the array didn't contain the right data. In fact this was quite scary for me. The machine I encountered the problem on was in testing, but I had a similar configuration in production. A HDD replacement followed by an eventual reboot before the rebuild was completed could lead to subtle alteration of data (the worst kind since it can pass undetected for a while). Needless to say, I have reconfigured that server.

I have found two possible solutions:
1. Upgrade to kernel 3.19.8 (the last 3.. kernel) or
2. Create /etc/mdadm.conf using mdadm --examine --scan > /etc/mdadm.conf and pass raid=noautodetect on kernel command line.

I have chosen the 2nd solution (that was suggested on the rlug list - Romanian LUG) because I didn't want to waste a lot of time configuring the newer kernel (e.g. compiled with its defaults it didn't recognize my network cards), and from what I've learned it's safer to have mdadm assembling the array than relying on kernel autodetect.

As far as I understood from the explanations on rlug, the Linux kernel md driver (at least in the 3.10.17 version) only supports version 0.9 of the md superblock. I have not yet verified the kernel changelog to see if my understanding is correct; however, I have empirically determined that the autodetect in the Slackware default huge-3.10.17 kernel will not continue rebuild after reboot.

So, safest solution: create a /etc/mdadm.conf and avoid kernel autodetect with "raid=noautodetect".

bassmadrigal · 10-05-2015, 07:13 AM

Quote:

Originally Posted by gaitos

I have chosen the 2nd solution (that was suggested on the rlug list - Romanian LUG) because I didn't want to waste a lot of time configuring the newer kernel (e.g. compiled with its defaults it didn't recognize my network cards), and from what I've learned it's safer to have mdadm assembling the array than relying on kernel autodetect.

Just as a side note, there shouldn't be any software issues with a stock Slackware if you were to install a 4.x kernel (although, it is possible, though not likely, that 3rd-party software you've installed may have kernel limitations). Pat has a good .config for the 4.1.6 kernel in -current. It should support all the hardware your 3.10.17 does, plus newer stuff.

gaitos · 10-05-2015, 07:30 AM

@bassmadrigal
Thank you for the heads-up, I wasn't aware of that. However, this machine is acting as a Xen host (Dom0). I will test 4.1.6 (modify the base .config and compile it as Xen Dom0) when I get some time but I think the results would be slightly off this topic.

wildwizard · 10-05-2015, 03:44 PM

Option 2 is the correct one though if you didn't use 0xfd you wouldn't need the raid=noautodetect to the kernel as it would not detect the partitions.

I'll be linking back to this thread as proof that 0xfd may cause data loss for the next person that mentions they use 0xfd for the partition type.

I've been warning people away from that for years now but people seem to think they know better than the kernel RAID folks who I've been quoting all that time.

See the following pages :-
https://raid.wiki.kernel.org/index.php/Partition_Types
https://raid.wiki.kernel.org/index.php/RAID_Boot

gaitos · 10-06-2015, 01:29 AM

Yes, that seems a sensible warning. However, be advised that (at least for huge-3.10.17) the kernel will autodetect whole disks as well (no 0xfd partition). I assume that it wouldn't touch 0xda partitions. So IMHO the safe choice (no matter what kernel one is using) is to have a correct /etc/mdadm.conf and have arrays assembled by mdadm (via initrd if root fs is on RAID) and eventually pass "raid=noautodetect" to kernel or avoid autodetection via other means (e.g. 0xda partitions).