08-16-2017, 11:44 AM   #1
choogendyk (Senior Member)
how to reassemble mdadm raid after troubled reboot


I have a Supermicro server running Ubuntu 14.04 with all the latest aptitude updates. It has about 40 drives, spread across the internal bays and two external cabinets, assembled into a number of mdadm raid arrays with LVM on top. The cabinets are SAS multi-path.

The server experienced a panic yesterday morning and halted. The admin on call hit the reset and the system came up, but with issues. Apparently, it failed to get a full inventory of drives before assembling the arrays. One mirror didn't come up, one mirror was missing a drive, one raid6 was missing two drives, one raid5 was missing a drive, and one raid5 was missing two drives and didn't start up.

I have managed most of it. I got one mirror to come up using `sudo mdadm --run /dev/md0`. The other mirror and the degraded arrays that were still running I rebuilt with commands like `sudo mdadm --manage /dev/md2 --add /dev/sdk`. Since those arrays had been up and running in degraded mode, their event counts had diverged from those of the removed drives, so adding a removed drive back in meant a rebuild. All of this worked while the system was running.
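
For reference, the pattern for each running-but-degraded array was roughly this (md2 and sdk are just the example devices from above; the event-count comparison is my own sanity check, not something mdadm requires):

```
# Compare the array's event count with the dropped member's before re-adding;
# if they differ, re-adding will trigger a resync of that member.
sudo mdadm --detail /dev/md2 | grep -i events
sudo mdadm --examine /dev/sdk | grep -i events

# Re-add the dropped drive; mdadm rebuilds it against the degraded array.
sudo mdadm --manage /dev/md2 --add /dev/sdk

# Watch the rebuild progress.
cat /proc/mdstat
```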

The raid that didn't come up consists of three 4TB drives as raid5. It shows two removed. Since the drives were not found on reboot, and the raid array was never started, I'm assuming the event counts on the three drives have not diverged. It seems like I ought to be able to reassemble the array and bring it up without having to do any sort of rebuild. However, the metadata on the drive that came up now lists the other two drives as removed. Thus, `sudo mdadm --detail /dev/md125` shows it as "active, FAILED, Not Started" with two drives removed. So, I'm assuming a reboot would bring it up the same way.

Any ideas how to do this? Would two adds followed by a run do the job? I'm hoping for someone with real experience and not just speculation. The data on the raid is critical.
 
08-16-2017, 08:25 PM   #2
jefro (Moderator)
"The raid that didn't come up consists of three 4TB drives as raid5. It shows two removed."

Might have to start by finding out more about the condition of the physical drives, maybe?

Last edited by jefro; 08-17-2017 at 02:46 PM.
 
08-17-2017, 07:04 AM   #3
choogendyk (Original Poster)
`sudo smartctl -a /dev/sdq` for each drive says they are good.

`sudo mdadm --examine /dev/sdq1` for each drive shows the array UUID and an event count that matches across all of the drives.
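
For anyone following along, the per-drive check was roughly this, repeated for each member partition (sdq/sdq1 is just one of my devices; substitute your own):

```
# Pull out the array UUID, event count, and state for one member partition.
# Matching event counts across all members suggest the array should assemble
# cleanly without needing a rebuild.
sudo mdadm --examine /dev/sdq1 | grep -E 'Array UUID|Events|Array State'

# Quick health check of the underlying drive.
sudo smartctl -a /dev/sdq | grep -i 'overall-health'
```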

These are enterprise class drives in a Sun J4500 hanging off a SuperMicro SuperServer, not consumer grade cheap equipment. Every collection of research data is critical to the faculty member whose research depends on it. We encourage them to keep other copies, but that's not always possible when you have many terabytes of data. I also try to keep tape backups, but the growth of data over the past few years has been unbelievable. I went from AIT5 to LTO6, and now that's not keeping up. I just got an Overland NEO series T24 with two LTO7 drives, but haven't gotten it into operation yet. We've got on the order of 80TB of data in this department, and approaching 100TB in the other department I take care of. I have mirrors on the root drives and raid6 on the larger arrays. When people need more space, we're currently buying HGST 10TB Helium drives.
 
08-17-2017, 07:39 AM   #4
syg00 (LQ Veteran)
Not many are going to be prepared to offer advice that might trash that data.
Your institution is responsible for securing that data, not us. Harsh, but true.

Talk to the people who know this stuff - start with the linux-raid wiki at kernel.org.
 
08-17-2017, 02:52 PM   #5
jefro (Moderator)
I like syg00's link better than the one I posted.
 
08-17-2017, 03:36 PM   #6
choogendyk (Original Poster)
syg00, I appreciate the point. Obviously, I can't hold anyone accountable for free advice given in a public forum. I was hoping that someone might have encountered the situation and could say what worked for them.

Your link to the wiki at kernel.org was very useful. I don't know why it doesn't pop up in Google searches.

Their suggestion for similar situations was to issue `sudo mdadm --stop /dev/md125` followed by `sudo mdadm --assemble /dev/md125 /dev/sdv1 /dev/sdu1 /dev/sdai1` (substituting my values there). They said that this would do no harm, and that the sequence could be repeated with different parameters. When I did this, I got a "device busy" error on the two members I wanted back in (sdv1 and sdu1). I did not want to use `--force`, because that can result in things you don't want to happen.
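
In other words, the sequence from the wiki, as I ran it (with my device names), was:

```
# Stop the failed/not-started array, then try to assemble it explicitly from
# its member partitions. Per the wiki, a plain assemble (no --force) does no
# harm and can be retried with different device lists.
sudo mdadm --stop /dev/md125
sudo mdadm --assemble /dev/md125 /dev/sdv1 /dev/sdu1 /dev/sdai1

# This is where I got "device busy" on sdv1 and sdu1.
```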

At that point I was looking at the parted and partprobe manuals to see if there might be something there about the busy status. I had another 10TB raid5 made of two 10TB drives (we start minimal and add on), which had one drive dropped; it was running, but had no data on it yet, so I figured I could risk playing with it. It also had the weird anomaly that the partition device, /dev/sdn1, had come up as a character device rather than as a block device. Advice on StackExchange suggested rm'ing /dev/sdn1 and doing `sudo partprobe /dev/sdn` to regenerate it. In my case, that didn't work.
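
For completeness, the StackExchange suggestion amounted to roughly this (sdn is my device; it didn't help here, so treat it as a long shot):

```
# Remove the bogus character-device node, then ask the kernel to re-read the
# partition table, which should recreate /dev/sdn1 as a block device.
sudo rm /dev/sdn1
sudo partprobe /dev/sdn
ls -l /dev/sdn1   # a leading 'b' in the listing means a block device
```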

At this point, I realized that none of these other reports mention multi-pathing, but my drive cabinets are multi-pathed. Each drive therefore shows up twice, as, say, /dev/sdq and /dev/sdau, and a further device, e.g. /dev/dm-21, is created that encompasses those two paths. Devices that haven't been put into an array yet also show up in /dev/mapper/ with much longer names built from a WWN plus a possible "-partN" suffix. Looking there, I found three devices and one part1 entry. Using `sudo mdadm --examine /dev/mapper/35000cca266237d6c-part1` (for example), I could see the UUID of the raid array as well as the UUID for the drive, and with those IDs I could see which devices belonged where.

I then did `sudo mdadm --manage /dev/md127 --add /dev/mapper/35000cca266237d6c-part1`, which started the 10TB raid rebuilding parity. After that worked, I repeated the `sudo mdadm --stop /dev/md125` and followed up with `sudo mdadm --assemble /dev/md125 /dev/mapper/35000c5007bb7cc25 /dev/mapper/35000c5007bb79f4b /dev/sdai1`. That worked. The three-drive raid5 simply came up with all data intact.
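
Pulling the working part together, the sequence looked roughly like this (the /dev/mapper names are the WWN-based ones from my cabinets; yours will differ):

```
# Confirm which array a multipath member belongs to (check the "Array UUID"
# line in the --examine output).
sudo mdadm --examine /dev/mapper/35000cca266237d6c-part1

# Re-add the dropped member of the running two-drive raid5 via its multipath
# device rather than a single-path /dev/sdX node.
sudo mdadm --manage /dev/md127 --add /dev/mapper/35000cca266237d6c-part1

# Stop the failed three-drive raid5 and reassemble it from the multipath
# devices. This brought it back with all data intact, no rebuild needed.
sudo mdadm --stop /dev/md125
sudo mdadm --assemble /dev/md125 /dev/mapper/35000c5007bb7cc25 \
    /dev/mapper/35000c5007bb79f4b /dev/sdai1

# Confirm the result.
cat /proc/mdstat
sudo mdadm --detail /dev/md125
```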

The advice from https://raid.wiki.kernel.org/index.php/Assemble_Run was spot on. The difference in my situation was the multi-pathing, and knowing how the drives should be referenced so that you get the multi-path device rather than a single-path instance of the drive. The right names came from /dev/mapper/ and were confirmed with `mdadm --detail`.

Now we have to try to figure out why the system panic occurred in the first place and why the bootup went haywire.
 
08-17-2017, 05:41 PM   #7
syg00 (LQ Veteran)
Glad you got it sorted, and thanks for making us all aware of the situation and fix.
Now, about my sigline - and those LTO7's.
 
  

