LinuxQuestions.org


pbwtortilla 06-25-2008 01:16 AM

mdadm: how to avoid complete rebuild of RAID 6 array (6/8 active devices)
 
Hi Everyone,

First off, a little background on my setup.

OS: Ubuntu 7.10 i386 Server (2.6.22-14-server)
upgraded to
Ubuntu 8.04 i386 Server (2.6.24-19-server)

I have 8 SATA drives connected and the drives are organized into three md RAID arrays as follows:

/dev/md1: ext3 partition mounted as /boot, composed of 8 members (RAID 1) (sda1/b1/c1/d1/e1/f1/g1/h1)
/dev/md2: ext3 partition mounted as / (the root filesystem), composed of 8 members (RAID 1) (sda2/b2/c2/d2/e2/f2/g2/h2)
/dev/md3: ext3 partition mounted as /mnt/raid-md3, composed of 8 members (RAID 6) (sda3/b3/c3/d3/e3/f3/g3/h3); this is the main data partition holding 2.7 TiB of data

All the raid member partitions are set to type "fd" (Linux RAID Autodetect).

Important Note: 6 of the drives are connected to two Sil3114 SATA controller cards whilst 2 of the drives are connected to the on-board SATA controller (I don't know which model it is).

After upgrading my Ubuntu installation to 8.04, there was an error message on restart saying that my RAID arrays were degraded and that the system was therefore unable to boot from them.

At the time, not knowing the cause of the sudden RAID failure, I attempted to force mdadm to start the arrays anyway (the RAID 1 arrays with 8 members each were no cause for concern, of course, but I wanted to back up the data on the degraded md3 array as soon as possible).
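For reference, forcing a degraded assembly with --run looks roughly like this; the exact member list below is only an example, not necessarily what I typed:

Code:

# assemble md3 from the six members that were detected and start it
# even though two members are missing (the array comes up degraded)
mdadm --assemble --run /dev/md3 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3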

Then it hit me: why would it recognize only 6 drives? Apparently the kernel has compatibility problems with certain SATA controllers, and my on-board controller chip is one of them.

Sure enough, after moving all 8 drives to the Silicon Image controllers, the drives were all recognized without any problems.

Had the missing drives been recognized before the array was brought up again, everything would have been fine. Unfortunately, I forced mdadm (with the --run switch) to bring it online with 2 missing members.

This is when the problem began. I know that as soon as I re-add the two missing drives back into the md3 (RAID 6) array, the system will attempt to rebuild the array, using the data from the 6 drives.
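For context, the re-add step I'm dreading would be something along these lines (device names are examples); without a write-intent bitmap I expect this to kick off a full resync onto both disks:

Code:

# re-attach the two members that went missing; this is what starts the rebuild
mdadm /dev/md3 --add /dev/sdg3 /dev/sdh3
# watch the resync progress
cat /proc/mdstat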

Given the size of the array and the type of disk drives being used (off-the-shelf SATA drives with an unrecoverable read error rate of about 1 in 10^14 bits), I think it is highly likely that the system will encounter one or more read errors during the rebuild.

Anyway, I panicked and brought the md3 array down first to prevent possible further damage.
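For the record, bringing it down was just the stop command:

Code:

mdadm --stop /dev/md3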

So, at this stage what I'm wondering is:

1. If mdadm encounters a bit error during a RAID 6 rebuild, will it just give up on that particular file and move on to recover other data on the array? Or will it trash the entire array?

2. Is it possible to cheat mdadm by somehow replacing the new RAID metadata on the 6 drives with the old metadata from the 2 drives? Would that make mdadm think the array is clean and consistent and that nothing ever happened? Please do note that I did not write ANY new data onto the RAID 6 array from the time it was degraded until the time I brought it down with --stop.
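To show what I mean by the metadata: each member carries an md superblock that can be dumped like this (device names are examples); the Events counter and Update Time on the six members that were started will now be ahead of the two that were left out:

Code:

# dump the md superblock of one of the six "started" members...
mdadm --examine /dev/sda3
# ...and of one of the two members that went missing, then compare
# the Events counter and Update Time fields
mdadm --examine /dev/sdg3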

Sorry for the long post and thank you for your time in advance. I really hope to get this RAID array back up without data corruption because I don't have a working backup of the array (I know, very stupid of me).

Dave

ledow 06-26-2008 03:33 AM

Quote:

Originally Posted by pbwtortilla (Post 3194330)
Important Note: 6 of the drives are connected to two Sil3114 SATA controller cards whilst 2 of the drives are connected to the on-board SATA controller (I don't know which model it is).

That's quite a way to destroy the redundancy of a RAID. RAID6 tolerates failure of two drives, but you've got three drives on at least two of the controllers. So one controller failure = destruction of data.

Fortunately, it was your on-board controller that had the problem, though, and that only had two devices on it to fail.


Quote:

Originally Posted by pbwtortilla (Post 3194330)
At the time, not knowing the cause of the sudden RAID failure, I attempted to force mdadm to start the arrays anyways (the RAID 1 arrays with 8 members each were no causes for concern, of course, but I wanted to back up my data on the degraded md3 array as soon as possible)

I guess the stupidity of such a decision (forcing past error checks when you "need" to back up the data sitting on the other side of them) has occurred to you by now.

The first thing anyone should do if they suspect data is at risk is STOP, THINK and possibly turn stuff off until they've read up on everything they need to know to attempt recovery.


Quote:

Originally Posted by pbwtortilla (Post 3194330)
Sure enough, after moving all 8 drives to the Silicon Image controllers, the drives were all recognized without any problems.

I hope this is just a temporary arrangement to recover the data.


Quote:

Originally Posted by pbwtortilla (Post 3194330)
I know that as soon as I re-add the two missing drives back into the md3 (RAID 6) array, the system will attempt to rebuild the array, using the data from the 6 drives.

Correct. Fortunately you chose RAID6 and only had two drives on the failed controller, or you could have lost the entire array before you got the chance to do anything silly anyway.


Quote:

Originally Posted by pbwtortilla (Post 3194330)
Given the size of the array and the type of the disk drives being used (off-the-shelf SATA drives with bit error rate of 1 out of 10^14 bits), I think it is highly likely that the system will encounter one or more bit errors during the rebuild.

Why? Do you buy SATA drives that are notoriously unreliable? A rebuild writes only to the re-added drives - the other drives are mainly read from, to reconstruct the missing data and parity. By this logic, anything you do will be prone to bit errors. Yes, you have roughly 2.9 x 10^12 bytes of data, but it's spread across six drives, thus the failure rate is relatively low. Just *building* the array initially was, by these numbers, quite likely to generate a bit error in it. The drives are built to cope with this, with ECC, spare sectors etc.

Your only alternative now is to image all the RAID6 data to another set of drives/image files as a backup and then perform exactly what you intend to do now - an array rebuild.

I would HIGHLY suggest that you do this. You can even make the RAID rebuild from a file image of a drive partition if necessary (the beauty of the "everything is a file" idea in Unix). This way, you can store an image of those RAID6 partitions on a computer somewhere and see WHAT WOULD HAPPEN if you were to rebuild the RAID with that data/parity before you actually mess about putting those disks/controllers into a machine.
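Roughly, the loop-device approach looks like this, assuming you have already dd'd each member partition to an image file (all paths, loop numbers and the md9 name below are only examples):

Code:

# attach each partition image to a loop device
losetup /dev/loop0 /images/sda3.img
losetup /dev/loop1 /images/sdb3.img
# ...and so on, one loop device per member image...
# then assemble a test array from the images instead of the real disks
# (md finds its members by superblock, so loop devices work fine)
mdadm --assemble --auto=yes /dev/md9 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5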

Quote:

Originally Posted by pbwtortilla (Post 3194330)
1. If mdadm encounters a bit error during a RAID 6 rebuild, will it just give up on that particular file and move on to recover other data on the array? Or will it trash the entire array?

It doesn't work on files, it knows nothing about them. It works at the block level.

It's quite likely that it will abandon an array rebuild as soon as it encounters a problem. It's also quite likely that the force option (which you are only SUPPOSED to use in circumstances where you have no other way to get working data back) will let you ignore those errors and continue the rebuild. That could potentially leave you with a corrupt RAID (if you've lost the locations of the parity data, etc.), a corrupt filesystem (if the error hits the filesystem structures themselves), or a corrupt file or two (if the error hits inside a file's data).
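To be concrete, the switch in question is --force on assembly, which tells mdadm to ignore the event-count mismatch and treat out-of-date members as good. A sketch, to be tried against the image copies and never against your only originals:

Code:

# --force: assemble even though the member superblocks disagree
# --run:   start the array even if some members are still missing
mdadm --assemble --force --run /dev/md9 /dev/loop[0-7]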

Quote:

Originally Posted by pbwtortilla (Post 3194330)
Is it possible to cheat mdadm by somehow replacing the new "raid metadata" on the 6 drives with the old data on the 2 drives? Will it make mdadm think the array is clean, consistent and nothing ever happened?

ARGH. No, no, really don't play any more with the bits on these disks. "Let's just write some unrelated data to the drives and hope the RAID copes..." - this is just the same as accidental corruption and it will do the same thing - error, or be forced to attempt recovery. If you DO fool it into thinking everything is okay, the chances are that everything WOULDN'T be okay, and then you have an inconsistent metadata/disk problem which is a million times worse than the odd corrupt bit.

Read (DO NOT MOUNT WITH THE WRITE OPTION) the data off those RAID6 partitions onto a large hard drive or existing filesystem (e.g. dd if=/dev/sda3 of=/home/user/data/sda3-image), power down all those drives and see what happens when you try to rebuild the RAID from those file images (or make sure the file images are 100% safe and then try to rebuild the RAID from the drives themselves).
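Spelling that out a little (destination paths are only examples, and the destination must be a separate drive with enough free space for all eight images):

Code:

# image each RAID6 member partition to a file on the backup drive;
# conv=noerror,sync makes dd carry on past read errors (the bad block
# is zero-padded) - GNU ddrescue does this job more carefully if you have it
for d in a b c d e f g h; do
    dd if=/dev/sd${d}3 of=/mnt/backup/sd${d}3.img bs=1M conv=noerror,sync
done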

And in future... BACKUP. To a non-disk medium. RAID is USELESS against file/disk/controller corruption. It is USELESS against unreliable hardware. RAID *cannot* compensate for deliberate messing with its metadata. It is USELESS against hardware which degrades past its stated tolerances (e.g. three drives failing in a RAID6, etc.). RAID6 is USELESS against failures of more than a single disk while it's rebuilding.

Personally, I'd go out, buy a couple of the largest hard drives I can find and put images of the RAID6 partitions on BOTH of them. Then I'd stick one drive back in its box and put it somewhere safe, and make the other drive the ONLY drive in a machine. Then I'd attempt a RAID6 recovery on those image files and see what happens. If it all goes well, I'd power on the original machine, wipe out its RAID6 and build a new empty one, then copy the data (NOT THE REPAIRED FILE IMAGES) from the recovered array, making sure that the other surviving copy of the original disks - the drive I put back in its box - was kept very safe.

artaphile 08-27-2008 02:50 PM

Quote:

Originally Posted by ledow (Post 3195580)
That's quite a way to destroy the redundancy of a RAID. RAID6 tolerates failure of two drives, but you've got three drives on at least two of the controllers. So one controller failure = destruction of data.

There is no RAID system in existence that uses a separate controller per drive; it is not even physically practical to put 8 controllers in one system! Modern controllers can easily support four devices each, and not taking advantage of that would mean spending a ridiculous amount of money just to give each drive in a four-drive array its own controller.

Ridiculous!

-art

edit: ridiculous

touser 12-25-2009 03:17 AM

Sorry to bring up an old thread. I am also planning to build a software RAID 6 system with mdadm, starting with 8 drives with the idea of expanding further down the line to 11. I'm planning to use two PCI-e controllers with 4 SATA ports each (http://www.newegg.com/Product/Produc...82E16816103058), and put the rest of the drives on the onboard controller. If a controller decides to kick the bucket, will I be able to just toss a new one in and resume normal operation? If not, what is the solution to get around this problem? Thanks!

jlinkels 12-25-2009 08:06 PM

The RAID driver assembles an array based on the UUIDs of the partitions. Therefore it doesn't matter if you replace the controllers.
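For example, a typical /etc/mdadm/mdadm.conf identifies each array purely by its UUID, so which controller or port a disk happens to sit on is irrelevant (the UUID below is just a placeholder):

Code:

# /etc/mdadm/mdadm.conf - arrays are matched by UUID, not by controller or port
ARRAY /dev/md3 UUID=a1b2c3d4:e5f6a7b8:c9d0e1f2:a3b4c5d6

# reassemble everything listed there, wherever the member disks show up
mdadm --assemble --scan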

My favorite solution is to buy all the spare parts right now, test them before you take the machine into production, and then store them in a safe place. That way you know your hardware replacement plan works, and the cost of those controllers is negligible compared to the data.

jlinkels

