scottastanley 08-19-2012 09:54 PM

Re-assemble RAID 5 array
I have a raid 5 array that is in a very confused state and I am trying to figure out how to re-assemble it with minimal data loss. I'll explain how it came to be in this state since it is likely to influence how to recover... (sorry for the novella on this, but the history seems relevant).

I actually have three arrays,
/dev/md0 (boot disk) : raid 1 [/dev/sda1 and /dev/sdb1]
/dev/md1 (largely unused) : raid 1 [/dev/sda2 and /dev/sdb2]
/dev/md5 (data array) : raid 5 [/dev/sdc1, /dev/sdd1, /dev/sde1 and /dev/sdf1]
In /dev/md5, sdc1, sdd1 and sde1 were active and sdf1 was a spare. I will ignore md1 for the rest of this since I do not really care about it; it is md5 I am suffering with.

Yesterday, the disk for /dev/sda had an issue with a loose power cable. When I rebooted the system, all of the disks shifted their device letters, so /dev/sdb became /dev/sda, /dev/sdc became /dev/sdb, etc. So, /dev/md0 worked in a degraded state with one disk and /dev/md5 lost sdc1 and started recovering using sdf1 (now sde1 because of the shift in letters).

I shut down the server, fixed the loose power cable on sda and turned the machine back on. So, all of the disks shifted back to their original device letters. At this point, md0 came back missing /dev/sda1 and md5 came back missing /dev/sdc1. The array md5 was rebuilding using the spare sdf1. I re-added sda1 and sdc1 to the arrays,

mdadm --manage /dev/md0 --re-add /dev/sda1
mdadm --manage /dev/md5 --re-add /dev/sdc1

Everything looked fine, md0 fully recovered and md5 was working on recovery. However, after md5 got 70% restored something happened and the array got in a bad state. Not sure what happened, there was a complaint in the logs about one of the disks, but now they all show as fine...

In any case, the raid state is now,

mdadm --detail /dev/md5
Version : 0.90
Creation Time : Mon Oct 27 23:09:08 2008
Raid Level : raid5
Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 3
Total Devices : 4
Preferred Minor : 5
Persistence : Superblock is persistent

Update Time : Sun Aug 19 16:29:46 2012
State : clean, degraded
Active Devices : 1
Working Devices : 3
Failed Devices : 1
Spare Devices : 2

Layout : left-symmetric
Chunk Size : 128K

UUID : e669ab57:20f34bf3:9d4e21ac:ed63aa71
Events : 0.4584

Number   Major   Minor   RaidDevice   State
   0       0       0         0        removed
   1       8      49         1        active sync     /dev/sdd1
   2       0       0         2        removed

   3       8      33         -        spare           /dev/sdc1
   4       8      81         -        spare           /dev/sdf1
   5       8      65         -        faulty spare    /dev/sde1

So, for some reason it has sdc1 and sdf1 as spares and sde1 as a faulty spare.

As I interpret things, between the four drives all of my data is probably still there. However, I am not clear on how to get the array re-assembled and get it to recognize that the data is there.

I would appreciate any suggestions I can get. The current state of /proc/mdstat is,
[root@moneypit ~]# more /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md5 : active raid5 sdc1[3](S) sdd1[1] sdf1[4](S) sde1[5](F)
1953519872 blocks level 5, 128k chunk, algorithm 2 [3/1] [_U_]

md1 : active raid1 sdb2[0] sda2[1]
104631232 blocks [2/2] [UU]

md0 : active raid1 sdb1[0] sda1[1]
12586816 blocks [2/2] [UU]

unused devices: <none>
and my mdadm configuration is,
[root@moneypit Config_Notes]# cat /etc/mdadm.conf

# mdadm.conf written out by anaconda
DEVICE partitions

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=79529c65:21e09d52:e6fa623d:3fb5858a
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=f517b809:f5018cfa:f627cb2a:77ac5f3d
ARRAY /dev/md5 level=raid5 num-devices=3 spares=1 UUID=e669ab57:20f34bf3:9d4e21ac:ed63aa71
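
The sizes in the `mdadm --detail` output above can be sanity-checked with a little arithmetic: a RAID 5 array with n active devices holds (n - 1) devices' worth of data, with the remaining capacity going to parity. A minimal sketch of that check follows. The function name `raid5_capacity_kib` is made up for illustration, and the inputs are the `Raid Devices` and `Used Dev Size` values (in 1 KiB blocks) from the output.

```shell
# raid5_capacity_kib DEVICES DEV_SIZE_KIB
# Usable size of a RAID 5 array: one device's worth of space holds parity.
# (Hypothetical helper, not part of mdadm.)
raid5_capacity_kib() {
    devices=$1
    dev_size_kib=$2
    echo $(( (devices - 1) * dev_size_kib ))
}

# Values from the md5 output: 3 raid devices of 976759936 KiB each.
raid5_capacity_kib 3 976759936
```

The result should equal the reported `Array Size`; the same check is handy after re-creating an array, to confirm the new geometry matches the old one.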

chrism01 08-21-2012 07:49 PM

Well, let's start by summarising:

1. According to mdadm & the partitions list, you have

active sync /dev/sdd1
spare /dev/sdc1
spare /dev/sdf1
faulty spare /dev/sde1

& the original array was sdc1, sdd1, sde1 & spare sdf1.
Note the (effective) swap of sde1/sdf1.

You could try(?) doing a force assembly with sdc1, sdd1, sdf1, as it looks like that may be the nearest thing to a RAID5 set you have, assuming sdf1 is more recent than sde1 (from your notes).
I hope you have a backup; it looks risky to me.
Good luck

Actually, c/d/e might be a better bet if 'e' was kicked out of the array early on; it may be less corrupt than 'f'.

scottastanley 08-21-2012 10:46 PM

Thanks for the feedback. I think I would probably first try assembling using sdc, sdd and sde. These three comprised a fully functioning array to start with. It only got broken because the device names shifted down due to sda disappearing. sdf came into play because the array started trying to rebuild after the loss of sdc.

Unfortunately, I do not have a backup of the original array. I did buy four new drives and I am going to use dd to do a drive level clone of all four before I try and do anything. That way, if it goes completely wrong I can go back and start over with the current state. At least this way I can have multiple attempts at getting it all back together.

The force assemble command would look like this,

mdadm --assemble /dev/md5 --force /dev/sdc1 /dev/sdd1 /dev/sde1

Does this look about right? Is there any reason to use the --uuid option? As I see it, since I know the relevant drives this is not going to do much.

Is there any way to try and force it to use all four drives to rebuild the original array? I wonder if there could be data on sdf that is not on the others?

chrism01 08-22-2012 05:04 AM

Given it was only ever defined as 3 active at any time, I'd stick with trying c/d/e, then c/d/f if that doesn't work.
I wouldn't bother with UUIDs unless you are worried the disks will shift again before you do it.
Good idea to do dd backups; like you say, it'll give you multiple goes at it.
By the time you've finished, you'll be a RAID/mdadm guru :)

scottastanley 08-23-2012 12:00 AM

Thanks. This is pretty much what I was thinking in terms of attempted recovery order. Drives should not shift again, I believe I found the issue and it has not been a problem again all week... Doing the copy of the drives now and will hopefully try the reassemble later in the week.

Now if I can only figure out how to prevent this kind of thing from happening again. I was pretty disgusted when I discovered the loss of one drive from a mirrored array totally hosed the rest of the system. Seems like there should be a better way of assembling these things than relying on device letters. Or, maybe there is a way to fix the device letters to the drives so they are static.

This whole episode is going to drive me to have to come up with a good backup mechanism.

chrism01 08-23-2012 08:02 PM

1. Drive letters are fine generally, but for absolute addressing, that's why UUIDs were invented (iirc, it's a SCSI thing; the letters could move after a reboot, even if there's no hw failure).
It's rare I think but ...

2. RAID5 should handle 1(!) failed disk ok; good to have a hot standby so it can start recovery immediately.

3. if you've got many disks in an array, consider RAID6: handles 2 disk failures...

4. Definitely time to set up a backup system :)
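
On the device-letter question: on Linux systems running udev, each disk normally also appears under `/dev/disk/by-id/` as a symlink that keeps the same name when the `sdX` letters move. A small sketch to show which stable name currently points at which letter follows. `list_stable_names` is a hypothetical helper, and its directory argument exists only so it can be tried on a scratch directory.

```shell
# list_stable_names [DIR]
# Print "stable-name -> current device" for every symlink in DIR
# (default: /dev/disk/by-id). Hypothetical helper for illustration.
list_stable_names() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Typical use:
#   list_stable_names
```

Running it before and after a reboot makes it easy to see whether the letters moved while the stable names stayed put.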

scottastanley 08-24-2012 08:42 PM

Well, it appears that the superblock on one of the disks is gone. When I try,
mdadm --assemble /dev/md5 --force /dev/sdc1 /dev/sdd1 /dev/sde1
I am getting the complaint,
mdadm: no RAID superblock on /dev/sde1
mdadm: /dev/sde1 has no superblock - assembly aborted
I am thinking there might be useful information here, but I am still trying to cull through it. Not sure what he means by "I recreated the array with the "--assume-clean" option.". Maybe he is just using --assemble.

scottastanley 08-24-2012 09:06 PM

Curiously enough, when I look at the superblock for all of the other three drives, sdc, sdd and sdf, the state is listed as "clean" but the drives listed are a total mess (as when I originally posted).

For sdc, for example, there is;
mdadm -E /dev/sdc1
Magic : a92b4efc
Version : 0.90.00
UUID : e669ab57:20f34bf3:9d4e21ac:ed63aa71
Creation Time : Mon Oct 27 23:09:08 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
Raid Devices : 3
Total Devices : 4
Preferred Minor : 5

Update Time : Mon Aug 20 01:41:32 2012
State : clean
Active Devices : 1
Working Devices : 3
Failed Devices : 1
Spare Devices : 2
Checksum : ead0959 - correct
Events : 4586

Layout : left-symmetric
Chunk Size : 128K

      Number   Major   Minor   RaidDevice   State
this     3       8      33         3        spare           /dev/sdc1

   0     0       0       0         0        removed
   1     1       8      49         1        active sync     /dev/sdd1
   2     2       0       0         2        faulty removed
   3     3       8      33         3        spare           /dev/sdc1
   4     4       8      81         4        spare           /dev/sdf1
I guess I will attempt to assemble the sdc, sdd and sdf drives and see what happens...
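
When deciding which members to trust, the `Events` counter and `Update Time` in each superblock are the usual things to compare: members whose counters agree were last written together, while a lower count marks a member that dropped out earlier. A rough sketch that pulls those fields out of saved `mdadm -E` output follows. The helper name `superblock_field` and the file names are assumptions for illustration.

```shell
# superblock_field FILE KEY
# Print the value of one "Key : value" line from saved `mdadm -E` output.
# (Hypothetical helper; it only reads a text file, it never touches the disks.)
superblock_field() {
    awk -F' : ' -v key="$2" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }   # trim the padded key
        k == key { print $2; exit }
    ' "$1"
}

# Example (as root): compare the members that still have a superblock.
# for part in sdc1 sdd1 sdf1; do
#     mdadm -E "/dev/$part" > "${part}_superblock.txt"
#     printf '%s events=%s updated=%s\n' "$part" \
#         "$(superblock_field "${part}_superblock.txt" Events)" \
#         "$(superblock_field "${part}_superblock.txt" 'Update Time')"
# done
```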

scottastanley 08-27-2012 12:19 AM

So, I have managed to get the array back up and running using disks C/D/E. However, I have lost a chunk of data. The loss of data became evident when I ran fsck on the file system once the array was back up and running. I am sure a bunch of the files I am losing are ending up in lost+found, due to corrupted directory structures. I have found a few video files lying around there, where the parent directory was lost so the files got orphaned.

I do not consider this process even remotely complete, since I have not touched disk F yet. When the problem first occurred, the array had lost C and was in the process of rebuilding using D/E and the spare F. From what I could tell from the logs, the rebuild got about 70% complete before the failure occurred. So, I figure there is a possibility of recovering additional data by trying to rebuild using D/E/F.

This has almost become just a log of what I am doing, and I apologize for that... I am hoping that maybe this will be helpful for someone else in the future.

The steps I have taken to get the array back and running using disks C/D/E are as follows (in case anyone else finds it useful);

As mentioned above, I bought four new 1TB internal disks so I could clone the disks from the failed array. This allows me multiple attempts at the recovery (disks C/D/E on this pass and later using disks D/E/F). When I added the four new disks, they were devices G/H/I/J.
  1. Cloned the disks (each of these took about 13 hours, but I could do them in parallel)
    dd if=/dev/sdc of=/dev/sdg
    dd if=/dev/sdd of=/dev/sdh
    dd if=/dev/sde of=/dev/sdi
    dd if=/dev/sdf of=/dev/sdj
  2. Recorded the superblock from the disks that had one, so that I had the details on the settings of the array (chunk size, etc). As mentioned above, E did not have a superblock anymore.
    mdadm -E /dev/sdc1 > sdc1_superblock.txt
    mdadm -E /dev/sdd1 > sdd1_superblock.txt
    mdadm -E /dev/sdf1 > sdf1_superblock.txt
  3. Cleared the superblock from all disks so that mdadm did not try and restore the array to the previous state (with drives in messed up states).
    mdadm --zero-superblock --force /dev/sdc1
    mdadm --zero-superblock --force /dev/sdd1
    mdadm --zero-superblock --force /dev/sde1
    mdadm --zero-superblock --force /dev/sdf1
  4. Re-created the array over the existing data with --create, telling mdadm to assume it was clean (this wrote a new superblock to each member disk). Note the settings for this command, chunk size, etc, came from the saved superblocks above.
    mdadm --create /dev/md5 --chunk=128 --level=raid5 --raid-devices=3 --assume-clean /dev/sdc1 /dev/sdd1 /dev/sde1
  5. Updated /etc/mdadm.conf to have the correct details for the newly restored array.
    mdadm -E /dev/sdc1 (used this to get the new UUID)
    vi /etc/mdadm.conf (set the entry as follows. note, only C/D/E and no spare with new UUID value)

    ARRAY /dev/md5 level=raid5 num-devices=3 UUID=fee14c9c:87f54a7d:0a24a15b:12beafe8
  6. Stopped the array and restarted it, triggering a resync. The resync took overnight to complete, but the array came up successfully with drives C/D/E active.
    mdadm --manage --stop /dev/md5
    mdadm --assemble /dev/md5 --update=resync
  7. Did a file system check on the ext3 file system on the array. Should have used the -y option as well (e2fsck's -a only applies the fixes it considers safe) to avoid having to hit "y" what seemed like a million times...
    fsck -V /dev/md5
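
One caution about step 4: `mdadm --create` falls back on its own defaults for anything not given on the command line, and those defaults (metadata version, chunk size, layout) can differ between mdadm releases. If they differ from the original array, the data will not line up. A hedged sketch follows that builds the command from a saved superblock dump so every parameter is spelled out. `sb_field` and `print_create_cmd` are hypothetical helpers, the parsing assumes the 0.90-format `mdadm -E` output shown earlier, and the function only prints the command for review rather than running it. The device order must still match the original slot order.

```shell
# sb_field FILE KEY: value of one "Key : value" line from saved `mdadm -E` output.
sb_field() {
    awk -F' : ' -v key="$2" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
        k == key { print $2; exit }
    ' "$1"
}

# print_create_cmd SUPERBLOCK_FILE DEVICE...
# Print (do not run) an explicit `mdadm --create` line for the given members.
print_create_cmd() {
    sb=$1
    shift
    version=$(sb_field "$sb" Version | cut -d. -f1,2)   # 0.90.00 -> 0.90
    level=$(sb_field "$sb" 'Raid Level')
    devices=$(sb_field "$sb" 'Raid Devices')
    chunk=$(sb_field "$sb" 'Chunk Size')                # e.g. 128K
    layout=$(sb_field "$sb" Layout)
    minor=$(sb_field "$sb" 'Preferred Minor')
    printf 'mdadm --create /dev/md%s --metadata=%s --level=%s --raid-devices=%s --chunk=%s --layout=%s --assume-clean %s\n' \
        "$minor" "$version" "$level" "$devices" "${chunk%K}" "$layout" "$*"
}

# Example:
#   print_create_cmd sdc1_superblock.txt /dev/sdc1 /dev/sdd1 /dev/sde1
```

Printing instead of executing is deliberate: a wrong `--create` overwrites superblocks, so the line is worth reading twice, and trying on the cloned disks first.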

After running the fsck, I had a functioning file system on the array. I did lose a bunch of files and directories, but I am pretty sure the majority of the data is in lost+found. Out of 800GB of data on the array, I have 200GB of files in lost+found. Unfortunately they are a bit scrambled, since the directory structure is lost. If I had the fortitude to track through the 30 thousand or so files in this directory, I am sure many of them could be restored to the right locations. In my case, I am pretty certain most of the files in lost+found (at least most of the volume of data) are video and audio files from our iTunes library. Luckily, I happened to have copied the library to an external USB drive to use on a separate computer a month or two ago, so I can recover them that way. Unfortunately, this array also held the directory tree for my Subversion repository... Many, many small source files... This directory tree seems to be totally lost. Even the top level directory is no longer present on the drive.

I went today and bought a 1TB USB drive and I am copying the results of rebuilding the array using disks C/D/E to it (also bought a 2TB USB drive to use as a backup destination in the future). Once that is saved away, I will re-clone the original disks C/D/E/F and rebuild the array using D/E/F and see if any more files can be recovered. Once I do this, I will do diffs between the two sets of recovered files and see if any of what I have lost is recoverable using disks D/E/F. Hoping I get lucky and find that Subversion tree...

scottastanley 01-07-2013 11:44 PM

I just realized I never posted back on the final status of my recovery. In the end, I was able to fully recover the array with no noticeable loss of data. I pretty much followed the procedure I outlined in the previous post. However, I went back and looked at the logs from when I copied the drives using dd and realized that there was an error when copying one of the disks. There was apparently a problem with a region of the disk. When dd hit the bad region, it aborted the copy. So I had been trying to recover using a partial disk copy.

When I realized this, I started the whole procedure over again and used the following command to copy the disks,
dd if=/dev/sdc of=/dev/sdg bs=1024 conv=sync,noerror
With these options in the copy, dd continued past the bad region, padding the unreadable blocks with NULs. So, I had a complete copy (apart from the bad blocks) to try the recovery from. When I rebuilt the array using disks copied in this way, it rebuilt successfully and I had no apparent loss of data.
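
The lesson generalizes: a clone is only useful if the copy actually finished, so it is worth checking dd's exit status and keeping its messages rather than assuming the command ran to the end. A minimal sketch follows, assuming GNU dd. `clone_checked` and its log-file argument are made up for illustration.

```shell
# clone_checked SRC DST LOG [extra dd operands...]
# Run dd, save its messages to LOG, and report whether it exited cleanly.
# (Hypothetical wrapper; extra operands such as conv=sync,noerror are passed through.)
clone_checked() {
    src=$1
    dst=$2
    log=$3
    shift 3
    if dd if="$src" of="$dst" bs=1024 "$@" 2>"$log"; then
        echo "copy of $src finished; details in $log"
    else
        echo "dd reported errors copying $src; check $log before relying on $dst" >&2
        return 1
    fi
}

# Real use (as root), one call per disk:
#   clone_checked /dev/sdc /dev/sdg sdc_clone.log conv=sync,noerror
```

A non-zero status does not always mean the copy stopped short, since dd may also report errors for blocks it padded and skipped, but either way the log is the place to look before building an array on the clone.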

All times are GMT -5.