RAID5 Array Recovery after OS upgrade
I need to know if I can fix this or if I should bite the bullet and start reloading my DVD backups. Again. I have 90-95% of it backed up, but it's about 1.6 TB of data on hundreds of DVDs, so you know how painful reloading is.
I have six 400 GB SATA drives in a RAID5 array mounted at /home. The system is Ubuntu 6.06 Server, kernel 2.6.15, with mdadm (I don't know the version). I recently installed a new motherboard with more on-board SATA connectors, since I was also planning to start adding more drives. The plan was to add a 3-bay enclosure that can hold 5 drives, set up three 500 GB drives in another RAID5 array now, and expand it to five later.
There were two issues with this, both of which I realize now could probably have been resolved if I had simply taken the time to learn how to compile a new kernel. The network drivers weren't loaded when it booted (there are two NICs on the board), so I had to use a card, and I remembered reading that I would need a newer kernel than was available in the apt repository for Ubuntu 6.06 to expand an array. So, instead of compiling a new kernel, I decided to do a fresh install of 6.10 Server.
During the install there was some problem with DHCP, and it took me back to the menu. I got it sorted out, but I didn't realize until it finished that it had managed to skip several sections of the installation, including user setup. With no login, I ran the recovery. When it finished, it looked OK: mdadm showed the device as /dev/md0 (what it had been), and it mounted fine. If everything else had been fine, that would have been what I wanted; however, the install had missed other things besides user accounts. For example, not only were there no apt sources configured, there were no man pages installed. So: another reinstall, all the way through this time. This time mdadm didn't set it up correctly. Here is the current status:
The only other thing is that last night, it showed it as degraded and resyncing one drive, and it finished the resync. What should my next step be? |
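For reference, the usual way to see what mdadm and the kernel think is going on (the device names here are just the ones used later in this thread, so adjust to your setup):
Code:
# Which arrays the kernel knows about and which members are active
cat /proc/mdstat

# Detail on the assembled array
mdadm --detail /dev/md0

# What an individual member's superblock says about itself
mdadm --examine /dev/sdd1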
Quote:
http://www.linuxquestions.org/questi...d.php?t=544557 |
I looked at it, but it doesn't solve this problem, since I want to keep my data. It does help a little with learning about mdadm, though.
|
Did you build a new mdadm.conf as a result of this, or did you keep the one that had been on it? You didn't run mdadm --create, did you? Where did the current mdadm.conf come from (auto-generated, or did you make it yourself?), and what are its contents?
|
Just a guess...
I've read a German thread about this very same error. Someone had built a RAID array with /dev/sda1, /dev/sdb1 and so on. After a kernel update his RAID seemed OK, but mdadm showed /dev/sda, /dev/sdb and so on as the members. fdisk -l would still show that /dev/sda1, /dev/sdb1 were there. There was also the same discrepancy between device and array size. His solution was to zero the superblocks on the false members /dev/sda, /dev/sdb, etc., and reboot:
Code:
mdadm --zero-superblock /dev/sd[a-e]
In case you want to give it a shot, I can translate it in detail. |
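Before zeroing anything, it may be worth checking what each device node actually reports (a minimal check; sda/sda1 here stand in for whichever members are in question):
Code:
# Superblock as seen on the whole disk (the suspect member)
mdadm --examine /dev/sda

# Superblock as seen on the partition (the intended member)
mdadm --examine /dev/sda1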
I don't think I ran mdadm --create. The mdadm.conf was auto-generated.
Code:
mdadm --stop /dev/md0 |
Um, so it may be worse now. I stopped the array and assembled it, but it came up the same as before, with /dev/md0p1 in fdisk. Stopped it again and tried the zero-superblock, because last time I had the commands out of order and hadn't stopped it first. That worked. Then I tried to assemble, and it failed. The command was:
Code:
mdadm --assemble /dev/md0 /dev/sd[e-i] |
This thread mentions mdadm --assemble --force. Would it be a bad idea to try it?
http://www.issociate.de/board/post/2...lyaborted.html |
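For what it's worth, the forced variant from that thread would look something like this (same member list as the command above; whether those should be the whole disks or the partitions is exactly the open question here):
Code:
# --force tells mdadm to try to start the array even if some
# member superblocks look stale or inconsistent
mdadm --assemble --force /dev/md0 /dev/sd[e-i]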
First, I'd like to know exactly what you typed.
|
This one will be interesting as well:
http://kev.coolcavemen.com/2007/03/h...d-superblocks/
Basically, what is discussed there is recovery of a RAID5 after zeroing all the superblocks of the included partitions, and it seems to work. I've tested it with VMware and a RAID1: killed all the superblocks, and mdadm would not assemble/start the array. Then I tried the above-mentioned --create and, lo and behold, I could mount it; no data was lost. mdadm complained about a preexisting filesystem, but I forced it to do its magic and it worked. |
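The gist of that recipe, as a sketch of the RAID1 test described above (the level, device count, order, and names all have to match whatever the array was originally created with, or the data won't line up):
Code:
# Recreate the array in place over the zeroed superblocks;
# the data should survive as long as the geometry matches exactly
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

# Check the filesystem read-only before trusting it
# (-n answers "no" to every repair prompt, so nothing is written)
fsck.ext3 -n /dev/md0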
Quote:
At the time I created the RAID, I must have made a mistake, which showed up just now. |
Well, I decided to give mdadm --create a shot.
Code:
mdadm --create /dev/md0 --verbose --level=5 --raid-devices=6 /dev/sd[d-i]1 |
Alright, so it finished resyncing. Now we get
Code:
# mount /dev/md0 md0 |
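If the mount is failing, there are non-destructive ways to poke at what's on the array first (assuming ext3, as elsewhere in the thread):
Code:
# Read-only filesystem check; -n answers "no" to everything,
# so nothing gets written
fsck.ext3 -n /dev/md0

# Print what the primary superblock claims, if it's readable
dumpe2fs -h /dev/md0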
You've kind of lost me here. I was under the impression that the create option created a new array and threw away anything that previously existed. As far as I understood, the data was gone when you ran create.
|
I was going by this article posted earlier. http://kev.coolcavemen.com/2007/03/h...d-superblocks/
|
There is a utility called testdisk ( http://www.cgsecurity.org/wiki/TestDisk ), which can scan devices with ext2/ext3 for backup superblocks and help recover them.
Some hints: http://www.cgsecurity.org/wiki/Advan...kup_SuperBlock
You could run:
Code:
testdisk /dev/md0
then: [PROCEED], [NONE], [Advanced], [Superblock]
If this works, you should get some output like this:
Code:
superblock 0, blocksize=1024
superblock 8193, blocksize=1024
...
With that you can tell fsck.ext3 (or the equivalent on your system) to use a backup superblock, e.g.:
Code:
/sbin/fsck.ext3 -b 8193 -B 1024 /dev/md0
If that doesn't work, I'm at my wits' end. |
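If testdisk comes up empty, another way to get candidate backup-superblock locations is a dry run of mke2fs (this assumes the filesystem was created with default parameters; -n prints what would be done without writing anything):
Code:
# -n: dry run only. Lists the superblock backup locations an
# ext3 (-j) filesystem of this size would use; nothing is written
mke2fs -n -j /dev/md0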
Well, testdisk didn't show any partitions under Advanced, so I'm running Analyse. It's going to take a good while, so in the meantime I'm going to start making plans to reload the data. I'll post an update when it finishes.
|
Alright, well, the analyse run didn't detect things correctly and just gave a bunch of garbage, so I'm pretty positive it's gone. So many DVDs to reload! Oh, well. Thanks for your help.
One last thing: what precautions should I take in the future to increase my chances of recovery? I know now to run dist-upgrade instead of installing from disk, but other than that and backing up my mdadm.conf, what should I do? |
I've been thinking about this today, and I wonder if the problem could have been avoided if you hadn't had your drives connected when you installed mdadm. I mentioned that I installed mdadm once and it created a bunch of junk on the drives I had connected. It may or may not be a problem, but it's something to think about if you have to reinstall for any reason. You might also think about creating a backup of your non-data files. There are a number of good backup systems out there. I just used tar along with a trivial script I wrote. I actually did a restore (from Knoppix) of the boot/non-data image I keep on my data disks recently, and it worked just fine.
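A trivial script along those lines might look like this (a sketch, not the poster's actual script; the paths and the target directory on the data array are assumptions):
Code:
#!/bin/sh
# Back up the non-data parts of the system to a tarball kept
# on the RAID array, plus a package list for reinstalls.
TARGET=/home/backup
DATE=$(date +%Y%m%d)

mkdir -p "$TARGET"
dpkg --get-selections > /root/package-list.txt
tar czf "$TARGET/system-$DATE.tar.gz" \
    /etc /root /boot 2> /dev/null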
|
Well, I've almost got everything working, but I've run into a few snags. Two parts.
First, I want the six 400GB drives to start as md0 and the three 500GB drives to start as md1. When I reboot, md0 starts with 2 of the 3 500GB drives and resyncs with the third, while md1 starts with 4 of the 6 400GB drives. mdadm.conf is currently:
Code:
# cat mdadm.conf
Second, the LVM setup on top of the arrays was:
Code:
pvcreate /dev/md0
Code:
vgchange -a y RAID_GROUP |
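One way to pin the array names so they stop swapping on boot is to list each array in mdadm.conf by UUID (a hypothetical example; the real UUIDs come from mdadm --detail --scan):
Code:
# /etc/mdadm/mdadm.conf -- example only, the UUIDs here are fake
DEVICE partitions
ARRAY /dev/md0 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md1 UUID=yyyyyyyy:yyyyyyyy:yyyyyyyy:yyyyyyyy
The matching lines for the running arrays can be generated with:
Code:
mdadm --detail --scan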
Well, now on reboot, md0 comes up correctly but as the 3 500GB drives, so I'll just leave that the way it is. md1, however, only comes up with 4 drives on startup. Once I get a console, I run:
Code:
mdadm -A /dev/md1 /dev/sd[d-i] |
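If the arrays are being assembled from the initramfs at boot, the copy of mdadm.conf baked into it can lag behind the one in /etc; a possible fix on an Ubuntu of this era (assuming initramfs-tools is in use):
Code:
# Append the current arrays to the config, then rebuild the initramfs
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u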