junkjunk 07-07-2009 05:28 AM

Restore system with RAID1, one failed disk

I had a disk crash recently: when I tried to play a media file, the whole system hung. I switched to a terminal and rebooted, and then it wanted to run fsck. I let it, but it gave me some error and now I'm getting "grub 17 error" at boot. I can't access the system at all.

I'm running software RAID1 (mdadm on Ubuntu 9.04) on the root file system, but even after removing the broken drive I still get the grub 17 error. I've tried fixing it with the Super Grub Disk, but without luck.

Seems a bit pointless to use RAID1 if both drives get corrupted. Or did I realise too late that it was a hardware problem? Maybe I shouldn't have run that fsck? It gave me a warning about running on a mounted filesystem, but I figured it should be okay since it tries to run at startup anyway. What else should I have done?

Anyone know what to do next?

Thanks in advance!

leniviy 07-07-2009 09:39 AM

junkjunk, the first thing you should do is boot a live CD and back up all important files to USB or the network. Only then try to fix things.

Knoppix 5, for example, can detect software RAIDs on boot.
If it's really RAID1, then even if you can't re-sync it you should (I guess) still be able to mount just one of the two disks.
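
For the single-disk case, a rough sketch of what I mean. The names /dev/sda1, /dev/md0 and /mnt are assumptions, not taken from your system, so check them first. The script only prints the commands so you can review them before running anything as root:

```shell
#!/bin/sh
# Hypothetical names: adjust to match your system before using.
MEMBER=${MEMBER:-/dev/sda1}       # the surviving RAID1 member partition (assumed)
ARRAY=${ARRAY:-/dev/md0}          # the md device to assemble (assumed)
MOUNT_POINT=${MOUNT_POINT:-/mnt}  # where to mount it (assumed)

# Print the commands instead of running them, so they can be reviewed first.
degraded_mount_plan() {
    # 1. Inspect the md superblock on the surviving member.
    echo "mdadm --examine $MEMBER"
    # 2. Assemble the array from one member; --run starts it even though it is degraded.
    echo "mdadm --assemble --run $ARRAY $MEMBER"
    # 3. Mount read-only so nothing gets written while you copy data off.
    echo "mount -o ro $ARRAY $MOUNT_POINT"
}

degraded_mount_plan
```

Mounting read-only keeps the surviving disk untouched until the backup is done.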

junkjunk 07-07-2009 03:05 PM

Thanks for the really good information! Luckily I had a recent offsite backup which I could use to rebuild the array from scratch. I will most certainly not be that lucky next time.

I still have the degraded array, might try your suggestions anyway. How would I mount just one of the disks? Using Knoppix?

leniviy 07-08-2009 03:08 AM

junkjunk, so do you have something important on that disk or not?

On the scale of luck, from best to worst:
- You boot Knoppix, see the icons of your partitions on the desktop, and can access them.

- You boot Knoppix and don't see the icons. Then try to find the partitions manually:
fdisk -l -u /dev/sda
If it errors, that's the good case: the partition table is broken. If not, then the filesystem root node is broken, which is worse.

If only the partition table got corrupted, you can try to remember the sizes of the partitions and mount by offset. I assume you had /dev/sda1 for data:
mount -r -o loop,offset=`expr 63 \* 512` /dev/sda /mnt
(the first partition usually starts at sector 63)

- The root node of the partition is corrupted. If so, only special recovery tools (usually paid) can help.
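
To spell out the offset arithmetic from the mount example: the start sector reported by fdisk -l -u times the sector size gives the byte offset. A small sketch, assuming 512-byte sectors and using made-up helper names (sector_to_offset, print_mount_cmd). It prints the mount command rather than running it:

```shell
#!/bin/sh
# Convert a partition's start sector (as listed by `fdisk -l -u`) to a byte offset.
# Assumes 512-byte sectors unless a second argument says otherwise.
sector_to_offset() {
    start_sector=$1
    sector_size=${2:-512}
    echo $((start_sector * sector_size))
}

# Print (do not run) a read-only loop mount command for the given disk and start sector.
print_mount_cmd() {
    disk=$1
    start_sector=$2
    mount_point=${3:-/mnt}
    echo "mount -r -o loop,offset=$(sector_to_offset "$start_sector") $disk $mount_point"
}

# Example: first partition starting at sector 63 on /dev/sda.
print_mount_cmd /dev/sda 63
```

For other partitions, substitute the start sector that fdisk prints for them.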

junkjunk 07-08-2009 08:54 AM

Well, I had the data backed up offsite recently, so I can restore from that.

I still haven't touched the healthy disk, so I'll try using Knoppix to see if I can get it back up. It would make me feel safe about my RAID1 array again =)

Thank you for the information, I'll get back to you later!

junkjunk 07-08-2009 10:14 AM

Oooh yeah! Booted using Knoppix, ran the fdisk command and it printed the different partitions on sda. I tried to mount them all, one succeeded. When I looked inside the successful mount I could see and retrieve the files. So it worked! I feel safe about the RAID array again =)

How does mdadm work? Does it create partitions as usual on each device and then act as a middle layer between those partitions and the operating system? I ask because here I could mount this drive as if it had never been in a RAID array, which is very neat!

Thanks for the help!

