Repeated RAID 5 Remount Problems After Power Failure

HalNineThousand · 12-05-2005, 12:18 PM

I have a system running a RAID 5 device built from mdadm. While I have a UPS, it'll be a few weeks until this particular system is on it. Within the past week I've had 3 power flickers. Each time this system shut down, when it came back up, the RAID was not working. I've found mdadm tutorials, but they focus on SETTING UP a RAID, not on FIXING it. Each time I've fixed the RAID with fsck (it was always the same error:

mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

There is, though, no trouble fixing it with fsck, however, fsck has to scan the entire device, which takes time.

So I've got a few questions, separated for easy response:

1) This is the only drive I've seen that ALWAYS needs to be fsck'ed and fixed every time a system is shutdown without going through the shutdown routine. Even another RAID 5 on a different system hasn't had that problem. This is not the boot drive, it's a data drive (exported to the LAN via NFS and SAMBA, but so is the other one that doesn't have trouble). Is there a likely reason for that?
2) Shouldn't there be some way to use mdadm --assemble to rebuild the damaged date easily?
3) When I run fsck, I've done it as "fsck -C -a /dev/md0", but it says I have to do it manually. This is a big device (total of almost 300 GB). Isn't there some way to automatically answer 'Yes' to all the prompts?
4) I'm using Debian (Sarge) (don't know if that matters). What can I do to make sure this device is checked AND fixed on startup?

Thanks for any help!
----------
mdadm --detail /dev/md0:
/dev/md0:
Version : 00.90.01
Creation Time : Thu Nov 10 15:50:53 2005
Raid Level : raid5
Array Size : 312581632 (298.10 GiB 320.08 GB)
Device Size : 156290816 (149.05 GiB 160.04 GB)
Raid Devices : 3
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Mon Dec 5 12:43:03 2005
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 64K

UUID : 31df4026:02b1cf99:8fe140e4:946524b1
Events : 0.410343

Number Major Minor RaidDevice State
0 33 0 0 active sync /dev/hde
1 33 64 1 active sync /dev/hdf
2 34 0 2 active sync /dev/hdg

3 34 64 - spare /dev/hdh

HalNineThousand · 12-13-2005, 11:44 AM

Any insight would have been appreciated. It took 4 days, working all day and into the night, but the problem is solved.

I was using as ASUS KT7A-RAID motherboard which has 2 extra IDE controllers. I had tested the controllers out with extra disks, and it worked fine. They formatted, showed up properly, and everything else. So I had no reason to suspect the motherboard at all, since it tested okay with the extra drives on it.

After a number of power outages during the aftermath of snowstorm, this kept happening. Finally I offloaded the data (so much it took over 4 hours w/ rsync to move it to another computer!), and started experimenting with drive formattng and RAID setup. Finally I changed from using parted to fdisk, and re-created some drive formats. To get rid of any mdadm settings, I rebooted and the motherboard did not like the new drive formats. Several more reboots to experiment and it is confirmed: the motherboard RAID controller could not accept ext3 formatting without complaining on boot. Since this is a server that should not need handholding during boot, a startup sequence asking me to reconfigure or accept current settings is not acceptable. I finally just swapped motherboards between this and another system and bought a PCI IDE controller.

I'm posting this because I was lead here, even to this topic, through Google and another thread on mdadm. I want to include enough info so anyone else with the same issue will see this post in searches. My experiece is that the KT7A-RAID board works with Linux, but the extra 2 IDE channels, which are supposed to work as RAID 0/1 or JBOD do not work as JBOD with Linux formats. I don't know if, when set up as a RAID controlled by the motherboard, it'll work with Linux.

ryoojin83 · 01-03-2006, 01:14 PM

Hey I am having similar problems, but when my RAID 5 won't start I just used 'mdadm -As /dev/md3' and it loaded it right up... I think it has to resync again though, but you can still use it just fine. then I had to manually mount... my problem is that my raid NEVER starts after a reboot...no matter what I do, you think you can help me? lol

HalNineThousand · 01-03-2006, 01:18 PM

My problem seemed to be board specific. Are you using the same board?

If not, digging into the problems means looking at a log of variables. What distro are you using? How did you install mdadm? Is it starting as a daemon on boot?

That's good info for starters.