Problem configuring RAID 5 under Ubuntu Server 9.10
I have a problem configuring a RAID server under Ubuntu 9.10 (kernel 220.127.116.11) with mdadm (v18.104.22.168). First I had some hardware issues that finally got solved by using another motherboard. Now I am dealing with the software part.
In order to ease things, I am trying to configure a RAID 5 with three partitions in one disk. I have two HD's, one IDE where the OS lies (recognized as sda), and another where I intend to build the RAID (recognized as sdb). In this second drive I have made three partitions (sdb1, sdb2 & sdb3) of the same size. For this I've used
sudo fdisk /dev/sdb
and made three partitions of the same size, then changed their type to "fd". Then I formatted each one with
sudo mkfs.ext4 -m 0 /dev/sdb1
sudo mkfs.ext4 -m 0 /dev/sdb2
sudo mkfs.ext4 -m 0 /dev/sdb3
After that I've created the array with
sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdb2 /dev/sdb3
Finally formatted it with
sudo mkfs.ext4 -m 0 /dev/md0
Everything seemed fine. All messages indicated it was OK and I was able to mount it and put some files there.
The problem came after rebooting: the array was not there anymore. Issuing cat /proc/mdstat showed
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdb(S)
unused devices: <none>
I then tried to assemble it manually with
sudo mdadm -A /dev/md0 --run /dev/sdb1 /dev/sdb2 /dev/sdb3
and it was claiming
[1992.78]md: could not bd_claim sdb1.
mdadm: failed to add /dev/sdb1 to /dev/md0: Device or resource busy
[1992.78]raid level 5 set md0 active with 2 out of 3 devices, algorithm 2
mdadm: /dev/md0 has been started with 2 drives (out of 3)
After stopping and rebooting, the array seemed to work, and it would mount properly. But stopping and restarting it gave all kinds of weird messages, as follows.
[ 384.280302] md: could not bd_claim sdb2.
[ 384.280370] md: md_import_device returned -16
[ 384.288420] md: bind<sdb3>
[ 384.288583] md: could not bd_claim sdb1.
[ 384.288648] md: md_import_device returned -16
[ 384.303607] raid5: device sdb3 operational as raid disk 2
[ 384.304906] raid5: allocated 3178kB for md0
[ 384.305063] raid5: not enough operational devices for md0 (2/3 failed)
[ 384.305177] RAID5 conf printout:
[ 384.305185] --- rd:3 wd:1
[ 384.305194] disk 2, o:1, dev:sdb3
[ 384.305876] raid5: failed to run raid set md0
[ 384.305935] md: pers->run() failed ...
[ 384.325074] md: bind<sdb1>
The biggest problem is repeatability: I get different errors with the same commands. Sometimes, if I keep stopping and restarting the array, it starts fine with all three partitions, and sometimes it claims that one of them is in use. Going through the logs, I've found that the partition is sometimes held by "/dev/md_d0" (ls /sys/block/sdb/sdb1/holders); I don't know what that is, how it got there, or how to prevent it from appearing.
Actually I intend to do a RAID 5 with 5 1.5TB disks, but I don't want to make tests on the whole setup since it's very time consuming (about 36 hours to build the array) and it seems that there is a software issue that I cannot get hold of.
Any help would be appreciated. I've already re-installed Ubuntu 9.10 a couple of times, zeroed the superblocks of the partitions, repartitioned the disks with different partition sizes (I am using 5 GB partitions to save time). I've gone through this process several times, and I really don't know how to move forward now. If RAID is about trust and reliability, this is exactly what I'm not able to get.
What you are trying to do, creating a RAID array on 3 partitions of the same device, is highly unusual. I don't think mdadm forbids it, but I am not surprised that it chokes either. Why do you want to create a RAID 5 array on a single device? I hope it is not to guard against bit errors; those are corrected internally by the hard disk. And if one partition fails, I wouldn't be surprised if you couldn't read from the other partitions either, because the disk is stuck trying to read from the failed one. Then again, 9 out of 10 failed disks I see died because of their controller, not because of bit errors.
Anyway, you should not format the partitions before you create the RAID arrays.
Try to clean the superblocks before you create the arrays. There is an mdadm command for it, but I forgot which one.
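For reference, the superblock-cleaning command is mdadm --zero-superblock. Below is a minimal sketch of the whole sequence without formatting the member partitions, assuming the device names used in this thread (/dev/sdb1 to /dev/sdb3 and /dev/md0). The run helper and the DRY_RUN switch are illustrative additions, not part of mdadm: by default the script only prints the commands, and it executes them (destructively) only when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. Prints the commands unless DRY_RUN=0 is set.
# Device names are taken from this thread; adjust them before a real run.
DRY_RUN="${DRY_RUN:-1}"
PARTS="/dev/sdb1 /dev/sdb2 /dev/sdb3"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_array() {
    # Stop any half-assembled array that may still hold the partitions.
    run sudo mdadm --stop /dev/md0
    # Wipe old md metadata from every member partition.
    for p in $PARTS; do
        run sudo mdadm --zero-superblock "$p"
    done
    # Create the array from the raw partitions (no mkfs on the members).
    run sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 $PARTS
    # The filesystem goes on the array only.
    run sudo mkfs.ext4 -m 0 /dev/md0
}

rebuild_array
```

The point of the ordering is that the only mkfs call targets /dev/md0; a filesystem written to a member partition is overwritten by the array anyway.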
Re-installing doesn't solve a thing. It is not Windows. If you doubt your installation, install Debian Stable. Since you are using the CLI anyway this shouldn't make a difference and Debian Stable is stable.
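As for the "/dev/md_d0" holder mentioned in the question, it may help to list what is holding each partition before trying to assemble the array. The sketch below walks the same sysfs holders directories the question already looked at; list_holders and its optional second argument (an alternative sysfs root, useful only for testing) are hypothetical names introduced here.

```shell
#!/bin/sh
# list_holders DEV [SYSFS_ROOT]
# Print which block devices (for example md arrays) hold each partition of DEV.
list_holders() {
    dev="$1"
    sys="${2:-/sys}"
    for part in "$sys/block/$dev/$dev"*; do
        # Skip anything that is not a partition directory with a holders entry.
        [ -d "$part/holders" ] || continue
        for h in "$part/holders"/*; do
            # An empty holders directory leaves the glob unexpanded; skip it.
            [ -e "$h" ] || continue
            echo "$(basename "$part") is held by $(basename "$h")"
        done
    done
}

list_holders sdb
```

If this reports md_d0, stopping that device with sudo mdadm --stop /dev/md_d0 before assembling /dev/md0 should release the partitions. A common explanation is that the array was auto-assembled at boot under a fallback name because /etc/mdadm/mdadm.conf has no ARRAY line for it, but that is an assumption about the 9.10 boot scripts, not something confirmed in this thread.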
I am trying the RAID on the same device just for the sake of getting used to the configuration and management. Once I realize it's stable and works seamlessly, I'll make the array with 5 disks.
If you want to know if something is running stable, you should use a common setup, not something that is exceptional to the point where it is questionable whether it conforms to specification.
Besides, RAID on Linux servers has been proven to the extreme, so there is no need to hold off on your eventual installation to see if it is stable.
Ubuntu Desktop does not have the best record for stability. I have been told this is different with Ubuntu Server, but you should nevertheless check the relevant forums to see if Ubuntu Server is stable with RAID. If there are doubts, choose another distro.
I can install the whole setup (5x1.5TB), but then creating the array takes 36 hours, which I think is just not practical at this stage.
But seriously, I have been thoroughly messing around with RAID arrays: bare-metal restores, totally zeroing a disk and trying to rebuild the array so I could put back the data, changing UUIDs, removing disks from the array and re-adding them, screwing up the formatting, changing partition tables, trying to break it, making disks defective; anything except physical violence, and everything you never want to do on your live server.
Not once did mdadm respond in an undefined way. Where things went wrong, it was entirely my own misunderstanding, and whenever the mdadm specification allowed it, I could restore or recreate the array. Of course there were some cases where I destroyed the array beyond repair.
Three more points: you cannot boot from RAID5, but you were not doing that, were you? You can boot from RAID1 but don't forget to install GRUB on both physical disks.
If you want to install Debian use the Stable version.
You don't have to fully resync the array before testing. Even when you reboot after 2% of the resync, you should see no errors; the array should re-assemble and the resync will continue from the point it had reached when you rebooted.
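To check the last point after a reboot, the progress line in /proc/mdstat can be compared before and after. A small sketch, assuming the usual mdstat layout in which the progress line contains the word "resync" or "recovery"; resync_progress is a hypothetical helper name, and its optional argument (an alternative file to read) exists only to make it easy to try out.

```shell
#!/bin/sh
# resync_progress [FILE]
# Print the resync/recovery progress lines from /proc/mdstat (or FILE).
resync_progress() {
    mdstat="${1:-/proc/mdstat}"
    if [ ! -r "$mdstat" ]; then
        echo "cannot read $mdstat (is the md driver loaded?)"
        return 0
    fi
    # Progress lines look roughly like: [>....]  recovery =  2.0% (...) finish=...
    grep -E 'resync|recovery' "$mdstat" || echo "no resync or recovery in progress"
}

resync_progress
```

Running it once before the reboot and once after the array has re-assembled should show the percentage carrying on rather than starting again from zero.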