Here's a recent example from a box I set up last weekend. This was my first time playing with mdadm (billed as a replacement for Ingo's raidtools-1.x package). We have a few boxes with software RAID, but I'm not in the habit of testing something this important on production units, which were all built and managed by various versions of raidtools. One of the reasons I wanted to use mdadm was its ability to monitor the software RAID system and notify me of anything I should know about.
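As a taste of that monitoring, mdadm has a follow/monitor mode that can run as a daemon; a minimal sketch (the mail address and polling interval here are my own choices, not requirements):

```shell
# Watch all arrays found via --scan, mail root on failure/degraded
# events, poll every 120 seconds, and detach into the background.
mdadm --monitor --scan --mail=root --delay=120 --daemonise
```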
The Box - 'trinity' is an old (1998) Asus P2B-S (Intel 440BX) with a P-III, 256MB RAM, and onboard SCSI. Sadly, all the SCSI drives were recently allocated to the Proliant 5000, but I did have three 60GB Maxtor ATA/100s (6L060L3) and one 80GB Samsung ATA/100 (SP8004H) with 8MB cache. I rarely leave CD-ROM drives in the servers, so there's always one lying about; a Creative 52x IDE gets a temporary home. The ATA controller cards are a pair of Promise Tech Ultra/66s (PDC20262). An old video card and a 3Com 3c905-TX round out the setup, inside a generic 17-inch case powered by an Enermax 380W PSU.
The Setup - The CD-ROM gets plugged into the on-board primary IDE controller (/dev/hda) and the hard disks are attached to the Promise controllers. I chose Vector Linux for this setup as it's claimed to be a leaner Slackware, and we already have lots of Slack boxes. Booting and installing Vector Linux is a snap, and my basic setup came in at around 360MB. I used /dev/hde (the drive connected to the primary controller on the first card) as the root device, with a /boot partition (64MB, /dev/hde1) and a / partition (the rest of the disk, /dev/hde2). No fancy partitioning here, as this whole project is a test to see how badly I can mess things up with 'mdadm' and RAID5 as a root device. Most default installs have RAID built into the kernel. If yours doesn't - start building.
***
A few notes on add-on controllers are worth mentioning here. Generally, off-board IDE/ATA controllers start at /dev/hde, as most x86 boards have a primary and secondary controller that chew up hda through hdd. On most installs the kernel will ignore the BIOS boot report and probe the board for hardware. This means that even if you have disabled the onboard controllers (IDE0 and 1), the kernel will still pick them up, making your off-board controllers start at /dev/hde. There are kernel boot parameters that can be fussed with (ide=reverse), but it's dependent on the kernel setup.
***
As for filesystem type, I'm a reiserfs addict, so everything but /boot (ext2) will be reiserfs formatted. Now that I have installed and set up Vector Linux on /dev/hde, it's time to move on and see what this thing can do.
Some RAID-specific setup After rebooting and confirming things are what they say they are, one last check revealed that I forgot to set the / partition type to 'fd' - "Linux raid autodetect" - so I needed to run fdisk /dev/hde and change the type (option 't' from the menu) of the partition that will be a member of the future RAID array. Write that information to disk and exit fdisk. We have a few more things to take care of while we're at it. Our other participants (/dev/hdg, /dev/hdi, /dev/hdk) need to be partitioned as well, and it's also time for another little sidebar.
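If you'd rather not drive fdisk's menus by hand, older sfdisk versions can flip the type non-interactively; a sketch, assuming the same layout as above (newer util-linux releases renamed this option to --part-type):

```shell
# Set partition 2 on /dev/hde to type 'fd' (Linux raid autodetect).
sfdisk --change-id /dev/hde 2 fd
```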
***
Notice that the drive naming skips /dev/hdf, /dev/hdh, and /dev/hdj, as they would be slave drives on the primary and secondary devices (on IDE2 and 3). Judging from my reading of mailing lists and other goodies, it's a generally accepted rule that you should avoid using any slave drives in a RAID array. Your controller can talk to a single drive on each of the primary and secondary channels with little difficulty, but adding a slave to the chain can choke the array, as each drive must wait while its partner reads/writes. As to whether that's just pop mythology or undeniable fact, I can't say for sure. Any confirmations or denials will be graciously accepted.
***
On with the show. We need to create at least the / partition, and optionally the /boot partition, on the remaining free devices. This document is merely meant to demonstrate how to build a simple array. A /boot partition can be created on each disk and built into a very fault-tolerant RAID1 device which the kernel can boot easily; this is left as an exercise for the reader. To keep things simple we'll do exactly the same partitioning on the remaining 60GB drives and a minor cheat on the 80GB device. On the 80GB I chose to be lazy: rather than match sizes and cylinders, I just made the /boot partition 65MB (/dev/hdk1) and the RAID partition (/dev/hdk2) a couple of MB larger, just to make sure things fit easily enough. (mdadm and raidtools will complain about size differences and adjust accordingly, within reason, but I'd rather lose 2MB on one disk than on three.) I also used another 256MB (/dev/hdk3) as swap space. The remaining 19 or so gigs can be used for emergency purposes (real handy when /var/spool/mail has 0KB available).
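For the identical 60GB drives, one shortcut for "exactly the same partitioning" is to dump one disk's table with sfdisk and replay it on the others (a sketch; the 80GB Samsung still gets partitioned by hand):

```shell
# Clone /dev/hde's partition table onto the other two 60GB Maxtors.
sfdisk -d /dev/hde | sfdisk /dev/hdg
sfdisk -d /dev/hde | sfdisk /dev/hdi
```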
Generally it's supposed to be safe to run something like 'sfdisk -R /dev/hdX' (erm... substitute the device letter for "X". Don't make me come over there!) to make sure the kernel re-reads the new partition table, but since I'm in no particular hurry I'll just reboot to be on the safe side.
RAID setup Alrighty then. Things are back up and now we finally get to play with mdadm. What's that? You didn't install it? Then go grab a recent package for your distro or a source tarball
here. After you're all geared up, we just need to cover a few things before we dive into this mess. First, a review of the partitions we'll be using. We have four RAID member partitions
Code:
[list=1]
[*]/dev/hde2 (our current / partition, type 'fd')
[*]/dev/hdg2 unformatted, type 'fd'
[*]/dev/hdi2 ditto
[*]/dev/hdk2 ditto
[/list=1]
From the man pages we learn that to create the array we need something like this
Code:
mdadm --create /dev/md0 --chunk=64 --level=5 --raid-devices=4 /dev/hd[gik]2 missing
WTF is that? Well, you should RTFMpages, but in short we're telling mdadm to create a RAID5 array with a chunk size of 64K (the default) and four members: /dev/hdg2, /dev/hdi2, /dev/hdk2, and one marked missing. The missing keyword tells mdadm to build a degraded array, leaving a slot for the absent drive. We can do this because RAID5, with parity striped across all members, tolerates one missing member (hence the redundancy). Unlike the 'failed-disk' directive used in the raidtools configuration, mdadm doesn't care
which member isn't available; it just needs to know that someone's not home and builds accordingly. We want to create the array, but we
don't want to destroy our fresh Vector Linux install on /dev/hde2! With the above statement we're building a
degraded array with three out of four drives, as if one disk had already failed. If all went well you should see something like this when you run 'cat /proc/mdstat'
Code:
md0 : active raid5 hdi2[3] hdg2[2] hdk2[1]
117140480 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
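For a fuller report than /proc/mdstat gives, mdadm can describe the array directly:

```shell
# Show state, chunk size, the member list, and the still-missing slot.
mdadm --detail /dev/md0
```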
Now you have RAID5, so whatcha gonna do with it?
Boot RAID5 on / Now we can format the array with your fs of choice. I like reiserfs, so let's try 'mkreiserfs /dev/md0'. A few seconds and the new array will be ready for your stuff. A while back I found a nifty snippet in one of the RAID HOWTOs that I can't seem to locate anymore. If someone spots it in the wild, please let me know so I can give credit where it's due.
Mount the degraded RAID device somewhere handy, like /mnt: 'mount /dev/md0 /mnt' for lazy folks like me who depend on the kernel knowing more about the formats than I do. Now 'cd' to the / directory and punch in 'find . -xdev | cpio -pmv /mnt'. You should read up on the find and cpio options, as a) how much do you trust me? and b) you might learn some more neat tricks. While it's copying you'll see a bunch of filenames flying by (or dots if you used -V). Once all is complete (all that text breezing by comes to a halt and you see something like "9977223 blocks"), you need to edit the fstab
on the RAID5 device (/dev/md0), which should now be living at /mnt/etc/fstab. You need to tell your system that the / partition is now going to be on /dev/md0 - a line like "/dev/md0 / reiserfs defaults 0 1" should suffice - and you should remove the original reference to / as well. Next you'll need to edit /etc/lilo.conf and tell the bootloader to use /dev/md0 as the root device. A simple lilo block would look something like this
Code:
boot=/dev/hde
image = /boot/vmlinuz
root = /dev/md0
label = Linux
read-only
This will install lilo to the MBR on the first drive on the first controller (this was how my install defaulted) and tells lilo to use the RAID array as the root device. Now run '/sbin/lilo' and ensure there are no bootloader complaints. Note also that when you reboot onto the new md0 root device, your lilo.conf will be the one you copied earlier with the 'find...' command, so it will show your _old_ config (root=/dev/hde2). Change this and re-run lilo.
So now what? Now you should have a working, albeit degraded, RAID5 array mounted as /, but what about the "failed" drive? You need to add that device to the new array and let it resync. I've dragged you along this far, so perhaps now is a good time to read the man page for 'mdadm'. Relax. Don't panic. It's actually pretty simple. That's why this, too, is left as an exercise for the reader.
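If you'd rather peek at the answer: once /dev/hde2's partition type has been changed to 'fd' and you no longer need its contents, the gist (a sketch, not a substitute for the man page) is:

```shell
# Hand the formerly-missing member to the degraded array; the kernel
# starts rebuilding parity onto it right away.
mdadm /dev/md0 --add /dev/hde2
# Watch the resync progress.
cat /proc/mdstat
```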
Cheers,
--DMc