Here's a recent example from a box I set up last weekend. This was my first time playing with mdadm (billed as a replacement for Ingo's raidtools-1.x package). We have a few boxes with software RAID, but I'm not in the habit of testing something this important on production units, which were all built and managed by various versions of raidtools. One of the reasons I wanted to use mdadm was its ability to monitor the software RAID system and notify me of anything I should know about.
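As a taste of that monitoring, mdadm has a follow/monitor mode that can run as a daemon; a minimal sketch (the mail address and polling interval here are my own choices, not requirements):

```shell
# Watch all arrays found via --scan, mail root on failure/degraded
# events, poll every 120 seconds, and detach into the background.
mdadm --monitor --scan --mail=root --delay=120 --daemonise
```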
The Box - 'trinity' is an old (1998) Asus P2B-S (Intel 440BX) with a P-III, 256MB RAM, and onboard SCSI. Sadly, all the SCSI drives were recently allocated to the Proliant 5000, but I did have three 60GB Maxtor ATA/100s (6L060L3) and one 80GB Samsung ATA/100 (SP8004H) with 8MB cache. I rarely leave CD-ROM drives in the servers, so there's always one lying about; a Creative 52x IDE gets a temporary home. The ATA controller cards are a pair of Promise Tech Ultra/66s (PDC20262). An old video card and a 3Com 3c905-TX round out the setup, inside a generic 17-inch case powered by an Enermax 380W PSU.
The Setup - The CD-ROM gets plugged into the on-board primary IDE controller (/dev/hda) and the hard disks are attached to the Promise controllers. I chose Vector Linux for this setup as it's claimed to be a leaner Slackware, and we already have lots of Slack boxes. Booting and installing Vector Linux is a snap, and my basic setup came in at around 360MB. I used /dev/hde (the drive connected to the primary controller on the first card) as the root device, with a /boot partition (64MB, /dev/hde1) and a / partition (the rest of the disk, /dev/hde2). No fancy partitioning here, as this whole project is a test to see how badly I can mess things up with 'mdadm' and RAID5 as a root device. Most default installs have RAID built into the kernel. If yours doesn't - start building.
***
A few notes on add-on controllers are worth mentioning here. Generally, off-board IDE/ATA controllers start at /dev/hde, as most x86 boards have a primary and secondary controller that chew up hda through hdd. On most installs the kernel will ignore the BIOS boot report and probe the board for hardware. This means that even if you have disabled the onboard controllers (IDE0 and 1), the kernel will still pick them up, making your off-board controllers start at /dev/hde. There are kernel boot parameters that can be fussed with (ide=reverse), but it's dependent on the kernel setup.
***
As for filesystem type, I'm a reiserfs addict, so everything but /boot (ext2) will be reiserfs formatted. Now that I have installed and set up Vector Linux on /dev/hde, it's time to move on and see what this thing can do.
Some RAID-specific setup After rebooting and confirming things are what they say they are, one last check revealed that I forgot to set the / partition type to 'fd' - "Linux raid autodetect" - so I needed to run fdisk /dev/hde and change the type (option 't' from the menu) of the partition that will be a member of the future RAID array. Write that information to disk and exit fdisk. We have a few more things to take care of while we're at it. Our other participants (/dev/hdg, /dev/hdi, /dev/hdk) need to be partitioned as well, and it's also time for another little sidebar.
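If you'd rather not drive fdisk's menus by hand, older sfdisk versions can flip the type non-interactively; a sketch, assuming the same layout as above (newer util-linux releases renamed this option to --part-type):

```shell
# Set partition 2 on /dev/hde to type 'fd' (Linux raid autodetect).
sfdisk --change-id /dev/hde 2 fd
```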
***
Notice that the drive naming skips /dev/hdf, /dev/hdh, and /dev/hdj, as they would be slave drives on the primary and secondary devices (on IDE2 and 3). Judging from my reading of mailing lists and other goodies, it's a generally accepted rule that you should avoid using any slave drives in a RAID array. Your controller can talk to a single drive on each of the primary and secondary channels with little difficulty, but adding a slave to the chain can choke the array, as each drive must wait while its partner reads/writes. As to whether that's just pop mythology or undeniable fact, I can't say for sure. Any confirmations or denials will be graciously accepted.
***
On with the show. We need to create at least the / partition, and optionally the /boot partition, on the remaining free devices. This document is merely meant to demonstrate how to build a simple array. A /boot partition can be created on each disk and built into a very fault-tolerant RAID1 device which the kernel can boot easily; this is left as an exercise for the reader. To keep things simple we'll do exactly the same partitioning on the remaining 60GB drives and a minor cheat on the 80GB device. On the 80GB I chose to be lazy: rather than match sizes and cylinders, I just made the /boot partition 65MB (/dev/hdk1) and the RAID partition (/dev/hdk2) a couple of MB larger, just to make sure things fit easily enough. (mdadm and raidtools will complain about size differences and adjust accordingly, within reason, but I'd rather lose 2MB on one disk than on three.) I also used another 256MB (/dev/hdk3) as swap space. The remaining 19 or so gigs can be used for emergency purposes (real handy when /var/spool/mail has 0KB available).
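For the identical 60GB drives, one shortcut for "exactly the same partitioning" is to dump one disk's table with sfdisk and replay it on the others (a sketch; the 80GB Samsung still gets partitioned by hand):

```shell
# Clone /dev/hde's partition table onto the other two 60GB Maxtors.
sfdisk -d /dev/hde | sfdisk /dev/hdg
sfdisk -d /dev/hde | sfdisk /dev/hdi
```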
Generally it's supposed to be safe to run something like 'sfdisk -R /dev/hdX' (erm... substitute the device letter for "X". Don't make me come over there!) to make sure the kernel re-reads the new partition table, but since I'm in no particular hurry I'll just reboot to be on the safe side.
RAID setup Alrighty then. Things are back up and now we finally get to play with mdadm. What's that? You didn't install it? Then go grab a recent package for your distro or a source tarball
here. After you're all geared up, we just need to cover a few things before we dive into this mess. First, a review of the partitions we'll be using. We have four RAID member partitions
Code:
[list=1]
[*]/dev/hde2 (our current / partition, type 'fd')
[*]/dev/hdg2 unformatted, type 'fd'
[*]/dev/hdi2 ditto
[*]/dev/hdk2 ditto
[/list=1]
From the man pages we learn that to create the array we need something like this
Code:
mdadm --create /dev/md0 --chunk=64 --level=5 --raid-devices=4 /dev/hd[gik]2 missing
WTF is that? Well, you should RTFMpages, but in short we're telling mdadm to create a RAID5 array with a chunk size of 64K (the default) and four members: /dev/hdg2, /dev/hdi2, /dev/hdk2, and one marked missing. The missing keyword tells mdadm to build a degraded array, leaving a slot for the absent drive. We can do this because RAID5, with parity striped across all members, tolerates one missing member (hence the redundancy). Unlike the 'failed-disk' directive used in the raidtools configuration, mdadm doesn't care
which member isn't available; it just needs to know that someone's not home and builds accordingly. We want to create the array, but we
don't want to destroy our fresh Vector Linux install on /dev/hde2! With the above statement we're building a
degraded array with three out of four drives, as if one disk had already failed. If all went well you should see something like this when you run 'cat /proc/mdstat'
Code:
md0 : active raid5 hdi2[3] hdg2[2] hdk2[1]
117140480 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
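For a fuller report than /proc/mdstat gives, mdadm can describe the array directly:

```shell
# Show state, chunk size, the member list, and the still-missing slot.
mdadm --detail /dev/md0
```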
Now you have RAID5, so whatcha gonna do with it?
Boot RAID5 on / Now we can format the array with your fs of choice. I like reiserfs, so let's try 'mkreiserfs /dev/md0'. A few seconds and the new array will be ready for your stuff. A while back I found a nifty snippet in one of the RAID HOWTOs that I can't seem to locate anymore. If someone spots it in the wild, please let me know so I can give credit where it's due.
Mount the degraded RAID device somewhere handy, like /mnt: 'mount /dev/md0 /mnt' for lazy folks like me who depend on the kernel knowing more about the formats than I do. Now 'cd' to the / directory and punch in 'find . -xdev | cpio -pmv /mnt'. You should read up on the find and cpio options, as a) how much do you trust me? and b) you might learn some more neat tricks. While it's copying you'll see a bunch of filenames flying by (or dots if you used -V). Once all is complete (all that text breezing by comes to a halt and you see something like "9977223 blocks"), you need to edit the fstab
on the RAID5 device (/dev/md0), which should now be living at /mnt/etc/fstab. You need to tell your system that the / partition is now going to be on /dev/md0 - a line like "/dev/md0 / reiserfs defaults 0 1" should suffice - and you should remove the original reference to / as well. Next you'll need to edit /etc/lilo.conf and tell the bootloader to use /dev/md0 as the root device. A simple lilo block would look something like this
Code:
boot=/dev/hde
image = /boot/vmlinuz
root = /dev/md0
label = Linux
read-only
This will install lilo to the MBR on the first drive on the first controller (this was how my install defaulted) and tells lilo to use the RAID array as the root device. Now run '/sbin/lilo' and ensure there are no bootloader complaints. Note also that when you reboot onto the new md0 root device, your lilo.conf will be the one you copied earlier with the 'find...' command, so it will show your _old_ config (root=/dev/hde2). Change this and re-run lilo.
So now what? Now you should have a working, albeit degraded, RAID5 array mounted as /, but what about the "failed" drive? You need to add that device to the new array and let it resync. I've dragged you along this far, so perhaps now is a good time to read the man page for 'mdadm'. Relax. Don't panic. It's actually pretty simple. That's why this, too, is left as an exercise for the reader.
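If you'd rather peek at the answer: once /dev/hde2's partition type has been changed to 'fd' and you no longer need its contents, the gist (a sketch, not a substitute for the man page) is:

```shell
# Hand the formerly-missing member to the degraded array; the kernel
# starts rebuilding parity onto it right away.
mdadm /dev/md0 --add /dev/hde2
# Watch the resync progress.
cat /proc/mdstat
```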
Cheers,
--DMc