Best way to RAID1 the boot drive
Howdy all, I am a Linux n00b (oh noes) =P
I am teaching myself everything I need to build a home-based web server (linux/apache/php/mysql/html/css/etc...). It's quite an undertaking, but not beyond my abilities. I thought this question could have gone in either the linux - software or linux - hardware forum, and certainly not in the n00b section, but I figured it's best put in the linux - server forum, since that's what this is related to.
I have been looking into the software and hardware RAID solutions for linux because I wanted to make sure that the boot drive of the web server I set up is mirrored with transparent disk fail/replace/recovery. I mean, setting up a boot drive for RAID1 sounded perfectly logical to me, and why wouldn't it to anybody else? So, since I knew RAID controllers were expensive, I looked into the native software RAID support in linux. My findings have revealed an issue with software raiding a boot drive in not only linux but windows as well. Apparently, if the primary drive fails (not the mirror), you have no other option but to power down the system to properly replace the failed disk, reboot, play some config crap, resync the drive, do some more config crap, reboot again, and -hopefully- it'll be ok. Well, that procedure is simply out of the question since the idea behind RAID is to transparently proceed as if nothing happened.
I'd like to know if it's even possible to RAID1 the boot drive for transparent and automatic fail/hot-swap/recover WITHOUT rebooting the system and with no intervention on my part other than replacing the drive, whether it be a software raid or hardware raid solution.
Eventually, what I'd like to do for a drive configuration is have 3 RAID volumes on the server configured like so:
RAID volume 1 = boot drive w/ webserver installed
RAID volume 2 = database files
RAID volume 3 = flatfile storage
Each raid volume will be a RAID1 pair of 1TB drives (total = 6 x 1TB drives)
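For reference, a layout like the one above could be sketched with mdadm along these lines. This is only a sketch: the device names (/dev/sda1 through /dev/sdf1) and the md numbering are assumptions, and the function just prints the commands instead of running them, so nothing gets touched.

```shell
#!/bin/sh
# Sketch only: prints the mdadm commands for a set of RAID1 pairs.
# The device names used below are assumptions, not from a real system.
plan_mirrors() {
    n=0
    # Consume the arguments two at a time: each pair becomes one mirror.
    while [ "$#" -ge 2 ]; do
        echo "mdadm --create /dev/md$n --level=1 --raid-devices=2 $1 $2"
        shift 2
        n=$((n + 1))
    done
}

# Three mirrors: boot/webserver, database files, flatfile storage.
plan_mirrors /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
```

With RAID1 each pair gives the capacity of one drive, so six 1TB drives come out to roughly 3TB usable.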
I've seen a lot of people having failure issues with the software RAID in these forums. Is this more common than not? I'm certainly not opposed to buying a hardware RAID solution as long as they're reliable and provide transparent/automatic recovery.
So what's the best way to RAID1 the boot drive for transparent/automatic failover?
hardware vs software raid
Just to be the first to respond to a thread:
Hardware raid is probably better because you have dedicated hardware that controls the reads and writes, provides caching, and carries a small battery so that a failure such as a power cut doesn't corrupt a write in progress; it is also just plain old faster. You can probably find a cheaper hardware raid card without a battery and for SATA drives (e.g. search "rocketraid" on newegg.com). Setup is done pre-boot; check out your [would-be] new card's manual.
But software raid works (i.e. it will provide the redundancy that you want, so you feel better and get a chance to swap out a hard drive when it goes bad, which it will since it has moving parts). Assuming /dev/sda has the stuff on it (and swap space) and you want to mirror it with /dev/sdb, this is at least what I did:
sudo apt-get install mdadm
sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
sudo fdisk /dev/sdb
# then at the "Command (m for help):" prompt, type "t" (to change a
# partition's Type) and Enter, then "1" and Enter at the "Partition number
# (1-4):" prompt, then "fd" and Enter at the "Hex code (type L to list
# codes):" prompt (for "Linux raid auto"). Then repeat, but "t", "2", and
# "fd" (i.e. as we have two partitions: root (or /dev/sda1) and swap (or
# /dev/sda2)); and finally, type "w" and Enter at the "Command (m for
# help):" prompt (to write table to disk and exit).
sudo shutdown -r now
sudo mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/sdb1
sudo mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/sdb2
sudo mkfs.ext3 /dev/md0
sudo mkswap /dev/md1
sudo cp /etc/mdadm/mdadm.conf /etc/mdadm/mdadm.conf.old
# (a plain ">>" redirect would run as your own user, not root, so use tee)
sudo mdadm --examine --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo mkdir /mnt/md0
sudo mount /dev/md0 /mnt/md0
sudo pico /etc/fstab
# replace "/dev/sda1" with "/dev/md0" and replace "/dev/sda2" with
# "/dev/md1", Ctrl "x", "y", Enter
sudo pico /etc/mtab
# replace "/dev/sda1" with "/dev/md0", Ctrl "x", "y", Enter
sudo pico /boot/grub/menu.lst
# add a new line "fallback 1" after "default 0", then scroll to the
# bottom and while on the *first* "title" line (where the next three lines
# start with "root", "kernel", and "initrd"), hold Ctrl and type "k" five
# times, then Ctrl "u" twice (to copy that first kernel listing), then edit
# the first set of lines (i.e. so that the original set of lines we copied
# will be preserved as the second set or "fallback 1"): replace "(hd0,0)"
# with "(hd1,0)" in the "root" line and replace "/dev/sda1" with "/dev/md0"
# in the "kernel" line, and then add "(hd1)" to the title of the first
# entry and "(hd0)" to the second entry's title, Ctrl "x", "y", Enter
# copy the whole root filesystem (-x keeps cp on this one filesystem)
sudo cp -dpRx / /mnt/md0
sudo grub
# type "root (hd1,0)" and Enter at the "grub>" prompt, then "setup (hd1)"
# and Enter, then "quit"
sudo shutdown -r now
sudo fdisk /dev/sda
# then type "t" and Enter, "1" and Enter, "fd" and Enter, then "t", "2",
# and "fd"; and finally, type "w" and Enter.
sudo shutdown -r now
sudo mdadm --add /dev/md0 /dev/sda1
sudo mdadm --add /dev/md1 /dev/sda2
sudo cat /proc/mdstat
# and when it's done building (i.e. "UU" and not "_U" for the two drives):
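# (this appends a second set of ARRAY lines; remove the older duplicates
# from the file afterwards)
sudo mdadm --examine --scan | sudo tee -a /etc/mdadm/mdadm.conf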
sudo pico /boot/grub/menu.lst
# find the line "kopt=root=/dev/sda1 ro" and replace "sda1" with "md0" (but
# don't remove the "#" at the beginning of the line), then scroll down to
# the bottom and comment out (i.e. add "#" to the beginning of the line)
# the original two entries, then copy the top/new entry and paste it but
# make one "root" have "(hd1,0)" and add "RAID" to the "title" and the
# other "root" be "(hd0,0)" and add "RAID" to the "title", and both
# entries' "kernel" contain "/dev/md0", then Ctrl "x", "y", Enter
sudo shutdown -r now
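If hand-editing /etc/fstab in pico feels error-prone, the same substitution from the steps above can be sketched with sed. This assumes fstab refers to the partitions by device name (/dev/sda1 for root, /dev/sda2 for swap) and not by UUID or label; the function name is made up for the example, and it only writes to stdout, so the real file is left alone.

```shell
#!/bin/sh
# Sketch only: rewrite device names from an fstab-style file to stdout.
# Assumes root is /dev/sda1 and swap is /dev/sda2, as in the steps above.
rewrite_fstab() {
    # $1 = input file. Only the first field is rewritten, and the
    # trailing whitespace match keeps e.g. /dev/sda10 from matching.
    sed -e 's|^/dev/sda1\([[:space:]]\)|/dev/md0\1|' \
        -e 's|^/dev/sda2\([[:space:]]\)|/dev/md1\1|' "$1"
}

# Try it on a throwaway file rather than the live /etc/fstab:
tmp=$(mktemp)
printf '%s\n' \
    '/dev/sda1 / ext3 defaults,errors=remount-ro 0 1' \
    '/dev/sda2 none swap sw 0 0' > "$tmp"
rewrite_fstab "$tmp"
rm -f "$tmp"
```

Once the output looks right, it could be compared against the original with diff before anything is copied into place.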
If this is true, then the choice is simply a matter of deciding whether I want to use system resources (cpu/ram) to manage the RAID or a dedicated card, which wouldn't affect system resources at all.
BTW, have you simulated a fail on your main drive to check if your RAID setup works properly? Have you been able to successfully hot swap your boot drive without taking the system down? For that matter, I'd like to know if ANYONE has done this successfully on software RAID.
software raid requires rebooting
Others can certainly chime in, but in my experience, rebooting seems to be unavoidable with the software raid (and I recall, but not 100% certain, that the fdisk utility even throws some you-should-reboot kind of warning when changing things around).
I did have a drive fail (it had been failed for a few days before I noticed; I forget which log said what, but "mdadm --detail /dev/md0" or a look at /proc/mdstat will show it). I rebooted a couple of times to get everything back up (but what's a couple of reboots compared to rebuilding a system?). I remember taking out a drive at the beginning of my adventure, but again, I turned the machine off to do it and didn't avoid reboots.
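To avoid a failed drive going unnoticed for days, something like the following could be run from cron. It is a rough sketch that assumes the usual /proc/mdstat layout, where each array's status line ends in something like "[UU]" (healthy) or "[_U]" (one member missing); the function name is made up for the example.

```shell
#!/bin/sh
# Sketch only: list md arrays whose status shows a missing member.
# Assumes the usual /proc/mdstat layout ("mdN : ..." then "... [UU]").
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the current array
        /\[[U_]+\]/ {                        # status line, e.g. [2/1] [_U]
            if (name != "" && $NF ~ /_/) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Only try the live file where it exists (i.e. on a Linux box with md).
if [ -r /proc/mdstat ]; then
    degraded_arrays
fi
```

mdadm also has its own monitor mode (mdadm --monitor) that can send mail on failure, which is probably the better long-term answer.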
Also, hot swap seems weird without hardware support; I mean, it's the machine that determines whether it can handle a drive being unplugged and then plugged back in without some software effect, and I think most machines would just not see the drive until the next reboot (as opposed to halting!). Try searching "raidhotadd"...
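For what it's worth, raidhotadd and raidhotremove come from the older raidtools package; with mdadm the equivalent is --fail, --remove and --add. Below is a sketch of what a no-reboot replacement might look like. It only prints the commands. The disk names, the SCSI host name and the md0/md1 to partition 1/2 mapping are assumptions, and whether the controller really tolerates pulling a live disk is up to the hardware.

```shell
#!/bin/sh
# Sketch only: prints (does not run) a possible hot-replace sequence.
# Disk names, partition numbers and the SCSI host are assumptions, and
# the replacement disk may come back under a different name.
hot_replace_plan() {
    bad=$1; good=$2; host=$3
    # 1. Mark the failing members faulty and drop them from the arrays.
    echo "mdadm /dev/md0 --fail /dev/${bad}1"
    echo "mdadm /dev/md1 --fail /dev/${bad}2"
    echo "mdadm /dev/md0 --remove /dev/${bad}1"
    echo "mdadm /dev/md1 --remove /dev/${bad}2"
    # 2. Detach the disk from the kernel before pulling it.
    echo "echo 1 > /sys/block/${bad}/device/delete"
    # 3. After the physical swap, ask the host adapter to rescan.
    echo "echo '- - -' > /sys/class/scsi_host/${host}/scan"
    # 4. Copy the partition table from the surviving disk, then re-add.
    echo "sfdisk -d /dev/${good} | sfdisk /dev/${bad}"
    echo "mdadm /dev/md0 --add /dev/${bad}1"
    echo "mdadm /dev/md1 --add /dev/${bad}2"
}

hot_replace_plan sda sdb host0
```

The boot loader would also have to be reinstalled on the new disk, since a blank replacement has no boot code on it.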
Sorry, not an expert; just trying to share information because I seek information.
I looked up raidhotadd/raidhotremove and they appear to be the solution, but not without their own issues of course. I'll have to research those some more.
I've also been eyeing up a "3ware" controller. A tad pricey, but states that it has support for linux and also has options for hot-swap even on a boot drive. They've got some favorable opinions from people in general, so I plan on contacting them to see what the card can really do, plus see what drive cages support their LED alerts.
I guess one of the reasons I thought software RAID could do hot-swap automatically is because we use a NAS device as our SOHO server in the office. We had drive failures on it, and the replacement worked without a hitch. The log files on the NAS device indicated some reports by mdadm, so I thought everything would be fine in software raid. In light of these findings, I think the NAS device might be booting its linux to a ramdrive, thus the disks may not even be boot drives (the manual said you can't boot the NAS without disks, so I was under the impression that linux was ultimately installed on the drives). Of course, there's also the possibility that the NAS device has a proprietary RAID controller too, but when I saw "mdadm" listed in the log, I was sure it was a software raid setup... Of course, that's just guessing on my part... I dunno EXACTLY how the NAS is set up internally, only that its raid system works.