Many thanks and respect to the Software-RAID HOWTO (
http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html) by Jakob Ostergaard and Emilio Bueso. That HOWTO is a really in-depth document of about 40 pages! My document is more for the impatient sysadmin ...
Server Hardware, Kernel:
Kernel 2.6.21
Pentium Dual Core 2.66 GHz
4GB RAM
2 SATA drives, 400GB each
Boot machine with Slackware DVD and get on the console
Partition both drives exactly the same way. I don't see the advantage of creating lots of partitions, so I created a primary partition of 380GB (sda1, sdb1) and a swap partition of 20GB (sda2, sdb2) on each drive. Note the correct partition types: FD (Linux raid autodetect) for sda1 and sdb1; type 82 as usual for the swap partitions.
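One way to get identical layouts on both discs is to partition sda interactively and then clone its partition table with sfdisk. A sketch (destructive; double-check the device names before running anything):

```shell
fdisk /dev/sda                        # create sda1 (type FD) and sda2 (type 82)
sfdisk -d /dev/sda | sfdisk /dev/sdb  # dump sda's partition table and write it to sdb
sfdisk -l /dev/sdb                    # verify that sdb now matches sda
```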
If a former LILO or another bootloader was installed in the MBR of the discs, wipe it out with:
dd if=/dev/zero of=/dev/sda bs=446 count=1
dd if=/dev/zero of=/dev/sdb bs=446 count=1
Forget about the old raidtools and raidtab files. Use the mdadm utility, which manages MD (multiple devices) arrays. A really fine tool with lots of options (man mdadm or mdadm --help). To create the array, type:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
If there is already a filesystem on the partition(s), mdadm will ask you if you really want to proceed. Answer 'y'. You get output like:
mdadm: array /dev/md0 started
With 'cat /proc/mdstat' you can always see the status of your array. Or use 'mdadm --detail /dev/md0' instead, which gives you nearly the same information.
Examining the printed information, you will see that the syncing process between the two discs (partitions) has started immediately. DON'T INTERRUPT THIS PROCESS UNTIL IT HAS FINISHED!
If the device was successfully created, the reconstruction process has now begun. Your array is not consistent until this reconstruction phase has completed. However, the array is fully functional (except for the handling of device failures of course), and you can format it and use it even while it is reconstructing.
Now you can put a filesystem on /dev/md0. I prefer ext3, but this doesn't matter. If you are an expert and want fine tuning, use mke2fs on the console. Otherwise you can type 'setup' now to enter the Slackware configuration utility.
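If you format by hand instead of using the installer, an ext3 filesystem can be put on the array with mke2fs; the -j switch adds the journal that turns ext2 into ext3. A sketch (the volume label is just a made-up example):

```shell
mke2fs -j /dev/md0            # create an ext3 (journalled ext2) filesystem on the array
tune2fs -L slackroot /dev/md0 # optional: set a volume label ("slackroot" is an example)
```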
Proceed through the configuration steps as usual, except for the LILO installation. When setting up your target partitions, you will see the /dev/md0 device, which MUST be set up as the root partition.
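Note that the two swap partitions from above are not mirrored. A common setup is to give both the same priority in /etc/fstab, so the kernel stripes swap across the discs (faster, though processes with pages on a failed disc can still crash). A sketch:

```shell
mkswap /dev/sda2
mkswap /dev/sdb2
# matching /etc/fstab entries (equal pri= makes the kernel interleave them):
# /dev/sda2  swap  swap  pri=1  0 0
# /dev/sdb2  swap  swap  pri=1  0 0
swapon -a
```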
LILO
Newer LILO distributions can handle RAID-1 devices, and thus the kernel can be loaded at boot-time from a RAID device. LILO will correctly write boot-records on all disks in the array, to allow booting even if the primary disk fails.
Some users have experienced problems with this, reporting that although booting with one drive connected worked, booting with both drives failed. Nevertheless, running the described procedure with both disks attached fixed the problem, allowing the system to boot from either single drive or from the RAID-1 (this is what I did, too: changing lilo.conf and installing LILO in the MBR of both discs).
The boot device MUST be a non-RAID device. The root device is your new md0 device. I did not test installing LILO in the superblock of the array; in my opinion it should also work.
Example:
boot=/dev/sda
install=/boot/boot.b
prompt
timeout=50
message=/boot/message
default=linux
image=/boot/vmlinuz
label=linux
read-only
root=/dev/md0
Enter the LILO configuration in expert mode. Go through the steps, and afterwards review the lilo.conf file very carefully to check that the boot and root entries are as explained above.
If everything is OK, install LILO.
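One way to write the boot record to both MBRs, as described above, is to run lilo twice, once per disc; the -b switch overrides the boot= line in lilo.conf. A sketch:

```shell
lilo -v              # install the boot record to /dev/sda (per boot=/dev/sda)
lilo -v -b /dev/sdb  # install the same boot record to /dev/sdb as well
```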
Be very patient. The synchronisation process is painfully slow. For the two 400GB discs it took over two hours! Read on to learn how to speed it up.
Paranoia:
Unmount /dev/md0
Stop your array with mdadm -S /dev/md0
Reboot. Everything should work perfectly.
Speeding up Synchronisation
If you are sitting in front of the console (or on a remote SSH connection) waiting for a Linux software RAID to finish rebuilding (because you added a new drive, replaced a failed one, etc.), you might be frustrated by how slow this process runs. You run cat on /proc/mdstat repeatedly (you should really use watch in this case), and it seems to never finish. Obviously there is a logical reason for this slowness, and on a production system you should leave it running with the defaults. But if you want to speed up this process, here is how you can do it. This will place a much higher load on the system, so use it with care.
To see your Linux kernel speed limits imposed on the RAID reconstruction use:
cat /proc/sys/dev/raid/speed_limit_max
200000
cat /proc/sys/dev/raid/speed_limit_min
1000
In the system logs you can see something similar to:
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
This means that the minimum guaranteed speed of the rebuild of the array is approx 1MB/s. The actual speed will be higher and will depend on the system load and what other processes are running at that time.
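As a back-of-the-envelope check, you can estimate the rebuild time from the partition size and the effective sync speed reported in /proc/mdstat. A small sketch (the 50 MB/s figure is an assumption; read the real one from mdstat):

```shell
size_mb=$((380 * 1024))        # 380 GB partition expressed in MB
speed=50                       # assumed effective sync speed in MB/s
secs=$((size_mb / speed))      # 389120 / 50 = 7782 seconds
echo "$((secs / 60)) minutes"  # prints "129 minutes", roughly the two hours seen above
```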
If you want to increase this minimum speed, you need to enter a higher value in speed_limit_min. For example, to set it to approx 50 megabytes per second as the minimum, use:
echo 50000 >/proc/sys/dev/raid/speed_limit_min
The results are instant: you can return to the watch window to see it running, and hope that it will finish a little faster (this really depends on the system, the HDDs, the controllers, etc.):
watch cat /proc/mdstat
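Putting it together: raise the floor for the rebuild, watch it, and restore the kernel default afterwards (values are in KB/s; writing to /proc requires root):

```shell
echo 50000 > /proc/sys/dev/raid/speed_limit_min  # guarantee ~50 MB/s minimum
watch cat /proc/mdstat                           # follow the resync progress
echo 1000 > /proc/sys/dev/raid/speed_limit_min   # back to the kernel default when done
```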
Hardcore-Testing
After everything was set up, I was curious how stable this thing is. So I made clean shutdowns and pulled the cable of first the sda disc, then the sdb disc. Finally I pulled the AC connection without a proper shutdown (phew!).
mdadm /dev/md0 -a /dev/sdX1 hot-adds the removed partition back into the array.
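The same failure handling can be exercised without pulling cables, by marking a member faulty, removing it, and hot-adding it back (a sketch; sdb1 stands for whichever member you want to test):

```shell
mdadm /dev/md0 --fail /dev/sdb1    # simulate a failure of one mirror half
mdadm /dev/md0 --remove /dev/sdb1  # take the faulty member out of the array
mdadm /dev/md0 --add /dev/sdb1     # hot-add it back; the resync starts immediately
cat /proc/mdstat                   # check recovery progress
```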
It works. No damage so far! Now I'm quite sure that I can rely on this system.