RAID is a storage strategy to increase data access speed and data protection. There are a few different "levels" of RAID. The different levels employ different strategies and it's a little too lengthy to discuss each one here. Searching on Google ought to give more information than you'd ever want. I'll discuss RAID 5 though.
RAID 5 requires 3 or more disks, normally identical ones. Each disk is generally separated into "stripes" -- one stripe on each drive for each drive in the array (e.g. a RAID 5 array with 7 disks would have 7 stripes on each physical disk). One stripe on each drive is reserved for parity data -- the other 6 can hold usable data.
The practical effect is two-fold. First, the amount of actual data storage for the array is "reduced" by one disk. In the example I gave earlier, assume that each disk's capacity is 10 GiB. The total data capacity in the array would be (7 disks - 1 disk for parity) * 10 GiB = 60 GiB.
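If you want to play with the numbers, here's a rough sketch of that capacity arithmetic in Python. The function name is just something I made up for illustration, and it assumes every disk in the array is the same size.

```python
def raid5_usable_capacity(num_disks, disk_size_gib):
    """Usable space in a RAID 5 array: one disk's worth goes to parity.

    Hypothetical helper for illustration; assumes all disks are the same size.
    """
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (num_disks - 1) * disk_size_gib


# The 7-disk, 10 GiB-per-disk example from above.
print(raid5_usable_capacity(7, 10))  # 60
```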
The second effect is spreading the data across the disks. By spreading the data across the stripes, any read or write to the array splits the work between multiple drives. The disks operate in parallel, making the data movement occur in parallel. Having multiple drives move separate pieces concurrently is faster than one single drive moving the whole chunk by itself.
That explains the data access speed. The data protection comes from the parity. If you're not familiar with parity, it basically just counts the number of 1's in a binary stream of data. If the number of 1's is odd, then parity is a 1. If the number of 1's in the binary stream is even, then parity is a 0 (the same result you get by XOR-ing the bits together). So, the parity stripe on a given disk contains the parity of the corresponding stripes on the other physical disks, worked out one bit position at a time. To further the example I gave earlier:
Code:
 Disk 0   Disk 1   Disk 2   Disk 3   Disk 4   Disk 5   Disk 6
|------| |------| |------| |------| |------| |------| |------|
| P0-0 | | D1-0 | | D2-0 | | D3-0 | | D4-0 | | D5-0 | | D6-0 |
|------| |------| |------| |------| |------| |------| |------|
| D0-1 | | P1-1 | | D2-1 | | D3-1 | | D4-1 | | D5-1 | | D6-1 |
|------| |------| |------| |------| |------| |------| |------|
| D0-2 | | D1-2 | | P2-2 | | D3-2 | | D4-2 | | D5-2 | | D6-2 |
|------| |------| |------| |------| |------| |------| |------|
| D0-3 | | D1-3 | | D2-3 | | P3-3 | | D4-3 | | D5-3 | | D6-3 |
|------| |------| |------| |------| |------| |------| |------|
| D0-4 | | D1-4 | | D2-4 | | D3-4 | | P4-4 | | D5-4 | | D6-4 |
|------| |------| |------| |------| |------| |------| |------|
| D0-5 | | D1-5 | | D2-5 | | D3-5 | | D4-5 | | P5-5 | | D6-5 |
|------| |------| |------| |------| |------| |------| |------|
| D0-6 | | D1-6 | | D2-6 | | D3-6 | | D4-6 | | D5-6 | | P6-6 |
|------| |------| |------| |------| |------| |------| |------|
In the fantastically cool diagram above, a 'D' represents a data stripe and a 'P' represents a parity stripe. The first digit is the disk number. The second digit is the stripe number. So, P0-0 contains the parity data for the other stripes on that "row" (that is, D1-0, D2-0, D3-0, D4-0, D5-0, and D6-0). It's similar for all the other parity stripes.
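For anyone who'd rather see it in code, here's a small Python sketch of the diagram's layout and of how a parity stripe could be computed. The function names and the block contents are made up for illustration, and the "parity sits on disk number = row number" rule only mirrors this particular diagram; real controllers and software RAID rotate parity in various ways.

```python
from functools import reduce

NUM_DISKS = 7  # matches the diagram


def parity_disk(row):
    """Disk holding the parity stripe for a row (the diagonal in the diagram)."""
    return row % NUM_DISKS


def row_labels(row):
    """Labels for one row, in the diagram's <kind><disk>-<stripe> notation."""
    return [("P" if disk == parity_disk(row) else "D") + f"{disk}-{row}"
            for disk in range(NUM_DISKS)]


def xor_parity(blocks):
    """Parity of equal-length blocks: XOR them together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


print(" ".join(row_labels(2)))
# D0-2 D1-2 P2-2 D3-2 D4-2 D5-2 D6-2

# The counting rule on single bits: three 1's is odd, so parity is 1,
# which is exactly what XOR-ing the bits gives.
bits = [1, 0, 1, 1, 0, 0]
assert sum(bits) % 2 == reduce(lambda a, b: a ^ b, bits) == 1

# Made-up two-byte contents for D1-0 .. D6-0; P0-0 would hold their XOR.
data = [bytes([d, 2 * d]) for d in range(1, 7)]
p0 = xor_parity(data)
print(p0.hex())  # 070e
```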
Ok, so this gives data protection because it guards against a drive failure. That is, assume one of the disks above fails for some reason. The array can still operate because of the parity information. The computer can calculate the data that was on the failed drive by looking at the data on the other stripes and comparing it to the parity. This allows the administrator time to replace the failed drive without losing critical uptime. When the drive is replaced, the array "rebuilds" the failed drive.
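Here's a minimal Python sketch of that recovery step for a single row, assuming parity is a plain XOR of the data stripes. The helper name and the block contents are made up. The trick is that XOR-ing all the surviving stripes (parity included) gives back whatever was on the missing one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One row of a 7-disk array: 6 data stripes plus their parity (contents made up).
data = [b"RAID", b"five", b"keep", b"data", b"safe", b"okay"]
parity = xor_blocks(data)
row = [parity] + data  # disk 0 holds the parity for this row

failed = 3  # pretend disk 3 just died
survivors = [block for disk, block in enumerate(row) if disk != failed]

# XOR of everything that is left reproduces the lost stripe.
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'keep'
```

A rebuild is basically this same calculation repeated for every row, with the result written to the replacement drive.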
Two things though: when a drive fails, the data access benefit is lost because the computer has to recalculate the missing data from parity every time something on the failed drive is read. Also, a RAID 5 array cannot sustain two drive failures at the same time. At least that's better than the total data loss/downtime that would result with just a single drive. Some setups do allow for multiple drive failures without data loss (RAID 6, for example, keeps a second set of parity and survives two), but RAID 5 only handles one.