This method works, but we are using a mirror of two md devices (i.e. two RAID0 stripes, one on flash cards and one on disks, with the disk side marked write-mostly). Functionally it all works, but the stacked md configuration is very slow: reading through the mirror delivers only about 50% of the bandwidth of reading the RAID0 stripe directly. This is true even for the disk side by itself, with the flash side removed (i.e. a mirror with one side failed).
Wondering how to create an md array that starts out with a piece missing? Use "missing" in place of the device, e.g.
mdadm --create /dev/md2 -l 1 -n 2 /dev/md1 missing
will create the /dev/md2 volume used below (a rough sketch of the full stacked setup follows).
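For reference, a minimal sketch of how the whole stack might be built; the device names for the disk-side stripe (/dev/md3) and its members are placeholders, not the actual ones used here, and the --write-mostly-on-add step depends on your mdadm version:
Code:
# RAID0 stripe across the 14 flash cards, 64K chunk (matches the md1 details below)
mdadm --create /dev/md1 -l 0 -n 14 -c 64 /dev/sd[b-o]1

# RAID1 mirror created degraded, with only the flash stripe present
mdadm --create /dev/md2 -l 1 -n 2 /dev/md1 missing

# later, add the disk-side stripe (placeholder /dev/md3) as the write-mostly half;
# if your mdadm does not accept --write-mostly with --add, the flag can be set
# afterwards via sysfs, e.g.: echo writemostly > /sys/block/md2/md/dev-md3/state
mdadm /dev/md2 --add --write-mostly /dev/md3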
Here are the details:
Code:
[root@pe-r910 ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Tue Jul 26 23:13:59 2011
Raid Level : raid1
Array Size : 1998196216 (1905.63 GiB 2046.15 GB)
Used Dev Size : 1998196216 (1905.63 GiB 2046.15 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Thu Jul 28 08:29:35 2011
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : pe-r910.ingres.prv:2 (local to host pe-r910.ingres.prv)
UUID : 299ea821:756847a0:4db591e4:38769641
Events : 160
Number Major Minor RaidDevice State
0 9 1 0 active sync /dev/md1
1 0 0 1 removed
[root@pe-r910 ~]# mdadm --detail /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Tue Jul 26 01:05:05 2011
Raid Level : raid0
Array Size : 1998197376 (1905.63 GiB 2046.15 GB)
Raid Devices : 14
Total Devices : 14
Persistence : Superblock is persistent
Update Time : Tue Jul 26 01:05:05 2011
State : clean
Active Devices : 14
Working Devices : 14
Failed Devices : 0
Spare Devices : 0
Chunk Size : 64K
Name : pe-r910.ingres.prv:1 (local to host pe-r910.ingres.prv)
UUID : 735bd502:62ed0509:08c33e15:19ae4f6b
Events : 0
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
5 8 97 5 active sync /dev/sdg1
6 8 113 6 active sync /dev/sdh1
7 8 129 7 active sync /dev/sdi1
8 8 145 8 active sync /dev/sdj1
9 8 161 9 active sync /dev/sdk1
10 8 177 10 active sync /dev/sdl1
11 8 193 11 active sync /dev/sdm1
12 8 209 12 active sync /dev/sdn1
13 8 225 13 active sync /dev/sdo1
[root@pe-r910 ~]# dd if=/dev/md1 bs=512K count=10000 iflag=nonblock,direct of=/dev/null
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 3.45236 s, 1.5 GB/s
[root@pe-r910 ~]# dd if=/dev/md2 bs=512K count=10000 iflag=nonblock,direct of=/dev/null
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 6.81182 s, 770 MB/s
[root@pe-r910 ~]#
Update:
iostat shows 64K reads being issued both to md1 and to its component devices when reading directly from md1. This is somewhat mysterious, since dd is asking for 512K reads; I would have expected 512K requests to md1 and 64K requests (i.e. the chunk size) to its component devices.
But the killer is that when reading from md2 (the RAID1 volume with only one half present), iostat shows only 4K reads to md2, md1, and the component devices. Perhaps md thinks that is the size it should use for error processing, but it's killing performance.
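In case anyone wants to reproduce the observation, the request sizes were checked with something along these lines while the dd was running; the avgrq-sz column is in 512-byte sectors, so 128 means 64K and 8 means 4K (device list and interval are just examples):
Code:
# extended per-device stats, refreshed every second
# avgrq-sz is in 512-byte sectors: 128 = 64K reads, 8 = 4K reads
iostat -x md1 md2 sdb sdc 1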
Update 2:
This looks to be an issue only for md on md. If I make a RAID1 directly on a disk, its I/O rate is the same as the disk's.
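For comparison, the direct-on-disk RAID1 test was along these lines (the md device number and disk partition here are placeholders):
Code:
# degraded RAID1 directly on a single disk partition (placeholder names)
mdadm --create /dev/md4 -l 1 -n 2 /dev/sdp1 missing
# same read test as above; throughput matched the underlying disk in this case
dd if=/dev/md4 bs=512K count=10000 iflag=nonblock,direct of=/dev/null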