LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Server (http://www.linuxquestions.org/questions/linux-server-73/)
-   -   software RAID10 - does the disk order in mdadm matter ? (http://www.linuxquestions.org/questions/linux-server-73/software-raid10-does-the-disk-order-in-mdadm-matter-671016/)

banan.olek 09-19-2008 06:56 AM

software RAID10 - does the disk order in mdadm matter ?
 
Hello,

The question relates to the md software implementation of RAID10.

I have a server with two SCSI adapters. On the first adapter (host4) I can see two disks, /dev/sdb1 and /dev/sdc1; on the second adapter (host6) I can see two other disks, /dev/sdd1 and /dev/sde1.

I would like to create a RAID10 across the four disks, but I cannot find any information on how to enforce placement of the mirror copies on disks connected to the second controller, to ensure that my RAID10 stays accessible if the entire second controller (bus) fails.

I have run some tests and found that the order of disks specified in mdadm does matter, but I cannot find any description of the mirror-allocation rule.

I assume that the command

mdadm --create /dev/md0 --level=raid10 --raid-devices=4 /dev/sdb1 /dev/sdd1 /dev/sdc1 /dev/sde1

will create two mirrors: the first between /dev/sdb1 (host4) and /dev/sdd1 (host6), and the second between /dev/sdc1 (host4) and /dev/sde1 (host6). A stripe will then be created across those mirrors.

The point is: do you know how the RAID10 is laid out, or have any manual that describes where the mirror copy is written by MD?

thanks for any info

Olek

sebstar 09-19-2008 09:29 AM

RE: software RAID10 - does the disk order...
 
I use a combination of LVM and mdadm to create my RAID structure; the drive order often changes and I have had no problems with it. I am able to lose any two LUNs, and they get rebuilt when I bring them back online. I used this doc as a reference:
https://help.ubuntu.com/community/In...tion/RAID1+LVM

I hope this helps.

banan.olek 09-20-2008 08:27 AM

still looking for documentation on how RAID10 chooses disks for the mirror
 
Yes, two RAID1 arrays + an LVM stripe is one possible workaround. Another is two RAID1 devices plus a third RAID0 built on top of them. Both work, but the question remains: why do we have to use workarounds if RAID10 is available in mdadm?
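To make the point about the nested workaround concrete, here is a small illustrative Python sketch (the names and the model are mine, not mdadm's; the controller mapping reflects the host4/host6 setup from my first post) of why explicitly built cross-controller mirror pairs survive a whole-bus failure:

```python
# Illustrative model only: which mirror pairings survive the loss of
# an entire SCSI controller. Controller numbers (host4/host6) match
# the setup described in the first post.
CONTROLLER = {"sdb1": 4, "sdc1": 4, "sdd1": 6, "sde1": 6}

def survives_bus_failure(pairs, failed_bus):
    # The array survives if every RAID1 pair keeps at least one
    # member on a controller that is still alive.
    return all(any(CONTROLLER[d] != failed_bus for d in pair)
               for pair in pairs)

# Pairs built explicitly across controllers (the nested workaround):
print(survives_bus_failure([("sdb1", "sdd1"), ("sdc1", "sde1")], 6))  # True
# Pairs that happen to sit on the same controller:
print(survives_bus_failure([("sdb1", "sdc1"), ("sdd1", "sde1")], 6))  # False
```

With the nested RAID1+0 approach you choose the pairs yourself, which is exactly the control I am trying to get out of the built-in RAID10 level.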

I am still looking for a whitepaper, man page, or any other documentation that describes how software RAID10 chooses the disk for the mirror copy. As I mentioned before, I ran tests which clearly show that the disk order in the "mdadm --create" command does matter (see TEST CASES below).
I am a Linux newbie and studying the source code is rocket science for me, so maybe one of you can help me.

--==## DESCRIPTION OF THE TESTS I DID. ##==--
I set up a test VMware box. The SCSI disk layout is shown below:

                  sdb1  sdc1
+--------------+    |     |
| SCSI bus 1   |===============
|              |
| SCSI bus 2   |===============
+--------------+    |     |
                  sdd1  sde1

#####################
#### TEST CASE 1 ####
#####################

STEP1
First, create a RAID10 device specifying the disks in the following order:
(1) sdb1, (2) sdc1, (3) sdd1, (4) sde1

#> mdadm --create --bitmap=internal --level=raid10 --raid-devices=4 /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
#> mkfs.ext3 /dev/md0

STEP2
check the filesystem
#> fsck.ext3 -f -n /dev/md0
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: 11/51200 files (9.1% non-contiguous), 12127/204544 blocks

STEP3
Fail sdd1 and sde1; this simulates the failure of the entire SCSI bus 2.

#> mdadm --fail /dev/md0 /dev/sdd1 /dev/sde1
Now /dev/md0 is running on /dev/sdb1 and /dev/sdc1 only.


STEP4 - RESULTS
The filesystem is DEAD:
#> fsck.ext3 -f -n /dev/md0
e2fsck 1.39 (29-May-2006)
e2fsck: aborted
Error reading block 64 (Attempt to read block from filesystem resulted in short read). Ignore error? no

Resize inode not valid. Recreate? no

Pass 1: Checking inodes, blocks, and sizes
Error reading block 64 (Attempt to read block from filesystem resulted in short read) while doing inode scan. Ignore error? no

Error while iterating over blocks in inode 7: Attempt to read block from filesystem resulted in short read

#####################
#### TEST CASE 2 ####
#####################

STEP1
This time, create the RAID10 device specifying a different disk order:
(1) sdb1, (2) sdd1, (3) sdc1, (4) sde1

#> mdadm --create --bitmap=internal --level=raid10 --raid-devices=4 /dev/md0 /dev/sdb1 /dev/sdd1 /dev/sdc1 /dev/sde1
#> mkfs.ext3 /dev/md0

STEP2
check the filesystem
#> fsck.ext3 -f -n /dev/md0
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: 11/51200 files (9.1% non-contiguous), 12127/204544 blocks

STEP3
Fail the same disks again, sdd1 and sde1; this simulates the failure of the entire SCSI bus 2.

#> mdadm --fail /dev/md0 /dev/sdd1 /dev/sde1
Now /dev/md0 is running on /dev/sdb1 and /dev/sdc1 only.


STEP4 - RESULTS

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!! THE FILESYSTEM IS OK !!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

#> fsck.ext3 -f -n /dev/md0
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: 11/51200 files (9.1% non-contiguous), 12127/204544 blocks
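The two results are consistent with one simple rule: adjacent devices in the --create order hold the two copies of the same data. Here is a small Python sketch (illustrative only, my own model rather than mdadm code) checking that hypothesis against both test cases:

```python
# Hypothesis (consistent with both tests): md RAID10 with the default
# layout mirrors adjacent devices in the --create order.
def near2_pairs(devices):
    return [tuple(devices[i:i + 2]) for i in range(0, len(devices), 2)]

BUS = {"sdb1": 1, "sdc1": 1, "sdd1": 2, "sde1": 2}

def survives(order, failed_bus):
    # The array survives if every mirror pair keeps one live member.
    return all(any(BUS[d] != failed_bus for d in pair)
               for pair in near2_pairs(order))

print(survives(["sdb1", "sdc1", "sdd1", "sde1"], 2))  # TEST CASE 1: False
print(survives(["sdb1", "sdd1", "sdc1", "sde1"], 2))  # TEST CASE 2: True
```

In TEST CASE 1 the pairs come out as (sdb1, sdc1) and (sdd1, sde1), so failing bus 2 kills the second mirror entirely; in TEST CASE 2 each pair spans both buses, so the array survives.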

banan.olek 09-20-2008 11:26 AM

RAID10 - solved
 
I finally have the answer.
The way the mirror copy disks are assigned is very well described here:
http://www.issociate.de/board/post/3...hich_one?.html

and here:

http://www.novell.com/documentation/...admr10cpx.html

It was all about the --layout option in mdadm, which is n2 by default: n2 = NEAR layout with 2 replicas.

According to Novell documentation:
Near Layout
With the near layout, copies of a block of data are striped near each other on different component devices. That is, multiple copies of one data block are at similar offsets in different devices. Near is the default layout for RAID10. For example, if you use an odd number of component devices and two copies of data, some copies are perhaps one chunk further into the device.
The near layout for the mdadm RAID10 yields read and write performance similar to RAID 0 over half the number of drives.
Near layout with an even number of disks and two replicas:

sda1  sdb1  sdc1  sde1
   0     0     1     1
   2     2     3     3
   4     4     5     5
   6     6     7     7
   8     8     9     9
This shows exactly the behavior I noticed during my tests.

Now it is obvious how it works :)
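To double-check my understanding, the placement table above can be reproduced with a short sketch (illustrative Python of my own, assuming an even number of devices and the near layout):

```python
def near_layout(devices, copies=2, stripes=5):
    # Near layout: each logical block is written to `copies`
    # consecutive devices at the same offset; blocks are numbered
    # left to right, top down.
    table = []
    block = 0
    for _ in range(stripes):
        row = []
        for _ in range(len(devices) // copies):
            row.extend([block] * copies)
            block += 1
        table.append(row)
    return table

for row in near_layout(["sda1", "sdb1", "sdc1", "sde1"]):
    print(row)
# [0, 0, 1, 1]
# [2, 2, 3, 3]
# ... matching the table from the Novell documentation
```

Each pair of adjacent columns holds identical block numbers, which is why adjacent devices in the --create order form the mirror pairs.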

Sergey_Ryzh 09-06-2011 01:57 PM

Good job, Banan.olek.
Thanks for your research.

