Old 07-04-2011, 06:43 PM   #1
jg167
Member
 
Registered: Jun 2004
Posts: 40

Rep: Reputation: 15
md policy on device selection for RAID-1


Does anyone know offhand how md selects which device to read from in a RAID-1? Does it aim for good performance by simply distributing reads round-robin, does it compute something like average-service-time * queue-length to guess where each read will finish first, or something else?

It looks like this choice can be influenced by
echo "writemostly" > /sys/block/md<n>/md/dev-<xxx>/state
which, according to md.txt, will cause reads to go to this device only if there is no other choice. That would do exactly what I want in a RAID-1 of a fast device and a slow device, where the system is mostly read-only.

Has anyone tried that?
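For reference, a minimal sketch of toggling that flag at runtime, based on the md.txt description; /dev/md0 and sdb1 are placeholder names, not my actual devices.

Code:
# Mark a slow RAID-1 member write-mostly at runtime (placeholder names).
echo writemostly > /sys/block/md0/md/dev-sdb1/state

# Clear it again if needed.
echo -writemostly > /sys/block/md0/md/dev-sdb1/state

# Write-mostly members show up with a (W) marker in /proc/mdstat.
cat /proc/mdstat
cat /sys/block/md0/md/dev-sdb1/state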

Last edited by jg167; 07-04-2011 at 06:51 PM. Reason: add link to md.txt
 
Old 07-05-2011, 01:07 AM   #2
nooneknowme
Member
 
Registered: Feb 2008
Location: Bangalore, India
Posts: 69

Rep: Reputation: 5
I have not tried that, but the issue sounded interesting. Reading the manual, I found an option which might do the trick for you:

Quote:

For Manage mode:
--write-mostly
Subsequent devices that are added or re-added will have the
'write-mostly' flag set. This is only valid for RAID1 and means
that the 'md' driver will avoid reading from these devices if
possible.

--readwrite
Subsequent devices that are added or re-added will have the
'write-mostly' flag cleared.
You have similar options when creating the RAID array as well.
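For instance, a rough sketch of what that could look like at create time (device names here are made up; per the man page, --write-mostly applies to the devices listed after it):

Code:
# Sketch only: /dev/sda1 is the fast member, /dev/sdb1 the slow one.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/sda1 --write-mostly /dev/sdb1

# The slow member should then appear with a (W) marker:
cat /proc/mdstat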
 
Old 07-06-2011, 09:01 PM   #3
jg167
Member
 
Registered: Jun 2004
Posts: 40

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by nooneknowme
I have not tried that, but the issue sounded interesting. Reading the manual, I found an option which might do the trick for you.
[...]
You have similar options when creating the RAID array as well.
Thanks, that'll make it even easier to set up.
 
Old 07-28-2011, 10:54 AM   #4
jg167
Member
 
Registered: Jun 2004
Posts: 40

Original Poster
Rep: Reputation: 15
Functionally it works, but performance is terrible

This method works; however, we are using a mirror of two md devices (i.e. two RAID0 stripes, one on flash cards and one on disks, with the disk stripe marked write-mostly). Functionally it all works, but stacked md configurations are very slow: reading through the mirror offers only about 50% of the bandwidth of reading the RAID0 stripe directly. This is true even for the disk side by itself, with the flash side removed (i.e. a mirror with one side failed).

Wondering how to create an md array that starts out with a piece missing? Use the word missing in place of the device, e.g.
mdadm --create /dev/md2 -l 1 -n 2 /dev/md1 missing
will create the md2 volume used below.
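For completeness, a sketch of the whole stacked construction (placeholder device names and counts, not the exact 14-device layout shown below):

Code:
# Sketch with placeholder names/counts (the real flash stripe has 14 devices).
# Flash-side RAID0 stripe (this is the /dev/md1 shown below):
mdadm --create /dev/md1 --level=0 --chunk=64 --raid-devices=2 /dev/sdb1 /dev/sdc1

# RAID1 mirror created with one half missing:
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/md1 missing

# Later, build the disk-side stripe (hypothetical /dev/md3) and add it as the
# write-mostly half; per the man page, --write-mostly flags the devices
# listed after it:
mdadm --create /dev/md3 --level=0 --raid-devices=2 /dev/sdd1 /dev/sde1
mdadm /dev/md2 --add --write-mostly /dev/md3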

Here are the details:

Code:
[root@pe-r910 ~]# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Tue Jul 26 23:13:59 2011
     Raid Level : raid1
     Array Size : 1998196216 (1905.63 GiB 2046.15 GB)
  Used Dev Size : 1998196216 (1905.63 GiB 2046.15 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Thu Jul 28 08:29:35 2011
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : pe-r910.ingres.prv:2  (local to host pe-r910.ingres.prv)
           UUID : 299ea821:756847a0:4db591e4:38769641
         Events : 160

    Number   Major   Minor   RaidDevice State
       0       9        1        0      active sync   /dev/md1
       1       0        0        1      removed
[root@pe-r910 ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Tue Jul 26 01:05:05 2011
     Raid Level : raid0
     Array Size : 1998197376 (1905.63 GiB 2046.15 GB)
   Raid Devices : 14
  Total Devices : 14
    Persistence : Superblock is persistent

    Update Time : Tue Jul 26 01:05:05 2011
          State : clean
 Active Devices : 14
Working Devices : 14
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           Name : pe-r910.ingres.prv:1  (local to host pe-r910.ingres.prv)
           UUID : 735bd502:62ed0509:08c33e15:19ae4f6b
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       81        4      active sync   /dev/sdf1
       5       8       97        5      active sync   /dev/sdg1
       6       8      113        6      active sync   /dev/sdh1
       7       8      129        7      active sync   /dev/sdi1
       8       8      145        8      active sync   /dev/sdj1
       9       8      161        9      active sync   /dev/sdk1
      10       8      177       10      active sync   /dev/sdl1
      11       8      193       11      active sync   /dev/sdm1
      12       8      209       12      active sync   /dev/sdn1
      13       8      225       13      active sync   /dev/sdo1
[root@pe-r910 ~]# dd if=/dev/md1 bs=512K count=10000 iflag=nonblock,direct of=/dev/null
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 3.45236 s, 1.5 GB/s
[root@pe-r910 ~]# dd if=/dev/md2 bs=512K count=10000 iflag=nonblock,direct of=/dev/null
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 6.81182 s, 770 MB/s
[root@pe-r910 ~]#
update:
iostat shows 64K reads going both to md1 and to its component devices when reading directly from md1. This is somewhat mysterious, since dd is asking for 512K reads, so I would have expected to see 512K requests to md1 and 64K requests (i.e. the chunk size) to its component devices.
But the killer is that when reading from md2 (the RAID1 volume with only one half present) it shows only 4K reads to md2, to md1, and to the component devices. Perhaps that is because md thinks that is the size it should use for error processing, but it's killing performance.
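(In case anyone wants to reproduce the observation, this is roughly how the request sizes can be watched while the dd runs; on sysstat of this vintage, avgrq-sz is reported in 512-byte sectors, so 128 = 64K and 8 = 4K. Device names are from my layout; adjust to yours.)

Code:
# Watch average request sizes on each layer while the dd is running.
iostat -x md1 md2 sdb sdc 1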

update2:
This looks to be an issue only for md on md. If I make a RAID1 directly on a disk, its I/O rate is the same as the disk's.
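One thing that might be worth checking here (just a guess on my part, not a confirmed fix): whether the top-level md device has a much smaller read-ahead than the stripe underneath it, since that would be consistent with the small requests.

Code:
# Hypothetical check: compare read-ahead (in 512-byte sectors) per layer.
blockdev --getra /dev/md1
blockdev --getra /dev/md2

# If md2's value is tiny, try raising it (8192 sectors = 4 MB, arbitrary):
blockdev --setra 8192 /dev/md2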

Last edited by jg167; 07-29-2011 at 02:20 AM.
 
  

