LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - General (http://www.linuxquestions.org/questions/linux-general-1/)
-   -   Why is JFS faster than a raw device for 1.5TB disk? (http://www.linuxquestions.org/questions/linux-general-1/why-is-jfs-faster-than-a-raw-device-for-1-5tb-disk-827339/)

Daemo 08-19-2010 09:11 PM

Why is JFS faster than a raw device for 1.5TB disk?
 
I have a 4 * 1.5TB RAID5 disk array (software linux RAID, formatted with jfs) on my Fedora 12 system and want to expand it by adding another 1.5TB disk.

I have added a drive to the system and conducted a simple performance check on it to make sure it was functioning properly:

Code:

# dd if=/tmp/bigfile.dat of=/dev/sdg1
5478774+1 records in
5478774+1 records out
2805132609 bytes (2.8 GB) copied, 168.77 s, 16.6 MB/s

But 16.6 MB/s is lousy. I ran an iostat -dmx 2 on this drive at the time of this lousy performance, and typical output was:
Code:

Device:        rrqm/s  wrqm/s    r/s    w/s    rMB/s    wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda              0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
sdb              1.50    0.00  258.50    0.00    16.25    0.00  128.74    0.17    0.66  0.36  9.30
sdg              0.00  3556.00 4160.00  32.00    16.25    16.00    15.76  135.73  35.11  0.24 100.05

(note that sda and sdb are a linux raid mirror set for the / filesystem that holds /tmp). I formatted the new drive (/dev/sdg1) with jfs and mounted it under /mnt2:
Code:

# jfs_mkfs /dev/sdg1
jfs_mkfs version 1.1.13, 17-Jul-2008
Warning!  All data on device /dev/sdg1 will be lost!

Continue? (Y/N) y

Format completed successfully.

1465138552 kilobytes total disk space.
# mount /dev/sdg1 /mnt2

and ran a similar test, this time to the filesystem:
Code:

# dd if=/tmp/bigfile.dat  of=/mnt2/bigfile.dat
5478774+1 records in
5478774+1 records out
2805132609 bytes (2.8 GB) copied, 25.6558 s, 109 MB/s

109 MB/s is awesome. An iostat -dmx 2 typically looked like this during this better performance:
Code:

Device:        rrqm/s  wrqm/s    r/s    w/s    rMB/s    wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda              9.00    0.00 1777.00    0.00  111.62    0.00  128.65    1.43    0.78  0.39  69.75
sdb              0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
sdg              0.00  128.50    3.50  230.50    0.01  109.26  956.36    96.04  394.79  4.27 100.00

My question is this: if I add this new disk to the existing 4-disk RAID5 array, will it perform badly (around the 16.6 MB/s mark) or better (closer to the 109 MB/s mark)?

I would like to know what the performance will be like before I add the disk to the array because I don't want to wait for the whole array to be rebuilt before finding out my array is performing badly. The array is used as part of a mythtv system and has up to 6 simultaneous recordings running on it, so it needs to perform well.

I'm confused!

Thanks in advance for any help!

-Daemo

syg00 08-19-2010 09:44 PM

Never believe numbers from a rerun unless you can absolutely eliminate cache effects.
Code:

echo 3 > /proc/sys/vm/drop_caches

will generally give you a better idea of the real numbers. Personally, I reboot before every run.
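Putting that together, the clear-cache-then-retest sequence would look something like this (requires root; /dev/sdg1 is the partition from the original post, and writing to it destroys its contents):

```shell
# Flush dirty pages to disk first, then drop the page cache,
# dentries and inodes so the next read really hits the disk
sync
echo 3 > /proc/sys/vm/drop_caches

# Re-run the raw-device test (destructive to /dev/sdg1!)
dd if=/tmp/bigfile.dat of=/dev/sdg1
```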

Daemo 08-19-2010 10:05 PM

Re-runs
 
Thanks syg00

I re-ran the tests after flushing the caches each time as you recommended. This time I got 17 MB/s to the raw device partition, and 42 MB/s to the filesystem. Less severe but still raises the question.

syg00 08-19-2010 11:30 PM

Give the first run a blocksize - 4096 would be a reasonable start. Increase as appropriate.

Update: that would be "obs" - you want to see the effect on the target disk. The filesystem will handle things appropriately on the input.
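For example, to leave the input side alone but force 4 KiB writes to the target (same destructive caveat for /dev/sdg1):

```shell
# obs= sets only the output block size; the input side keeps
# dd's default 512-byte reads, which the filesystem handles fine
dd if=/tmp/bigfile.dat of=/dev/sdg1 obs=4096
```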

Daemo 08-20-2010 01:25 AM

Thanks very much syg00

Okay, now I have some consistency. After using dd with a 4096-byte blocksize, both the /dev/sdg1 raw partition and the cooked jfs filesystem were similar: about 92-96 MB/s. I ran both of these tests several times, clearing the cache between tests.

Here's how it went:
bs=512... 16 MB/s
bs=1024... 17 MB/s
bs=2048... 19 MB/s
bs=4096... 94 MB/s
bs=8192... 99 MB/s
bs=16384... 98 MB/s
bs=32768... 99 MB/s

Note the jump at 4K blocks. My theory is that since this disk has native 4K sectors (and the partition starts at sector 64), writes of this block size, or multiples of it, line up with the drive's sectors.

Am I correct in saying that since the RAID array uses a 32K chunk size, the md driver should read and write in 32K chunks, and therefore this disk should perform optimally?
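A quick way to test the 4K-sector theory, and to script the sweep above, might be something like this (sdg and the file names are from the posts above; run as root, and the dd writes are destructive to /dev/sdg1):

```shell
# Report the drive's sector sizes; logical 512 with physical 4096
# would support the 4K-native theory
cat /sys/block/sdg/queue/logical_block_size
cat /sys/block/sdg/queue/physical_block_size

# Block-size sweep with a cold cache before each run
for bs in 512 1024 2048 4096 8192 16384 32768; do
    sync
    echo 3 > /proc/sys/vm/drop_caches
    echo "bs=$bs:"
    dd if=/tmp/bigfile.dat of=/dev/sdg1 bs=$bs 2>&1 | tail -n 1
done
```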

syg00 08-20-2010 02:37 AM

Optimally? I would never go that far - too much software in between. The aims of the (various) authors are unlikely to correspond directly to what an end user might want.

I would expect 32K to be a reasonable read size - strace should confirm what the filesystem is trying to do, at least. Tuning any closer to the block device layers (md and the real device layer below the VFS) is probably far more trouble than any payback you would get.
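One way to see the write sizes a program actually issues is to trace its system calls; for example (the log and probe file names here are just illustrations):

```shell
# Log every write(2) dd makes; the last number on each logged line
# is the byte count of that write request
strace -e trace=write -o /tmp/dd-writes.log \
    dd if=/tmp/bigfile.dat of=/mnt2/probe.dat bs=32768
head /tmp/dd-writes.log
```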

Daemo 08-21-2010 10:16 PM

Completed!
 
Update: I have added the disk to the array, and the reshape rate was around 20 MB/s, which is about right (each chunk remap is 4 reads and 5 writes). The reshape started at about midnight Friday morning and completed sometime Sunday morning. While mythtv was recording, the reshape rate dropped to about 1-1.5 MB/s, as usual.

After the array was reshaped and I expanded the jfs filesystem, I was getting about 90-99 MB/s write rates from my clear-cache/dd test, which is exactly what I need.
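The grow-and-resize sequence described above is roughly the following sketch (md0 and /mnt/array are placeholder names, not from the posts; double-check your own device names before running anything):

```shell
# Add the new partition as a spare, then reshape from 4 to 5 devices
mdadm --add /dev/md0 /dev/sdg1
mdadm --grow /dev/md0 --raid-devices=5

# Watch the reshape progress
cat /proc/mdstat

# After the reshape completes, grow JFS online via a remount
mount -o remount,resize /mnt/array
```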

Thanks syg00!

