01-15-2012, 02:06 PM   #1
javor_nik
LQ Newbie
 
Registered: Jan 2012
Posts: 3

I/O size fragmentation


Hi,

I'm trying to figure out why Linux is, in general, splitting I/O requests into smaller chunks.

Here is my test:
Code:
# time dd if=/dev/sdak bs=1024k of=/dev/null iflag=direct
At the same time:
Code:
$ iostat -mx dm-10 10
...
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.97    0.00    1.79   12.21    0.00   74.03

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdak              0.00     0.00 800.00  0.00   200.00     0.00   512.00     1.77    2.22   0.60  47.86

* I.e., per iostat, the 1024KB operations are being split into 512-sector chunks (avgrq-sz is in 512-byte sectors, so 512 sectors = 256KB). The same happens with 512KB requests.
* 256KB and smaller don't seem to be split in that way.
* 8MB -> split to 256KB
* 20000MB -> split into ~500KB avg request size. (So maybe it can do more than 256K sometimes...)

* Same observations (I/O split into smaller requests) with other tools: Oracle orion (i.e. async I/O results in the same splitting), sar shows the same request sizes as iostat, and blktrace confirms it too. (See the sysfs sketch below for where I think these limits live.)
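For reference, here is how the per-device limits that (I believe) govern this splitting can be read from sysfs. A minimal sketch; sdak is just my test device:
Code:
# Largest request, in KB, that the kernel will issue to this device:
cat /sys/block/sdak/queue/max_sectors_kb
# Hard upper limit reported by the driver/HBA; max_sectors_kb cannot exceed it:
cat /sys/block/sdak/queue/max_hw_sectors_kb
# The I/O scheduler in use (it affects how requests get merged):
cat /sys/block/sdak/queue/scheduler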

That has been tested on Oracle Linux 5.7, RHEL 5.5, RHEL 5.7, and Ubuntu 11.10, against multipath devices, local SATA drives, virtual machine disks, and mdadm RAID0 (chunk size 1MB, 2MB, 4MB). Sometimes the split size was a bit different (e.g. 128K instead of 256K), but in all cases the 1024K requests were split into smaller chunks.

So my questions are:
Q1. Why does that happen, and what does it depend on? (Is it related to the hardware, to the host bus adapter or its driver, or is it more a Linux-kernel-specific behavior?)
Q2. Can this be changed? Can we make Linux send full-size 1MB requests to the storage?

Here is an article which I think touches on some points probably related to my issue: http://people.redhat.com/msnitzer/docs/io-limits.txt
Still, it's not clear to me how I can increase the I/O size and avoid this fragmentation, where these limits come from, and how we can change them.
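If I'm reading that article right, the knob would be max_sectors_kb, bounded above by max_hw_sectors_kb. Is it really as simple as something like this? (An untested sketch; not persistent across reboots.)
Code:
# Ask the block layer to allow requests of up to 1MB for this device,
# assuming its max_hw_sectors_kb is at least 1024:
echo 1024 > /sys/block/sdak/queue/max_sectors_kb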
 
01-16-2012, 12:13 PM   #2
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,423

I/O requests are ultimately broken down to the size of a disk sector or block. At the prerogative of the device driver, some requests can be chained across multiple blocks. But you will not get a clear picture of what is happening from any command-line tool, and you ought not to have to be concerned with it.

I/O requests are first made to the file buffer pool (so that multiple changes to a single point in the file can be lumped into just one disk write). The writes themselves are, to a great extent, "lazy."
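You can watch that laziness yourself. A rough illustration, not a benchmark (the file name is arbitrary):
Code:
# Buffered write: dd returns as soon as the data is in the page cache
dd if=/dev/zero of=/tmp/lazytest bs=1M count=100
# The real disk I/O largely happens here, when the cache is flushed
sync
# Compare with a write that bypasses the page cache entirely
dd if=/dev/zero of=/tmp/lazytest bs=1M count=100 oflag=direct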
 
01-16-2012, 12:54 PM   #3
javor_nik
LQ Newbie
 
Registered: Jan 2012
Posts: 3

Original Poster
Well, what I'm concerned about is that I/O requests are broken into smaller pieces before reaching the relevant HDD. A decent HDD should have a larger buffer and be able to read/write much more data at once than a single sector. (Writing data in very small pieces, like 512B sectors, usually consumes more CPU and is much slower; at least that's what happens when I read/write data using a very small buffer compared to a larger one.)
(There is a similar thing in the networking world too: jumbo frames are used there to improve performance.)

Actually, the main issue is related to SAN storage (though I've been examining the case with local disks too). So data is going out of/into the system through multipathing, fibre channel, SAN cache, RAID controllers, etc. The smaller the number of pieces, the better, when going through all these layers (at least when the amount of data being transferred is large enough).

I'm concerned because the application is I/O intensive and the storage is getting overloaded; the I/O size from the application's point of view is well aligned with the OS I/O size. OS buffers are usually bypassed (asynchronous direct I/O is being used).

The application of major interest here is the Oracle database (using Oracle ASM), though I think the question is quite general.

Here is one more link probably related to the topic:
http://docs.redhat.com/docs/en-US/Re...orage-iolimits

Still, it's not clear to me what I can do if I want to optimize the system for a 1MB I/O size (assuming we can make the storage stripe depth 1MB too).
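For what it's worth, this is roughly how I've been comparing the limits on the multipath device against an underlying path (device names are just the ones from my setup):
Code:
# The dm (multipath) device and each path beneath it carry their own queue
# limits; the effective request size is constrained by the smallest of them:
for dev in dm-10 sdak; do
    printf '%s: max_sectors_kb=%s\n' "$dev" \
        "$(cat /sys/block/$dev/queue/max_sectors_kb)"
done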
 
01-16-2012, 11:04 PM   #4
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,423

So far, if I may say, it seems that you are looking upon this as a theoretical problem. There's not much use in doing that, I think. Go find something that actually hurts, find out why it's hurting, and experiment with ways to stop the pain. If "the question is quite general," you can never solve it. What is the specific pain that your system is experiencing?
 
01-17-2012, 05:01 AM   #5
javor_nik
LQ Newbie
 
Registered: Jan 2012
Posts: 3

Original Poster
Well, in the end there are real problems too. We have Oracle databases which are suffering from bad performance (mainly due to slow I/O), with SAN storage in place.

In general, Oracle and others recommend aligning the application I/O size with the storage I/O chunk size. That's what I was trying to do, but I'm not sure I can make a clean test if Linux always splits requests into smaller pieces.
On other systems (e.g. AIX), increasing the I/O size at the OS level has helped in similar scenarios.
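(For the local mdadm tests I mentioned earlier, the stripe chunk can be matched to the intended I/O size like this; the disk names are just placeholders.)
Code:
# RAID0 with a 1MB chunk, to line up with 1MB application I/O;
# --chunk is given in KB:
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=1024 /dev/sdb /dev/sdc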

Another issue is correlating I/O performance statistics and requirements between the database and the storage: the storage people report IOPS, but the request size makes a big difference (6000 IOPS of 1MB is not the same as 6000 IOPS of 256KB).
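A back-of-the-envelope comparison, just to make the arithmetic explicit:
Code:
# Throughput implied by 6000 IOPS at two request sizes:
echo "$(( 6000 * 1024 / 1024 )) MB/s"   # 1MB requests   -> 6000 MB/s
echo "$(( 6000 * 256  / 1024 )) MB/s"   # 256KB requests -> 1500 MB/s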

If you experiment with the storage calculator at http://www.wmarow.com/strcalc/goals.html, for example: with the same number of physical disks, significantly higher throughput can be achieved with a larger I/O request size than with a smaller one.
 
  

