LinuxQuestions.org
Old 12-01-2012, 07:42 AM   #1
hiroyuki
LQ Newbie
 
Registered: Dec 2012
Posts: 5

Rep: Reputation: Disabled
one 16K random read I/O issues 2 scsi I/O (16K and 4K) requests


I noticed a weird issue when benchmarking random read I/O on files in
Linux
(Linux 2.6.18-274, files on an ext3 FS).
The benchmarking program is my own; it simply keeps reading
16KB of a file from a random offset.
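
A minimal sketch of that kind of benchmark loop, for readers who want to reproduce the pattern. This is not the poster's program: the function name `random_reads` and its parameters are made up for illustration. The offsets are aligned to the read size because every pread offset in the trace below is a multiple of 16384.

```python
import os
import random

READ_SIZE = 16 * 1024  # 16KB per read, as described in the post


def random_reads(path, count, read_size=READ_SIZE, seed=None):
    """Issue `count` preads of `read_size` bytes at random aligned offsets.

    Returns the total number of bytes read. Hypothetical helper, not the
    original benchmark.
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        file_size = os.fstat(fd).st_size
        slots = file_size // read_size  # number of whole read-sized chunks
        if slots == 0:
            raise ValueError("file is smaller than one read")
        total = 0
        for _ in range(count):
            offset = rng.randrange(slots) * read_size
            total += len(os.pread(fd, read_size, offset))
        return total
    finally:
        os.close(fd)
```

To see device I/O rather than page-cache hits, the file has to be much larger than RAM (or the cache has to be dropped between runs), as discussed later in the thread.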

I traced I/O behavior at the system call level and the SCSI level with systemtap and
noticed that one 16KB pread issues 2 SCSI I/Os, as follows.

=============================================
SYSPREAD random(8472) 3, 0x16fc5200, 16384, 128137183232
SCSI random(8472) 0 1 0 0 start-sector: 226321183 size: 4096 bufflen 4096 FROM_DEVICE 1354354008068009
SCSI random(8472) 0 1 0 0 start-sector: 226323431 size: 16384 bufflen 16384 FROM_DEVICE 1354354008075927
SYSPREAD random(8472) 3, 0x16fc5200, 16384, 21807710208
SCSI random(8472) 0 1 0 0 start-sector: 1889888935 size: 4096 bufflen 4096 FROM_DEVICE 1354354008085128
SCSI random(8472) 0 1 0 0 start-sector: 1889891823 size: 16384 bufflen 16384 FROM_DEVICE 1354354008097161
SYSPREAD random(8472) 3, 0x16fc5200, 16384, 139365318656
SCSI random(8472) 0 1 0 0 start-sector: 254092663 size: 4096 bufflen 4096 FROM_DEVICE 1354354008100633
SCSI random(8472) 0 1 0 0 start-sector: 254094879 size: 16384 bufflen 16384 FROM_DEVICE 1354354008111723
SYSPREAD random(8472) 3, 0x16fc5200, 16384, 60304424960
SCSI random(8472) 0 1 0 0 start-sector: 58119807 size: 4096 bufflen 4096 FROM_DEVICE 1354354008120469
SCSI random(8472) 0 1 0 0 start-sector: 58125415 size: 16384 bufflen 16384 FROM_DEVICE 1354354008126343
=============================================


As shown above, one 16KB pread issues 2 SCSI I/Os. (I traced SCSI I/O
dispatching with probe scsi.iodispatching.)

One SCSI I/O is the 16KB I/O requested by the application, which is fine.
The problem is the other 4KB I/O; I don't know why Linux issues it.

Of course, I/O performance is degraded by the weird 4KB I/O, and that is
causing me trouble.
I also tried fio (a well-known I/O benchmark tool) and saw the same issue,
so it is not caused by my application.
Does anybody know what is going on?
Any comments or advice are appreciated.

Thanks
 
Old 12-01-2012, 04:19 PM   #2
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,539

Rep: Reputation: 149
Do you have timing of those accesses? Just a wild guess, but may the other read be a read from the filesystem structure to find where to find your random chunk?
 
Old 12-01-2012, 08:58 PM   #3
hiroyuki
LQ Newbie
 
Registered: Dec 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Thank you for the comment.

>Do you have timing of those accesses? Just a wild guess, but may the other read be a read from the filesystem structure to find where to find your random chunk?

At the application level, no.
This issue happens even with the "cat" program.
Running "cat" on a small file (less than 4K) produces one 4K I/O for the data plus another 4K I/O whose purpose I don't know.
 
Old 12-02-2012, 04:25 AM   #4
hiroyuki
LQ Newbie
 
Registered: Dec 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
I figured out what is going on, but I don't know what it is for.

The ext3 filesystem places a 4KB block in front of each 4096KB (8192 sectors) of data.
Visually, the data is laid out like this.

|4KB|4096KB|4KB|4096KB|4KB|4096KB| ...

Only the 4096KB areas are accessible to application programs.
When a 4096KB area is accessed for the first time,
the OS first reads the 4KB block just before it
and then reads the requested data in the 4096KB area.
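
A quick check of that layout against the trace in post #1, as a sketch. The sector pairs are copied from the trace; the 512-byte sector size follows from "4096KB (8192 sectors)". The function name `gap_report` is made up for illustration.

```python
SECTOR = 512                      # bytes per sector (4096KB == 8192 sectors)
BLOCK_SECTORS = 4096 // SECTOR    # one 4KB block is 8 sectors
REGION_SECTORS = 4096 * 1024 // SECTOR  # one 4096KB data area is 8192 sectors

# (start sector of the 4KB read, start sector of the 16KB read) from post #1
TRACE = [
    (226321183, 226323431),
    (1889888935, 1889891823),
    (254092663, 254094879),
    (58119807, 58125415),
]


def gap_report(pairs):
    """For each pair return (gap in sectors, block-aligned?, within one area?)."""
    rows = []
    for small, big in pairs:
        gap = big - small
        aligned = gap % BLOCK_SECTORS == 0
        inside = BLOCK_SECTORS <= gap <= REGION_SECTORS
        rows.append((gap, aligned, inside))
    return rows


for row in gap_report(TRACE):
    print(row)
```

The gaps come out as 2248, 2888, 2216 and 5608 sectors. Each is a multiple of 8 sectors and smaller than 8192, so in all four samples the 16KB read lands inside the 4096KB area that directly follows the 4KB read.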

When a large file (compared to the DRAM size) is accessed randomly,
each I/O has little chance of hitting the page cache,
so every I/O request comes together with a 4KB I/O.

The question is, what is the 4KB data for?
Is it location metadata for the filesystem?
Is there any way I can remove it?
Or is there any way I can clear the 4096KB area only?

Any comments or advice are appreciated.

(I tested on many machines with many kernel versions. This happens on
all of them.)

Thanks.
 
Old 12-02-2012, 08:52 AM   #5
hiroyuki
LQ Newbie
 
Registered: Dec 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
I figured it out. It comes from ext3 indirect block mapping. (Ext3 uses one block of block pointers for every 1024 data blocks.)
Changing the filesystem to ext4 makes the issue disappear. (Ext4 has a more efficient scheme for block addressing.)
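
A sketch of the arithmetic behind that, assuming a 4KB block size and the classic ext3 block map: 12 direct pointers in the inode, then single, double and triple indirect blocks, each holding 4-byte pointers (so 1024 pointers per 4KB block). The function name `indirect_path` is made up for illustration; this is not kernel code.

```python
BLOCK_SIZE = 4096            # assumed filesystem block size
PTRS = BLOCK_SIZE // 4       # 4-byte block pointers -> 1024 per block
DIRECT = 12                  # direct block pointers held in the inode


def indirect_path(offset):
    """Return (levels, slot) for the data block containing byte `offset`.

    levels: number of pointer blocks walked (0 = direct, up to 3).
    slot:   index of the data block inside its last-level pointer block,
            or None for a direct block.
    """
    blk = offset // BLOCK_SIZE
    if blk < DIRECT:
        return 0, None
    blk -= DIRECT
    for level in (1, 2, 3):
        span = PTRS ** level         # data blocks reachable at this level
        if blk < span:
            return level, blk % PTRS
        blk -= span
    raise ValueError("offset beyond the triple-indirect range")


# pread offsets taken from the trace in post #1
for off in (128137183232, 21807710208, 139365318656, 60304424960):
    print(off, indirect_path(off))
```

One last-level pointer block covers 1024 x 4KB = 4096KB of data, which is the 4096KB area from the previous post. For the four offsets the slots are 280, 360, 276 and 700. In the trace, the 16KB reads start 2248, 2888, 2216 and 5608 sectors (281, 361, 277 and 701 blocks) after the 4KB reads, which is what you would expect if the data blocks sit directly after their pointer block. All four offsets fall in the triple-indirect range; the higher-level pointer blocks each cover far more data, so they presumably stay cached and only the last-level block shows up as an extra read.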

Thank you all.
 
Old 12-07-2012, 09:38 AM   #6
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,455

Rep: Reputation: 1172
I would hardly call this a bona fide "issue."

Over time, the various caches will do more-or-less good. What you should strive to do is to arrange the file access request pattern to improve the chances of the next piece of data already being cached somewhere. For example, sorting the locations in ascending order and issuing requests that way.
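
A minimal sketch of that idea, assuming the application can collect a batch of offsets before reading. The function name `read_sorted` is made up for illustration.

```python
import os


def read_sorted(fd, offsets, size):
    """Read `size` bytes at each offset, visiting offsets in ascending order.

    Results are returned in the caller's original order, so only the order
    in which requests are issued changes.
    """
    order = sorted(range(len(offsets)), key=offsets.__getitem__)
    results = [None] * len(offsets)
    for i in order:
        results[i] = os.pread(fd, size, offsets[i])
    return results
```

With ascending offsets, requests that fall in the same region of the file are issued back to back, so any index block they share is more likely to still be in the cache when the next request arrives.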

Last edited by sundialsvcs; 12-07-2012 at 09:40 AM.
 
Old 12-07-2012, 08:25 PM   #7
hiroyuki
LQ Newbie
 
Registered: Dec 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
It is an "issue" for large files, for example terabytes of data.
It is a really bad design for such large files; that is why ext4 extents were introduced.

Of course we should care about locality, but that is a separate matter and has nothing to do with the filesystem's bad design.
Also, we can't always sort data for access.
For example, in a database we can't have a clustered index on every attribute; we need secondary indexes for some attributes,
and accesses through a secondary index are always random accesses.
 
Old 12-09-2012, 02:11 PM   #8
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,455

Rep: Reputation: 1172
An interesting idea, although here's more about why my thoughts are still, "no, it really doesn't." (And we can probably just let it rest at that.)

No matter how big the file is, you're going to hit one-or-more index blocks followed by access to the data itself. Admittedly, file systems are geared toward "millions of small files" and "enormous single files" are certainly uncommon. But you can still reach anywhere in the file by making one or two index-block reads (and this only to the extent that they're not cached) followed by the data that you want.

I suggest that, with "large files," the root problem might well be that you're making a lot of accesses to it. Indeed, if those accesses are pure-random, you might be hitting several index-blocks per access instead of one. But once again what you'd really like to find a way to do is to access the data in somewhat of a less-than-random fashion. Make those buffer caches work for you.

Yes, ext4 does make a nod to the "gigantic file" case, and certainly one reason why they did this was to accommodate humongous (e.g. MySQL) databases that live in the filesystem.
 
  

