LinuxQuestions.org
Old 08-14-2014, 07:43 AM   #1
jklaverstijn
LQ Newbie
 
Registered: Jan 2011
Posts: 15

Rep: Reputation: 0
Uneven spread of throughput for similar processes


Hi all,

We are in the process of configuring and tuning a beefy RHEL 6.4 server for data warehousing purposes. The nature of this workload makes I/O a critical performance factor. Right now we are doing some crude testing using multiple parallel dd executions to get a feel for the scalability and raw performance of the I/O to a 24 TB HDS SAN box. We see a strange effect that we cannot explain, and we're hoping someone can give that explanation.

If we spawn 8 or 9 parallel dd executions (dd if=/dev/zero of=somefile$$ bs=64k count=50000 &), we see an uneven spread of throughput and elapsed times. Often, but not always, the first execution gets much more attention and reports 800+ MB/s, whereas the others only get 70 MB/s. Similar things happen when reading or with a mixed I/O pattern.
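For completeness, this is roughly how we drive the test (a minimal sketch; the eight-writer count and the file/log names are just examples):

Code:
#!/bin/bash
# Spawn 8 parallel dd writers; each writes 64k x 50000 = ~3.2 GB.
for i in $(seq 1 8); do
    dd if=/dev/zero of=somefile$i bs=64k count=50000 2> dd_$i.log &
done
# Wait for all writers, then show the per-process throughput
# that dd prints on the last line of its stderr output.
wait
tail -n 1 dd_*.log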

We have tuned the server for high performance and use the deadline I/O scheduler.
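For reference, this is how we confirm those settings (sdX stands for whichever block device backs the SAN LUN):

Code:
# The scheduler shown in brackets is the active one.
cat /sys/block/sdX/queue/scheduler
# Active tuned profile on RHEL 6.
tuned-adm active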

Kernel release is 2.6.32-358.23.2.el6.x86_64, 80 CPUs, 256 GB of memory.

What could explain this effect?

Thanks in advance for any input
 
Old 08-14-2014, 09:52 PM   #2
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,337

Rep: Reputation: 548
The difference in throughput and elapsed times that you see is an effect of the I/O buffer pool. The reported times are much faster when a write request is loaded into a pool buffer and completion is immediately signaled back to the process requesting the I/O. Once the buffer pool is full, a write request has to wait until some other I/O request finishes and frees up a buffer in the pool.

You also get time differences on reads, depending on whether a read completes immediately because the record is already sitting in the buffer pool, or the record has to be read into the buffer pool from the hard drive.

So the first dd execution that you start probably gets a lot of I/O requests into the buffer pool before the other dd executions get going.
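You can watch this happening while the test runs; the Dirty and Writeback counters in /proc/meminfo grow while writes are being absorbed by the cache and drain as they are flushed out to the SAN (a rough illustration, not a precise measurement):

Code:
# Watch dirty/writeback page counters while the dd processes run.
watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'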

---------------------
Steve Stites
 
Old 08-14-2014, 11:16 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,125

Rep: Reputation: 4120
Could be - this might be mitigated by specifying "direct" on the dd commands.
That only affects the page cache though - I'd also test the noop scheduler; deadline still re-orders (and delays) I/Os. But all of this only attempts to limit the variability at the host - who knows what the SAN may be doing at its end. You would expect the SAN to respond immediately (unless you do write-through caching), so hopefully it is consistently fast in response time.
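Something like this, where sdX is again whichever block device sits behind the LUN:

Code:
# Same write test, but bypass the page cache entirely.
dd if=/dev/zero of=somefile$$ bs=64k count=50000 oflag=direct &
# Switch the device to the noop elevator for comparison.
echo noop > /sys/block/sdX/queue/scheduler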
 
Old 08-15-2014, 10:01 AM   #4
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,337

Rep: Reputation: 548
Quote:
Originally Posted by syg00
Could be - this might be mitigated by specifying "direct" on the dd commands.
The purpose of the buffer pool cache is to get the fastest possible disk I/O in aggregate. If you specify "direct" you will probably get all of the dd commands to run at the same speed. However, the time it takes for the entire group of dd commands to finish will increase, because you lose any efficiencies created by the cache.
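An easy way to see the aggregate effect is to time the whole batch both ways (a rough sketch; the sync in the buffered run forces the cached data out so the comparison is end-to-end):

Code:
# Buffered: individual dd's finish fast, the cache does the work.
time ( for i in $(seq 1 8); do dd if=/dev/zero of=buf$i bs=64k count=50000 & done; wait; sync )
# Direct: even per-process speeds, but the group may take longer overall.
time ( for i in $(seq 1 8); do dd if=/dev/zero of=dir$i bs=64k count=50000 oflag=direct & done; wait )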

-----------------------
Steve Stites
 
Old 08-15-2014, 10:41 AM   #5
jklaverstijn
LQ Newbie
 
Registered: Jan 2011
Posts: 15

Original Poster
Rep: Reputation: 0
Thanks for the pointers. Indeed, when we use the 'direct' flag the effect disappears and all processes show consistent numbers. Bypassing the cache is not our intention for a real-life application; for now we were only looking for an explanation of what we saw during testing, and sure enough we have it thanks to you. Many thanks for your input, jailbait and syg00.
 
  

