LinuxQuestions.org

bthornton 01-16-2010 01:56 PM

System Very Unresponsive Under Heavy Disk I/O...
 
Hey all,

Got a semi-theoretical "why is it that...?" question here.

First off, let me give some background on my setup: I'm running Ubuntu 9.04 (64-bit) with Compiz enabled. My example system has a quad-core (Intel Q9300) CPU with 4GB of RAM and two large enterprise-grade SATA drives. No hardware or software RAID at all.

So here's an example scenario: I was recently archiving a virtual machine image with 'tar'--it was about an 8GB volume and I wasn't even gzipping it. I noticed while tar was running that the other windows I had open would randomly fade to grey (an indication that they had become unresponsive). I checked System Monitor while tar was still running and found that the load on all of the CPU cores averaged below 25% and that I hadn't even used half of my memory. That's to be expected, since I wasn't compressing the data (i.e. the CPU doesn't need to transform it).

The general "problem" is that, whenever there is a disk I/O intensive task running, my Linux boxen become very unresponsive. I've tried variations of this scenario on different hardware with similar results. Whether it's just running 'dd if=/dev/zero of=hugefile bs=1M count=5000' or copying a large file with 'cp' or running 'tar'... these are all scenarios which induce heavy disk I/O with minimal CPU/memory overhead. Why, then, do foreground tasks become so unresponsive?

I realize that foreground applications read from and write to disk for any number of reasons (even opening a chat in Pidgin will hit the disk if logging is enabled). My guess is that these (mostly small) I/O requests are what cause the foreground applications to become unresponsive: they're probably stuck waiting on disk I/O. It's just surprising to me that I'm seeing 8+ second delays for very small reads/writes on a modern system. It seems like the kernel could schedule things a little more intelligently and let these small requests through more quickly.
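
For what it's worth, one way to check that guess would be to watch the CPU's I/O-wait time while the tar is running, for example with vmstat (part of procps, so it should already be installed):

Code:

vmstat 1
# the 'wa' column is the percentage of CPU time spent waiting on disk I/O,
# and the 'b' column counts processes blocked in uninterruptible (disk) sleep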

So, is this kind of slowdown the result of the way that the kernel schedules things by default? Or is this more of a hardware issue (i.e. something related to my SATA controller, etc.)?

For what it's worth, I've tried similar scenarios on Windows machines and they don't seem to become as unresponsive. But "seem" here is not an objective measure, so don't quote me on that.

Any input/thoughts/advice is appreciated!

macemoneta 01-16-2010 02:32 PM

The problem is that the system doesn't know what's important to you. If precedence is given to the foreground, people complain that their background copies take too long. Give precedence to the background, and the foreground stops. Treat them equally and, as you've seen, large I/O can temporarily block foreground applications.

There are different ways of dealing with this. For occasional background processes, you can use ionice. For example:

Code:

ionice -c3 tar cvf bigfile.tar bigdirectory/

The '-c3' tells the system to run the I/O in the idle class (it only gets disk time when nothing else is using I/O). This is handy for cron jobs, for example.
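
ionice can also retag a process that's already running, by PID. As a rough sketch (the pidof lookup is just illustrative; use the real PID of whatever is hammering the disk):

Code:

# move an already-running tar into the idle I/O class
ionice -c3 -p $(pidof tar)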

You can also change your I/O scheduler if this is a normal activity for you. The scheduler can be set per drive, so for /dev/sda you can see the current scheduler with:

Code:

cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]

The selected scheduler is in brackets. For example, you might do better with the deadline scheduler (as root):

Code:

echo deadline > /sys/block/sda/queue/scheduler

If that works better for you, you can set it in /etc/rc.local.
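
A minimal /etc/rc.local along those lines might look like this (assuming your two drives are sda and sdb; adjust the device names and scheduler to taste):

Code:

#!/bin/sh -e
# set the deadline I/O scheduler on both drives at boot
echo deadline > /sys/block/sda/queue/scheduler
echo deadline > /sys/block/sdb/queue/scheduler
exit 0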

More information here.

GrapefruiTgirl 01-16-2010 02:52 PM

Cool.

I have played around with the different schedulers on occasion too, and found that 'anticipatory' usually fits my general needs. But a real world situation is always good to learn from.

I'll be interested to know if the OP tries changing the scheduler, and what effect, if any, it has on the I/O operations causing the problem.

NOTE: The default scheduler can also be set system-wide at boot time with the kernel's "elevator=" parameter.
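
For example, on a GRUB-legacy setup you'd append it to the kernel line in /boot/grub/menu.lst (the kernel version and root= value below are just placeholders):

Code:

kernel /boot/vmlinuz-2.6.28-11-generic root=UUID=... ro quiet splash elevator=deadline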

Sasha

macemoneta 01-16-2010 03:18 PM

These are generalities for workloads:

noop: (essentially a simple FIFO) this is probably not the best for either highly interactive or heavy workloads, but it's very predictable in response time. It's probably a good choice for lightly I/O-loaded systems using real-time CPU scheduling.

anticipatory: Probably best for interactive users, at the expense of heavy I/O throughput.

deadline: Best for heavy I/O workloads.

cfq: Tries to be all things to all people (the default). It expects that you manually manage I/O workloads with ionice.

Personally, I have a quad CPU and 4 SATA-2 drives, and do a lot of heavy I/O. I find that either deadline or ionice'd processes do well in my environment, while keeping various GUI applications from getting starved. Every machine has a slightly different profile, so no single configuration is best. It's easy to switch on the fly, so there's no reason not to experiment. :)
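
If you want to put numbers on it, a quick-and-dirty comparison (run as root; the drive name, test file and sizes below are just examples) is to start a big write and time a small read under each scheduler:

Code:

# heavy background write, like the original tar/dd scenario
dd if=/dev/zero of=hugefile bs=1M count=5000 &
for s in noop anticipatory deadline cfq; do
    echo "$s" > /sys/block/sda/queue/scheduler
    echo 3 > /proc/sys/vm/drop_caches     # make the small read actually hit the disk
    echo -n "$s: "
    ( time cat /etc/services > /dev/null ) 2>&1 | grep real
done
wait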

GrapefruiTgirl 01-16-2010 03:27 PM

@ macemoneta,

thanks for the blurb on 'noop' -- I've always wondered what a 'noop' gadget of any sort could possibly do, and I've particularly wondered about the noop scheduler; 'noop' is not a very informative name for something that actually does something :) maybe they should have called it 'leastop' or 'notmuchop'...

bthornton 01-16-2010 03:32 PM

Fantastic responses, macemoneta! I knew I would need to do some tuning to throttle system resources to where I wanted them most. I'm just glad to hear that such "tuning" will not be something that requires a kernel recompile!

I'll play around with ionice and the different schedulers to see what works best in these situations.

Thanks again.

