Linux - Enterprise: This forum is for all items relating to using Linux in the Enterprise.
Background:
* High-school environment.
* 217 diskless clients whose root filesystems run off the server.
* Server running RAID 10 across 4 disks.
* Every once in a while atop and top show high iowait on the disks.
* Most of the time everything is fast/snappy.
Question:
* Is there a way to tell/track which files are causing the IO wait?
** I know that at certain times of the day there is high IO wait.
** I know that it is NFS causing it. But I don't know what the nfs thread is reading/writing to create the disk IO - a KDE cache file? A Firefox Flash applet? /usr/games/some_game?
%wa isn't a direct indicator of tasks waiting for I/O to complete. It's a very poor name and an almost useless metric: it indicates that (all) the CPUs are idle while there is uncompleted I/O. That might be just one I/O, or a write-out that nobody cares about - no way to tell.
What's your loadavg at the same time? That might (MIGHT) be a better indicator, but for "peaky" values you might not even see it in the numbers. You might want to look at installing collectl.
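One detail worth knowing when reading loadavg here: on Linux, tasks in uninterruptible (D) sleep - usually stuck waiting on I/O - are counted in the load average alongside runnable tasks. A quick sketch for snapshotting both:

```shell
# Load average plus a count of tasks blocked in uninterruptible (D) sleep.
# A loadavg spike with idle CPUs and a non-zero D count points at I/O, not CPU.
cat /proc/loadavg
ps -eo state= | awk '$1 ~ /^D/ {d++} END {print d+0, "task(s) in D state"}'
```

Run it a few times during the slow periods; a persistently non-zero D count is a much more direct sign of an I/O bottleneck than %wa.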
Hardware or software RAID? Separate controllers? Are you swapping? Is it impacting your users?
I doubt there is an easy way to identify file access - iotop indicates the major users, but you already know that.
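For what it's worth, the per-process counters iotop displays come from /proc/&lt;pid&gt;/io, which you can read directly (root is needed for other users' processes). A rough sketch listing the biggest writers - these are cumulative totals since each process started, not current rates, so it only hints at who to watch in iotop:

```shell
# Top 5 processes by total bytes written to the block layer since they started.
for p in /proc/[0-9]*; do
    b=$(awk '/^write_bytes/ {print $2}' "$p/io" 2>/dev/null)
    [ -n "$b" ] && echo "$b $(cat "$p/comm" 2>/dev/null) (pid ${p#/proc/})"
done | sort -rn | head -5
```

It still won't name the files, but it narrows down which processes to trace further.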
syg00 is right - I think especially concerning "It indicates that (all) the CPUs are idle, and there is uncompleted I/O" and "Is it impacting your users ?".
I used to work with an IBM AIX server and often had high CPU wait times. At the beginning I was jumping around trying to lower that value, but in most cases ("most" - not "all") it was normal and had practically no repercussions for the users.
This is what I figured - it is not possible to track IO on a per-file basis. I've been comparing our school's diskless servers in MRTG, and it seems that when we get to around 200 diskless workstations the hardware RAID cannot keep up. So I'll be focusing on faster drives/RAID.
In regards to atop vs iotop - atop gives a much nicer breakdown of what is causing bottlenecks (CPU, disk, net, swap). The MRTG graphs are nice for a visual comparison over time.
Yes - for some reason, when the IO is too high, DNS stops resolving in a timely manner, which prompts students to reboot their diskless clients in hopes of fixing the Internet. Rebooting diskless clients creates more disk IO.
This is probably a dumb question, but are you sure that it is this way around? That is, are you sure that it isn't DNS resolve failures causing high IO waits?
Good question
Assuming that most (all?) of your most frequently used NFS servers have static IPs, you might want to propagate an /etc/hosts containing those addresses to your clients (along with an nsswitch.conf that specifies "use /etc/hosts before querying DNS").
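Assuming the clients use glibc's name service switch, a minimal sketch of that setup (the server name and address below are made up for illustration):

```
# /etc/nsswitch.conf -- consult /etc/hosts ("files") before DNS
hosts: files dns

# /etc/hosts -- pushed to every client; name and address are examples
192.168.1.10    nfsserver.school.lan    nfsserver
```

With that in place, lookups for the file server never touch DNS at all.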
Resolved by ensuring that all of the following folders are stored in a local tmpfs on the client and not on the server:
/tmp /var/cache/man /var/lib/xkb /var/lock /var/run /var/log /var/spool /var/lib/discover /etc/hotplug/.run /var/lib/nfs /var/lib/gdm /var/lib/xdm /var/lib/cups /var/lib/urandom /var/yp/binding /var/cache/cups /etc/network/run /media /var/lib/preload
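On most distros these can be pinned to client RAM with tmpfs entries in the client's /etc/fstab; a sketch for a few of them (the size limits are guesses you'd tune per workstation):

```
# client /etc/fstab -- keep chatty paths in RAM instead of on the NFS root
tmpfs  /tmp       tmpfs  defaults,noatime,size=256m  0 0
tmpfs  /var/log   tmpfs  defaults,noatime,size=64m   0 0
tmpfs  /var/lock  tmpfs  defaults,noatime,size=4m    0 0
tmpfs  /var/run   tmpfs  defaults,noatime,size=16m   0 0
```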
A few of these I had moved to the server because software was running the workstation out of RAM. I'll deal with tmpfs-abusive software on a per-application basis.
If you have gigabit networking throughout, raise the MTU to 9000. Tune your NFS server and clients to use maximum transfer sizes.
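Assuming every switch and NIC on the path supports jumbo frames, the MTU is raised with `ip link set dev eth0 mtu 9000` (interface name is an example). On the client side, NFS transfer sizes are requested with the rsize/wsize mount options - a sketch with made-up server/export names; the server will negotiate these down if it caps them lower:

```
# client /etc/fstab -- request large NFS transfer sizes
nfsserver:/export/root  /  nfs  rsize=32768,wsize=32768,hard,intr  0 0
```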
Ideally, find an alternative to NFS - it's buggy and crufty as hell.
Treat any NFS-exported filesystem as _prohibited_ for any other kind of export or for local filesystem work, or you risk data corruption. In particular, it's extremely dangerous to export the same filesystem over NFS and Samba at the same time, or to export it over NFS to another box and then over Samba from there.
This is because NFS doesn't pass client lock requests back down to the filesystem, so it's trivially possible for NFS and any other process to try to write to the same file at the same time. Simultaneous NFS clients are handled within the NFS server, so that case is fairly safe.
As you've discovered, it's logging, locking, tmpfs and home-directory X stuff (the Firefox cache is a particularly nasty culprit) that can cause a LOT of server IO from a diskless client. Consider throwing more RAM at the clients, or toss in a cheap small disk and set up cachefilesd to help take some of the load off the server.
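A minimal cachefilesd sketch, assuming the cheap local disk is mounted at /var/cache/fscache (the culling thresholds shown are the package defaults, not tuned values), plus the `fsc` mount option that turns FS-Cache on for the NFS mount:

```
# /etc/cachefilesd.conf
dir /var/cache/fscache
tag nfscache
brun  10%
bcull 7%
bstop 3%

# client /etc/fstab -- "fsc" enables FS-Cache for this mount
nfsserver:/export/root  /  nfs  fsc,rsize=32768,wsize=32768,hard  0 0
```

Once the cachefilesd daemon is running, repeat reads (application binaries, libraries, the same cache files every login) come off the local disk instead of the wire.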
Finally: on the server itself, figure about 1 GB of RAM per TB of disk for optimal caching, and look at tuning your swappiness/memory-pressure settings. You don't want the server to swap or things get _really_ slow, nor do you want it to cache writes and then dump them all out at once, as this will lead to pauses in fileserving.
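Those knobs live in sysctl; a sketch for /etc/sysctl.conf (the values are illustrative starting points, not recommendations):

```
# /etc/sysctl.conf
vm.swappiness = 10               # prefer dropping page cache over swapping out daemons
vm.dirty_background_ratio = 5    # start background writeback earlier...
vm.dirty_ratio = 10              # ...so writes trickle out instead of dumping all at once
```

Apply with `sysctl -p` and watch whether the periodic write-out pauses smooth out.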
Yes, we also turned off Firefox caching and session storage - as you mentioned, Firefox hammers the disk, and overall performance improves with the following disabled...
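The exact settings aren't listed in the thread; for illustration, prefs along these lines (set via about:config or a site-wide user.js) are the usual knobs for the disk cache and session storage:

```
// user.js -- illustrative prefs, not the thread's actual list
user_pref("browser.cache.disk.enable", false);              // no on-disk cache
user_pref("browser.sessionstore.resume_from_crash", false); // skip crash-recovery writes
user_pref("browser.sessionstore.interval", 300000);         // session snapshots every 5 min, not every few seconds
```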