Old 11-26-2010, 03:12 PM   #1
dmonty
LQ Newbie
 
Registered: Dec 2004
Posts: 20

Rep: Reputation: 2
Source of heavy disk io?


Background:
* high school environment.
* 217 diskless clients whose root filesystem (/) runs off the server.
* server running RAID10 with 4 disks.
* every once in a while atop and top show high IOWAIT on the disks.
* most of the time everything is fast/snappy.

Question:
* Is there a way to tell/track which files are causing the IO wait?
** I know that at certain times of the day there is high IO wait.
** I know that it is NFS causing it, but I don't know what the NFS thread is reading/writing to create the disk IO - a KDE cache file? a Firefox Flash applet? /usr/games/some_game?
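Something like the sketch below is the level of detail I'm after - watching file-level access in real time. This is just a rough illustration using inotifywait from inotify-tools, with /srv/nfsroot standing in for our actual export path (note that a recursive watch on a big tree may need fs.inotify.max_user_watches raised):

Code:
# Watch file modifications/creations/deletions recursively under the export.
# -m = monitor continuously, -r = recursive, -e = events to report.
inotifywait -m -r -e modify,create,delete /srv/nfsroot 2>/dev/null |
while read dir event file; do
    echo "$(date '+%H:%M:%S') $event $dir$file"
done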
 
Old 11-26-2010, 03:25 PM   #2
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

Rep: Reputation: 142
Stupid question: is atop similar to iotop? (I just installed atop but am having some trouble understanding its output.)
 
Old 11-26-2010, 04:14 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
%wa isn't a direct indicator of tasks waiting for I/O to complete. Very poor name, and an almost useless metric. It indicates that (all) the CPUs are idle, and there is uncompleted I/O. Might be just one I/O, might be a write-out that no-one cares about. No way to tell.
What's your loadavg at the same time? That might (MIGHT) be a better indicator, but for "peaky" values you might not even see it in the numbers. You might want to look at installing collectl.

Hardware or software RAID? Separate controllers? Are you swapping? Is it impacting your users?

I doubt there is an easy way to identify file access - iotop indicates the major users, but you already know that.
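If you want to catch the peaks, a rough sketch along these lines (standard tools only; adjust the interval and log path to taste) logs loadavg next to per-disk stats so you can line them up afterwards:

Code:
#!/bin/sh
# Sample loadavg and extended disk stats every 5 seconds with a timestamp,
# so peaky %wa episodes can be matched against disk queue depth/utilisation.
while true; do
    date '+%F %T'
    cat /proc/loadavg
    iostat -x 1 1
    sleep 5
done >> /var/log/io-sample.log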
 
Old 11-26-2010, 04:35 PM   #4
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

Rep: Reputation: 142
syg00 is right - especially concerning "It indicates that (all) the CPUs are idle, and there is uncompleted I/O" and "Is it impacting your users?".
I used to work on an IBM AIX server that often showed high CPU wait times. At first I jumped around trying to lower that value, but in most cases ("most", not "all") it was normal and had practically no repercussions for the users.
 
Old 11-26-2010, 05:04 PM   #5
dmonty
LQ Newbie
 
Registered: Dec 2004
Posts: 20

Original Poster
Rep: Reputation: 2
Quote:
Originally Posted by syg00 View Post
I doubt there is an easy way to identify file access - iotop indicates the major users, but you already know that.
This is what I figured - it is not possible to track IO on a per-file basis. I've been comparing our school's diskless servers in MRTG, and it seems that when we get to around 200 diskless workstations the hardware RAID cannot keep up. So I'll be focusing on faster drives/RAID.

In regards to atop vs iotop - atop gives a much nicer break-down of what is causing bottlenecks (CPU, disk, net, swap). The MRTG graphs are nice for a visual comparison over time.
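As a rough sketch, iotop can also be run in batch mode during the spikes so the output can be logged and compared against MRTG afterwards (the sample count here is arbitrary):

Code:
# -o: only show processes actually doing IO
# -b: batch (non-interactive) mode, suitable for logging
# -d 5: sample every 5 seconds; -n 120: stop after 120 samples (10 minutes)
iotop -o -b -d 5 -n 120 >> /var/log/iotop-peak.log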

Thanks for your help.
 
1 member found this post helpful.
Old 11-26-2010, 05:08 PM   #6
dmonty
LQ Newbie
 
Registered: Dec 2004
Posts: 20

Original Poster
Rep: Reputation: 2
Quote:
Originally Posted by Pearlseattle View Post
"Is it impacting your users ?"
Yes - for some reason, when the IO is too high, DNS stops resolving in a timely manner, which prompts students to reboot their diskless clients in hopes of fixing the Internet. Rebooting diskless clients creates more disk IO.
 
Old 11-26-2010, 07:24 PM   #7
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 4,070

Rep: Reputation: 897Reputation: 897Reputation: 897Reputation: 897Reputation: 897Reputation: 897Reputation: 897
Quote:
Originally Posted by dmonty View Post
Yes - for some reason when the IO is too high DNS stops resolving in a timely manner...
This is probably a dumb question, but are you sure that it is this way around? That is, are you sure that it isn't DNS resolve failures causing high IO waits?
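One quick way to check which direction the causality runs (a rough sketch - substitute a name your clients actually resolve) is to log DNS lookup times alongside the IO logs and see which degrades first:

Code:
# Log DNS query time every 10 seconds; compare timestamps against the IO logs.
while true; do
    printf '%s ' "$(date '+%F %T')"
    dig example.com | awk '/Query time/ {print $4 " ms"}'
    sleep 10
done >> /var/log/dns-latency.log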
 
Old 11-27-2010, 12:58 AM   #8
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
Hi -

Quote:
This is probably a dumb question, but are you sure that it is this way around? That is, are you sure that it isn't DNS resolve failures causing high IO waits?
Good question.

Assuming that most (all?) of your most frequently used NFS servers have static IPs, you might want to propagate an /etc/hosts containing those addresses to your clients (along with an nsswitch.conf that specifies "use /etc/hosts before querying DNS").
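Something like this, with illustrative names and addresses only:

Code:
# /etc/hosts on each client - pin the NFS servers' static IPs
192.168.1.10    nfs1.school.local   nfs1
192.168.1.11    nfs2.school.local   nfs2

# /etc/nsswitch.conf - consult /etc/hosts before DNS
hosts: files dns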
 
Old 12-03-2010, 10:19 AM   #9
dmonty
LQ Newbie
 
Registered: Dec 2004
Posts: 20

Original Poster
Rep: Reputation: 2
Resolved

Resolved by ensuring that all of the following folders are stored in local tmpfs on the client and not on the server:
/tmp /var/cache/man /var/lib/xkb /var/lock /var/run /var/log /var/spool /var/lib/discover /etc/hotplug/.run /var/lib/nfs /var/lib/gdm /var/lib/xdm /var/lib/cups /var/lib/urandom /var/yp/binding /var/cache/cups /etc/network/run /media /var/lib/preload

A few of these I had originally moved to the server because software was running the workstation out of RAM. I'll deal with tmpfs-abusive software on a per-application basis; example fstab entries are below.
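For anyone finding this later, the fstab entries on the client image look roughly like this (paths from the list above; the sizes are illustrative, not what we actually use - tune them to your clients' RAM):

Code:
# /etc/fstab on the diskless client image - keep churny paths in local RAM
tmpfs  /tmp             tmpfs  defaults,noatime,size=256m  0 0
tmpfs  /var/log         tmpfs  defaults,noatime,size=64m   0 0
tmpfs  /var/lock        tmpfs  defaults,noatime,size=8m    0 0
tmpfs  /var/run         tmpfs  defaults,noatime,size=8m    0 0
tmpfs  /var/cache/cups  tmpfs  defaults,noatime,size=32m   0 0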
 
1 member found this post helpful.
Old 12-14-2010, 03:17 PM   #10
StoatWblr
LQ Newbie
 
Registered: Jul 2004
Distribution: Suse Slackware rhel4-8 Ubuntu (various)
Posts: 5

Rep: Reputation: 1
Switch to TCP NFS.

If you have Gb networking throughout, raise MTU to 9000. Tune your NFS server and clients to use maximum packet sizes.
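For example, mount options along these lines (a sketch - the server name and export path are placeholders, and the usable rsize/wsize maximum depends on your kernel):

Code:
# Client-side NFS mount: TCP transport, large read/write sizes
mount -t nfs -o tcp,rsize=32768,wsize=32768,hard,intr server:/export /mnt/nfsroot

# Jumbo frames on a GbE interface (every switch in the path must allow MTU 9000)
ip link set eth0 mtu 9000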

Ideally, find an alternative to NFS - it's buggy and crufty as hell.

Treat any NFS-exported filesystem as _prohibited_ for any other kind of export or for local filesystem work, or you risk data corruption. In particular, it's extremely dangerous to export the same filesystem over NFS and Samba at the same time; if you need both, NFS-export it to another box and run the Samba export from there.

This is because NFS doesn't pass client lock requests back down to the filesystem, so it's trivially possible for NFS clients and other local processes to write to the same file at the same time. Simultaneous NFS clients are handled within the NFS server itself, so that case is fairly safe.

As you've discovered, it's logging, locking, tmpfs and home-directory X stuff (the Firefox cache is a particularly nasty culprit) that can cause a LOT of server IO on a diskless client. Consider throwing more RAM at the clients, or toss in a cheap small disk and set up cachefilesd to take some of the load off the server.
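A rough sketch of the cachefilesd side, assuming the small local disk is mounted at /var/fscache:

Code:
# /etc/cachefilesd.conf - point FS-Cache at the local disk
dir /var/fscache
tag nfscache
# culling thresholds: start/stop reclaiming cache space at these free-space levels
brun  10%
bcull 7%
bstop 3%

# then mount NFS with the fsc option so reads go through the cache:
#   mount -t nfs -o tcp,fsc server:/export /mnt/nfsroot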

Finally: on the server itself, figure about 1 GB of RAM per TB of disk for optimal caching, and look at tuning your swappiness/memory-pressure settings. You don't want the server to swap or things get _really_ slow, nor do you want it to cache writes and then dump them all out at once, as that leads to pauses in file serving.
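Concretely, the knobs in question (a sketch - the right values depend on your RAM and workload):

Code:
# /etc/sysctl.conf on the server
vm.swappiness = 10              # prefer dropping page cache over swapping
vm.dirty_background_ratio = 5   # start background writeback early
vm.dirty_ratio = 10             # cap dirty pages so writeout can't stall everything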
 
Old 12-14-2010, 03:25 PM   #11
dmonty
LQ Newbie
 
Registered: Dec 2004
Posts: 20

Original Poster
Rep: Reputation: 2
Yes, we also turned off Firefox caching and session storage - as you mentioned, Firefox hammers the disk, and overall performance improves with the following disabled...

lockPref("browser.cache.disk.enable", false);
lockPref("browser.cache.disk.capacity", 0);
lockPref("browser.sessionstore.resume_from_crash", false);

We will soon be moving to a central squid cache for the entire district.
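The starting point for that will be something like the following (a minimal sketch - the cache size, path and ACL network are placeholders):

Code:
# /etc/squid/squid.conf - minimal shared-cache sketch
http_port 3128
cache_dir ufs /var/spool/squid 20000 16 256   # 20 GB on-disk cache (placeholder)
acl lan src 10.0.0.0/8                        # placeholder district network
http_access allow lan
http_access deny all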
 
  

