LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

04-20-2013, 05:34 PM   #1
TiredOfThis
Member
 
Registered: Oct 2007
Posts: 53

Rep: Reputation: 15
I/O timeout using NFS fileserver


I have a fileserver exporting a RAID 5 volume over NFSv4.

Recently I've started running into problems where I/O times out. I'll start a copy (either through the GUI or the CLI) and it will just hang for a while, and dmesg shows the server timing out ("not responding").

If I run htop on the server it shows the memory filling up with buffers until it hits max, then the drives start churning like mad (meanwhile the I/O stalls on the client). No more progress gets made (and I/O times out) until the retry limit is hit on the client and the operation fails.
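One way to see the buffer build-up described above from the server side is to sample /proc/meminfo while a large NFS copy runs. The Python sketch below assumes a Linux server with the usual `Name:   value kB` lines; `meminfo_fields` and `watch` are hypothetical helper names. Of the fields it reads, Dirty is data waiting to be written to disk and Writeback is data currently being written.

```python
import time
from pathlib import Path

# Page cache, block-device buffers, and data not yet on disk:
# Dirty = waiting to be written, Writeback = being written right now.
FIELDS = ("Buffers", "Cached", "Dirty", "Writeback")


def meminfo_fields(text, fields=FIELDS):
    """Parse /proc/meminfo-style text into {field: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            # Values look like "   123456 kB"; keep only the number.
            values[name] = int(rest.split()[0])
    return values


def watch(path="/proc/meminfo", interval=1.0, samples=10):
    """Print the selected fields once per interval (Linux only)."""
    for _ in range(samples):
        print(meminfo_fields(Path(path).read_text()))
        time.sleep(interval)
```

Running `watch()` on the server during a copy that stalls would show whether Dirty keeps climbing until the drives start churning, which would match what htop shows.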

This is definitely an NFS issue: I can do a local copy to the RAID volume, or copy the same files via FTP, and everything goes fine. dmesg and the system journal on the fileserver are clean.

I realize I can mitigate the problem by adding more RAM to the server, but I'd rather figure out what's broken.

I've done some Googling and tweaked things like the number of NFS server threads (not helpful) and the export and mount options (also seemingly not helpful). The volume is exported 'sync', so I would expect all I/O to be flushed as it arrives, which makes this "memory fills up and then begins to flush" behavior all the more odd.
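The client-side retry behavior is controlled by the `timeo` and `retrans` mount options. As described in nfs(5), `timeo` is in tenths of a second (600 by default over TCP), the TCP client backs off linearly, adding `timeo` to the wait after each retransmission up to 600 seconds, and the "server not responding" message is logged after `retrans` retries. The Python sketch below turns that description into a rough estimate of how long a stalled request waits before the message appears. It assumes the documented linear backoff and `retrans=2`; it is not a model of the kernel code, the exact counting may differ between kernel versions, and the function name is hypothetical.

```python
def seconds_until_not_responding(timeo_ds=600, retrans=2, max_timeout_s=600):
    """Rough wait before 'server not responding', assuming the linear
    TCP backoff described in nfs(5).

    timeo_ds: the timeo mount option, in deciseconds (tenths of a second).
    retrans:  the retrans mount option (number of retries).
    """
    timeo_s = timeo_ds / 10.0
    total = 0.0
    # The first attempt plus `retrans` retries; each wait grows by
    # timeo, capped at max_timeout_s.
    for attempt in range(1, retrans + 2):
        total += min(attempt * timeo_s, max_timeout_s)
    return total


print(seconds_until_not_responding())  # 60 + 120 + 180 = 360.0
```

The `timeo` and `retrans` values in effect for a mount are listed in /proc/mounts on the client; plugging them in gives a rough idea of how long a write can stall before dmesg reports the server as not responding.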

Any ideas?
 
04-20-2013, 08:53 PM   #2
lleb
Senior Member
 
Registered: Dec 2005
Location: Florida
Distribution: CentOS/Fedora/Pop!_OS
Posts: 2,983

Rep: Reputation: 551
1. what OS?
2. what does your exports look like?
3. what type of LAN are you dealing with?
4. have you checked swappiness settings?
5. you can also clear your cache and force it not to fill so often; how you do that is different depending on the OS.
6. what is the client OS?
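On Linux, item 5 usually means the kernel's drop_caches interface. This Python sketch assumes a Linux host and root privileges for the real /proc file; `drop_caches` is a hypothetical helper, and the path is a parameter only so the sketch can be tried against an ordinary file. Writing 1 frees the page cache, 2 frees dentries and inodes, and 3 frees both; drop_caches only releases clean cached data, which is why the sketch calls os.sync() first.

```python
import os


def drop_caches(level=3, path="/proc/sys/vm/drop_caches"):
    """Flush dirty data, then ask the kernel to drop clean caches.

    level 1: page cache; 2: dentries and inodes; 3: both.
    Needs root when path is the real /proc file.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # write dirty pages out first; drop_caches only frees clean ones
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

This is the equivalent of running `sync; echo 3 > /proc/sys/vm/drop_caches` as root.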
 
04-20-2013, 09:01 PM   #3
TiredOfThis
Member
 
Registered: Oct 2007
Posts: 53

Original Poster
Rep: Reputation: 15
1. Client and server are running Arch Linux
2. /mnt/raid 10.10.10.1/24(rw,root_squash,insecure,no_subtree_check,sync,nohide)
3. Client -> Gigabit switch -> server
4. Swappiness is at 60. Do you think I should tune it one way or the other?
 
04-21-2013, 12:24 AM   #4
lleb
Senior Member
 
Registered: Dec 2005
Location: Florida
Distribution: CentOS/Fedora/Pop!_OS
Posts: 2,983

Rep: Reputation: 551
Quote:
Originally Posted by TiredOfThis
1. Client and server are running Arch Linux
2. /mnt/raid 10.10.10.1/24(rw,root_squash,insecure,no_subtree_check,sync,nohide)
3. Client -> Gigabit switch -> server
4. Swappiness is at 60. Do you think I should tune it one way or the other?
While I'm no guru, and especially not with Arch as I've never run it, there are a few things to check as far as tuning goes.

For NFSv4 you should have fsid=0 and crossmnt in there as options. Example:

Code:
#	NFS4
/exports *(rw,insecure,subtree_check,crossmnt,fsid=0)
I have to use the insecure flag as I have OS X machines on my LAN. Also, from what I have been told, you should place your exports for NFSv4 under /exports and mount everything from there.
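To compare the two exports lines in this thread option by option, a small parser can help. This Python sketch assumes the simple `path client(options)` form used here (one client per line, no quoting or continuation lines); real /etc/exports syntax allows more, and `parse_export` is a hypothetical helper, not part of nfs-utils.

```python
import re

# One path, one client spec, and a parenthesised option list, e.g.
# /mnt/raid 10.10.10.1/24(rw,sync)
EXPORT_RE = re.compile(r"^(\S+)\s+([^\s(]+)\(([^)]*)\)\s*$")


def parse_export(line):
    """Return (path, client, {option: value or True}) for a simple exports line."""
    m = EXPORT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unsupported exports line: {line!r}")
    path, client, opts = m.groups()
    options = {}
    for opt in filter(None, opts.split(",")):
        key, sep, value = opt.partition("=")
        # Flags like "sync" become True; "fsid=0" keeps its value "0".
        options[key] = value if sep else True
    return path, client, options


op = parse_export("/mnt/raid 10.10.10.1/24(rw,root_squash,insecure,no_subtree_check,sync,nohide)")
example = parse_export("/exports *(rw,insecure,subtree_check,crossmnt,fsid=0)")
print(sorted(set(example[2]) - set(op[2])))  # ['crossmnt', 'fsid', 'subtree_check']
```

This only compares what is written in the file; `exportfs -v` on the server shows the options actually in effect, including defaults.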

The LAN should be more than fast enough, as long as both ends run 10/100/1000 NICs and you are cabled with Cat5e or better.
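A back-of-the-envelope check on "more than fast enough": gigabit Ethernet moves 10^9 bits per second, or 125 MB/s before protocol overhead. The Python sketch below (the function name is hypothetical) computes that line-rate lower bound on transfer time; real NFS throughput will be somewhat lower.

```python
def line_rate_seconds(size_bytes, link_bits_per_second=1_000_000_000):
    """Lower bound on transfer time: size divided by link speed, no overhead."""
    return size_bytes * 8 / link_bits_per_second


one_gb = 10**9   # decimal gigabyte
one_gib = 2**30  # binary gibibyte
print(line_rate_seconds(one_gb))   # 8.0
print(line_rate_seconds(one_gib))  # about 8.59
```

If either NIC has negotiated only 100 Mbit/s, `line_rate_seconds(10**9, 100_000_000)` gives 80.0, so a 1 GB copy would need well over a minute even before overhead.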

Yes, I'd turn swappiness down to 30 or even 10. On a modern system there really is no reason to have it swapping that often any longer; 60 is still the legacy default that most OSs set on install.

Not sure that will help, but it couldn't hurt.
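The swappiness change can be tried at runtime with `sysctl -w vm.swappiness=30`, which writes to /proc/sys/vm/swappiness. The Python sketch below does the same read and write; it assumes Linux and root for the write, the helper names are hypothetical, and the value resets at reboot unless it is also placed in a sysctl configuration file.

```python
from pathlib import Path

SWAPPINESS = Path("/proc/sys/vm/swappiness")


def get_swappiness(path=SWAPPINESS):
    """Read the current vm.swappiness value."""
    return int(Path(path).read_text().strip())


def set_swappiness(value, path=SWAPPINESS):
    """Write a new vm.swappiness value (runtime only; needs root for /proc)."""
    if value < 0:
        raise ValueError("swappiness cannot be negative")
    Path(path).write_text(f"{value}\n")
```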
 
04-21-2013, 01:53 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140

Rep: Reputation: 4123
Quote:
Originally Posted by TiredOfThis
Recently I've started running into problems where I/O will timeout. I'll do a copy (either through the GUI or the CLI) and it will just hang for a while. dmesg output will show server timeouts (not responding).
What changed?
 
04-21-2013, 07:56 AM   #6
TiredOfThis
Member
 
Registered: Oct 2007
Posts: 53

Original Poster
Rep: Reputation: 15
Thanks for your suggestions!

Quote:
Originally Posted by lleb
for NFSv4 you should have fsid=0 in there
Shouldn't matter since I'm exporting only one mount, and I'm not grouping them together (see below).

Quote:
Originally Posted by lleb
as well as crossmnt
I don't have any other filesystems mounted inside the RAID export, and I don't group filesystems, so I shouldn't need crossmnt either.

Quote:
Originally Posted by lleb
Also from what I have been told you should place your exports for NFSv4 in /exports and mount everything from there.
This is the usual way to do it. However, since I'm only exporting a single mount, and since this is an internal LAN where I don't care about security, I don't do it that way. It shouldn't affect performance at all.

Quote:
Originally Posted by lleb
yes I'd turn swappiness down to 30 or even 10
I don't think this will help (in fact, if my problem is memory exhaustion it could hurt), but I set it to 30. Problem still persists.
 
04-21-2013, 08:07 AM   #7
TiredOfThis
Member
 
Registered: Oct 2007
Posts: 53

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by syg00
What changed?
Nothing, as far as I can see. But let me tell you more of the story, so maybe you can tell me what could have changed.

The fileserver used to be running an ancient version of OpenSUSE (I think it was 10.3) and exported using NFSv3. I never ran an update, because I never rebooted the system. I know that no updates would run because I actually disabled the update system on that machine.

One day I lost power, and the fileserver rebooted. I took the opportunity to update the system software on the client (Arch Linux) machine, but I didn't do anything to the server.

For some reason, after the client came back up, it couldn't connect to the server at all. So I ssh'd into the server, restarted the portmapper and idmapd daemons, and voilà, it worked.

Except that now I was getting these timeouts. It only happened when copying a lot of data (usually over a GB), which I don't do very frequently. But it was annoying, and sometimes my backup cronjob would die, putting my data at risk.

I figured it was due to the ancient versions of nfsd and Linux I was running on the fileserver, so I bought a new hard drive, installed Arch Linux on it, and brought the whole thing up to date. It's now on Linux 3.8.6 and has the most recent (as of about two weeks ago) versions of nfsd, etc. I still don't run updates or reboot, but at least the baseline is more current now.

However, even after all this the problem still persists.

I've tried replacing the switch and re-cabling the network, but that doesn't seem to have helped. Also, the problem only shows up with NFS writes (not reads) to the fileserver. I can read data all day long, I can sftp data to the fileserver with no problems, etc. The problem also shows up if I mount the volume on other computers on the network, so it's not just localized to my main desktop PC.

So although I say "nothing changed", as you can see after the problem started happening I changed almost everything, and the issue still persists.
 
04-21-2013, 01:28 PM   #8
lleb
Senior Member
 
Registered: Dec 2005
Location: Florida
Distribution: CentOS/Fedora/Pop!_OS
Posts: 2,983

Rep: Reputation: 551
Sadly, I'm out of ideas then; that's about as much as I know about NFS. Good luck, and sorry I could not be of more assistance.
 
04-21-2013, 08:59 PM   #9
TiredOfThis
Member
 
Registered: Oct 2007
Posts: 53

Original Poster
Rep: Reputation: 15
Thanks for trying! This has got me stumped as well.
 
  


All times are GMT -5.
