I/O timeout using NFS fileserver
I have a fileserver exporting a RAID 5 array over NFSv4.

Recently I've started running into problems where I/O times out. I'll do a copy (either through the GUI or the CLI) and it will just hang for a while, and dmesg on the client shows server timeouts ("not responding"). If I run htop on the server, it shows memory filling up with buffers until it hits the maximum; then the drives start churning like mad while the I/O stalls on the client. No more progress is made (and the I/O times out) until the client hits its retry limit and the operation fails.

This is definitely an NFS issue: I can do a local copy to the RAID volume, or copy the same files via FTP, and everything goes fine. The dmesg output and system journal on the fileserver are clean. I realize I can mitigate the problem by adding more RAM to the server, but I'd rather figure out what's broken.

I've done some Googling and tried tweaking things like the number of NFS server threads (not helpful) and the export and mount options (also not helpful). The volume is exported 'sync', so I would expect all I/O to be flushed, which makes this "memory fills up, then it begins to flush" behavior seem weird. Any ideas?
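One way to put numbers on the "memory fills up, then the drives churn" pattern is to sample the page-cache counters in /proc/meminfo on the server while a client copy runs. Dirty is data waiting to be written back and Writeback is data currently being written to disk. Below is a minimal Python sketch, assuming a Linux server; the function names, the one-second interval and the sample count are hypothetical choices, not anything from this thread.

```python
import time

# /proc/meminfo fields worth watching during a large NFS write.
FIELDS = ("MemFree", "Buffers", "Cached", "Dirty", "Writeback")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value (kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def watch(interval=1.0, count=60, path="/proc/meminfo"):
    """Print the selected fields in MiB every `interval` seconds, `count` times."""
    for _ in range(count):
        with open(path) as f:
            info = parse_meminfo(f.read())
        print("  ".join(f"{k}={info.get(k, 0) // 1024}MiB" for k in FIELDS))
        time.sleep(interval)
```

Calling `watch()` on the fileserver during a large copy should show whether Dirty climbs steadily and Writeback jumps right when the client stalls, which would be consistent with the behavior described above.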
1. What OS?
2. What do your exports look like?
3. What type of LAN are you dealing with?
4. Have you checked your swappiness settings?
5. You can also clear your cache and force it not to fill so often; how to do that differs depending on the OS.
6. What is the client OS?
1. Client and server are both running Arch Linux.
2. /mnt/raid 10.10.10.1/24(rw,root_squash,insecure,no_subtree_check,sync,nohide)
3. Client -> Gigabit switch -> server
4. Swappiness is at 60. Do you think I should tune it one way or the other?
For NFSv4 you should have fsid=0 in there, as well as crossmnt, as options. Example:
Code:
# NFS4

The LAN should be more than fast enough as long as both ends run 10/100/1000 NICs and you are cabled with Cat5e or better. Yes, I'd turn swappiness down to 30 or even 10. On a modern system there really is no reason to have it swapping that often any longer; 60 is still the legacy default that most OSes set on install. Not sure that will help, but it couldn't hurt.
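To check an export line against the fsid=0 / crossmnt suggestion, here is a minimal Python sketch that parses one /etc/exports entry and reports which of those options each client spec is missing. The function names are hypothetical, and it assumes a simple single-line entry with no comments, quoting or backslash continuations, so it is not a full exports(5) parser.

```python
def parse_exports_line(line):
    """Split a single-line /etc/exports entry into (path, [(client, options)]).

    Assumes no comments, quoting, or backslash continuations.
    """
    fields = line.split()
    path, clients = fields[0], []
    for spec in fields[1:]:
        # Each client spec looks like host(opt1,opt2,...) or just host.
        host, _, opts = spec.partition("(")
        options = opts.rstrip(")").split(",") if opts else []
        clients.append((host, options))
    return path, clients


def missing_options(line, wanted=("fsid=0", "crossmnt")):
    """For each client in the entry, list the wanted options that are absent.

    This is a literal string match: fsid=root (equivalent to fsid=0 per
    exports(5)) would still be reported as missing.
    """
    _, clients = parse_exports_line(line)
    return {host: [w for w in wanted if w not in opts] for host, opts in clients}
```

Run against the export line quoted earlier in the thread, `missing_options` reports both fsid=0 and crossmnt as absent for 10.10.10.1/24; appending `fsid=0,crossmnt` to that option list makes it return an empty list for the client.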
Thanks for your suggestions!
The fileserver used to run an ancient version of openSUSE (I think it was 10.3) and exported using NFSv3. I never ran an update, because I never rebooted the system, and I know no updates ran on their own because I had actually disabled the update system on that machine. One day I lost power, and the fileserver rebooted. I took the opportunity to update the system software on the client (Arch Linux) machine, but I didn't do anything to the server. For some reason, after the client came back up, it couldn't connect to the server at all. So I ssh'd into the server, restarted the portmapper and idmapd daemons, and voilà, it worked. Except that now I was getting these timeouts.

It only happened when copying a lot of data (usually over a GB), which I don't do very often. But it was annoying, and sometimes my backup cron job would die, putting my data at risk. I figured it was due to the ancient versions of nfsd and Linux I was running on the fileserver, so I bought a new hard drive, installed Arch Linux on it, and brought the whole thing up to date. It's now on Linux 3.8.6 and has the most recent (as of about two weeks ago) versions of nfsd, etc. I still don't run updates and don't reboot, but at least the baseline is now more current.

However, even after all this the problem persists. I've tried replacing the switch and re-cabling the network, but that doesn't seem to have helped. Also, the problem only shows up with NFS writes (not reads) to the fileserver: I can read data all day long, I can sftp data to the fileserver with no problems, etc. The problem also shows up if I mount the volume on other computers on the network, so it's not localized to my main desktop PC. So although I say "nothing changed", as you can see, after the problem started I changed almost everything, and the issue still persists.
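Since the stall only shows up on large NFS writes, a repeatable reproduction can make it easier to compare export or mount options. Below is a minimal Python sketch, assuming the volume is mounted on the client; the function name and the default 1 GiB total / 16 MiB chunk sizes are hypothetical choices. It fsyncs after every chunk, so each timing should reflect data committed by the server rather than data sitting in the client's page cache.

```python
import os
import time

MIB = 1024 * 1024


def timed_write(path, total_mib=1024, chunk_mib=16):
    """Write total_mib MiB of zeros to `path`, fsync-ing after every chunk.

    Returns a list of (MiB written so far, seconds spent on this chunk), so a
    server-side stall shows up as one or more unusually slow chunks.
    """
    chunk = memoryview(b"\0" * (chunk_mib * MIB))
    timings = []
    written = 0
    with open(path, "wb") as f:
        while written < total_mib:
            size = min(chunk_mib, total_mib - written)  # last chunk may be short
            start = time.monotonic()
            f.write(chunk[: size * MIB])
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to commit the data to storage
            written += size
            timings.append((written, time.monotonic() - start))
    return timings
```

For example, iterating over `timed_write("/path/to/nfs/mount/nfs-test.bin")` (a hypothetical path) and printing each pair gives a per-chunk timeline that can be lined up against what the server shows at the same moment.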
Sadly, I'm out of ideas then. That's about as much as I know about NFS. Good luck, and sorry I couldn't be of more assistance.
Thanks for trying! This has got me stumped as well.