How to terminate a stale NFS connection

upnort · 02-13-2015, 06:26 PM

Sometimes an NFS connection does not terminate gracefully. The NFS server will still show an ESTABLISHED connection.

For example, client 192.168.1.7 connects to server 192.168.1.5 to browse some shared files. A power outage occurs and client 192.168.1.7 goes offline. Using netstat, 192.168.1.5 still shows an ESTABLISHED connection.

Or the network connection is interrupted for some reason, and when it is reestablished several minutes later, netstat shows both the new connection and the stale connection as ESTABLISHED.
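
To see which peers the server still lists, one could filter the netstat output for ESTABLISHED entries on the NFS port. Below is a minimal Python sketch, assuming the usual Linux `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and the standard NFS port 2049. The function name, the sample text, and the client port numbers are made up for illustration.

```python
NFS_PORT = 2049  # standard NFS port; adjust if the server uses another


def established_nfs_peers(netstat_output, port=NFS_PORT):
    """Return remote addresses shown as ESTABLISHED on the local NFS port."""
    peers = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and local.rsplit(":", 1)[-1] == str(port):
            peers.append(foreign)
    return peers


# Illustrative sample only, not captured from a real system
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.5:2049        192.168.1.7:790         ESTABLISHED
tcp        0      0 192.168.1.5:22          192.168.1.9:51514       ESTABLISHED
tcp        0      0 192.168.1.5:2049        192.168.1.7:812         ESTABLISHED
"""
print(established_nfs_peers(sample))
```

Two entries for the same client, as in the sample, match the "new plus stale" situation described above. The listing alone cannot tell which of the two is the stale one.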

There are many articles online about using tcpkill and fuser. The catch is that tcpkill succeeds only with active connections. When the remote system is no longer actually connected, tcpkill hangs and does nothing.

Similarly, fuser -k 2049/tcp does nothing as well.

How can I forcibly terminate the stale connection?

jpollard · 02-13-2015, 07:49 PM

I think you want to change the default timeouts (which can last about 30 minutes). Normal TCP timeouts are about 15 minutes, so the connection should get cleared then.
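
For reference, the "about 15 minutes" figure can be reproduced with a little arithmetic. This sketch assumes the usual Linux defaults (tcp_retries2 = 15, a minimum retransmission timeout of 200 ms that doubles on each retry, and a 120 second cap); the function name is made up. It only applies when the server has unacknowledged data in flight. A connection that sits idle sends nothing, so this timer never starts.

```python
def retransmit_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Seconds spent waiting across the first send plus `retries` retransmissions."""
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # exponential backoff, capped at rto_max
    return total


seconds = retransmit_timeout()
print(round(seconds, 1), "seconds, or about", round(seconds / 60, 1), "minutes")
```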

upnort · 02-18-2015, 09:13 PM

I wonder whether moving everything to Samba might be less stressful. For too many years I have tolerated the way NFS doesn't handle disconnections.

jpollard · 02-19-2015, 05:12 AM

If you are willing to accept the lower security... and the increased complexity in getting file access.

You can always specify using UDP instead of TCP. TCP is faster, but UDP doesn't have persistent connections.

BTW, those persistent connections happen with any TCP connection that isn't shut down. Even sshd connections are persistent, which is one reason they implemented the "keep alive" option - as it polls to see if the remote client is actually there, and then gracefully closes the connection if it doesn't get a reply.
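
As an illustration of the keep-alive idea at the TCP level, here is a small Python sketch that enables keepalive probes on an ordinary socket. It is the generic socket mechanism, not how sshd or the kernel NFS server is implemented. The helper name and the timing values are arbitrary, and the TCP_KEEP* constants are not available on every platform, so they are set only when present.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive and tune probe timing where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # idle: seconds of silence before the first probe
    # interval: seconds between probes
    # probes: unanswered probes before the connection is dropped
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# A nonzero value means keepalive is enabled on this socket
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
sock.close()
```

With these example values, a peer that vanished would be noticed after roughly 60 + 5 * 10 seconds of silence instead of lingering as ESTABLISHED.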

suicidaleggroll · 02-19-2015, 09:49 AM

Quote:

Originally Posted by jpollard

Even sshd connections are persistent, which is one reason they implemented the "keep alive" option - as it polls to see if the remote client is actually there, and then gracefully closes the connection if it doesn't get a reply.

Is there any reason why NFS hasn't implemented something similar?

Honestly I'm on both sides of the fence here. I've had an NFS connection established, then the server goes down for some reason without cleanly disconnecting first. When that happens, the client just sits there and waits. This can be a pain when you just want a clean disconnect, but it can also be nice...boot the server back up and everything that was waiting just picks back up where it left off like nothing happened.

Even if NFS did implement an auto-disconnect, I'm not sure if I'd even want it. In many instances it's just too convenient to have your processes sit there and wait patiently until the server is back up, rather than having them violently crash because the files/dirs they were reading/writing to have suddenly disappeared, then you have to go through the hassle of restarting them once the server is back up.

jpollard · 02-19-2015, 10:28 AM

Quote:

Originally Posted by suicidaleggroll

Is there any reason why NFS hasn't implemented something similar?

Yes. Most people don't like the data corruption that can occur when a forced disconnect happens.

Quote:

Honestly I'm on both sides of the fence here. I've had an NFS connection established, then the server goes down for some reason without cleanly disconnecting first. When that happens, the client just sits there and waits. This can be a pain when you just want a clean disconnect, but it can also be nice...boot the server back up and everything that was waiting just picks back up where it left off like nothing happened.

That is what the NFS mount options "hard" and "soft" are for. A "hard" mount retries indefinitely and hangs the client in the meantime; it is used to avoid data corruption. A "soft" mount allows the client to interrupt an operation, but at the cost of possible data corruption.
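
A rough sketch of how the two choices look on the mount command line, written as a small Python helper that only builds the argument list. The helper name, the export path /srv/share, and the mount point /mnt/share are hypothetical. `timeo` (in tenths of a second) and `retrans` are the standard NFS mount options that control how long a soft mount waits before giving up; the values shown are examples, not recommendations.

```python
def nfs_mount_command(export, mountpoint, soft=False, timeo=None, retrans=None):
    """Build a mount(8) argument list for a hard or soft NFS mount."""
    opts = ["soft" if soft else "hard"]
    if timeo is not None:
        opts.append(f"timeo={timeo}")      # tenths of a second per attempt
    if retrans is not None:
        opts.append(f"retrans={retrans}")  # retries before a soft mount errors out
    return ["mount", "-t", "nfs", "-o", ",".join(opts), export, mountpoint]


cmd = nfs_mount_command("192.168.1.5:/srv/share", "/mnt/share",
                        soft=True, timeo=100, retrans=3)
print(" ".join(cmd))
```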

Quote:

Even if NFS did implement an auto-disconnect, I'm not sure if I'd even want it. In many instances it's just too convenient to have your processes sit there and wait patiently until the server is back up, rather than having them violently crash because the files/dirs they were reading/writing to have suddenly disappeared, then you have to go through the hassle of restarting them once the server is back up.

Including the possibility of having corrupted files from I/O transactions that haven't completed.

upnort · 02-19-2015, 01:39 PM

Quote:

A "soft" mount allows the client to interrupt an operation, but at the cost of possible data corruption.

How does that work exactly? In my use case I am not running a dedicated 24/7 NFS server. Basically I am using peer sharing. I use a long-tested script to connect to other systems in my home network and another script to disconnect. Yet sometimes weird things happen when the scripts are automated and a clean termination does not occur. Seems soft connections might help.

As I am using more of a peer connection than dedicated server, seems the potential of data loss is minimal for my use case. I don't need any kind of automatic disconnect. I need a way to allow manual forced disconnections.

I can test with the soft parameter, but my question is focused on how to force termination. That is, suppose I connect to my HTPC to transfer files, walk away to do something else, and then return, forgetting I had enabled NFS sharing on the HTPC. I shut down the HTPC, but the client is still connected. Then when I attempt to shut down the client, the client hangs. That is where I need the ability to force the shutdown and avoid the hang.

Or does using the soft parameter avoid the whole force termination problem?

suicidaleggroll · 02-19-2015, 02:08 PM

Quote:

Originally Posted by upnort

That is, suppose I connect to my HTPC to transfer files, walk away to do something else, and then return, forgetting I had enabled NFS sharing on the HTPC. I shut down the HTPC, but the client is still connected. Then when I attempt to shut down the client, the client hangs. That is where I need the ability to force the shutdown and avoid the hang.

Use "umount -fl" to force the client to unmount the share.
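
For a disconnect script, the same idea could be wrapped roughly like this in Python. This is only a sketch: the function name and the mount point are hypothetical, and it tries the force (-f) and lazy (-l) flags one after the other rather than combined. It needs root to do anything real, and a umount stuck in uninterruptible I/O may not respond to the timeout either.

```python
import subprocess


def force_unmount(mountpoint, run=subprocess.run, timeout=30):
    """Try a forced unmount, then a lazy one. Return the flag that worked, or None."""
    for flag in ("-f", "-l"):
        try:
            result = run(["umount", flag, mountpoint], timeout=timeout)
        except subprocess.TimeoutExpired:
            continue  # this attempt hung; move on to the next flag
        if result.returncode == 0:
            return flag
    return None


# Example call (requires root and an existing mount):
# force_unmount("/mnt/share")
```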

upnort · 02-19-2015, 03:50 PM

Quote:

Use "umount -fl" to force the client to unmount the share.

I already do that and have done so for years. Often fails to work.