stale NFS file handle error
Some of the users in my office are complaining about slow connectivity when submitting their simulation jobs to our Linux cluster. I checked the nfsd processes on my server and found around 32 of them (using the ps -axlmww command) and 96 (using the lsof command).
How many nfsd processes can the server run before it becomes overloaded?
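On Linux, nfsd runs as kernel threads, and the server keeps a thread summary on the "th" line of /proc/net/rpc/nfsd: the first number is the thread count, and on older kernels the second is how many times every thread was busy at once. If that second number keeps growing, more threads may help; if it stays at 0, thread count is probably not the bottleneck (on some newer kernels it is always 0, so treat it as informative only when nonzero). Note also that lsof prints one line per open file rather than per process, so 96 is not a process count; 96 is exactly 3 x 32, which would fit three entries (such as cwd, rtd and txt) per kernel thread, though that is an inference. The following is a minimal Python sketch, assuming a Linux NFS server exposing /proc/net/rpc/nfsd; the function name is hypothetical.

```python
import os

NFSD_STATS = "/proc/net/rpc/nfsd"  # server-side RPC/NFS statistics on Linux


def parse_nfsd_threads(stats_text: str) -> tuple[int, int]:
    """Return (thread_count, all_busy_count) from the 'th' line of nfsd stats."""
    for line in stats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "th":
            # fields[1]: configured nfsd threads
            # fields[2]: times every thread was busy at once (may stay 0 on newer kernels)
            return int(fields[1]), int(fields[2])
    raise ValueError("no 'th' line found in nfsd statistics")


if __name__ == "__main__":
    if os.path.exists(NFSD_STATS):
        with open(NFSD_STATS) as f:
            threads, all_busy = parse_nfsd_threads(f.read())
        print(f"nfsd threads: {threads}, times all threads were busy: {all_busy}")
    else:
        print(f"{NFSD_STATS} not found; is the NFS server running on this host?")
```

Sampling this a few times during a period of slow job submission shows whether the busy counter moves at all.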
I also checked the network traffic on my server with netstat -i (to check the condition of my network interface card), and the result was:
Can this be considered a busy network? For reference, we have a 1 Gbps connection on our LAN.
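netstat -i reports packet and error counters (RX-OK, RX-ERR, RX-DRP and so on), not bytes per second, so on its own it cannot show how close the link is to 1 Gbps; steadily rising RX-ERR or RX-DRP counts are worth noting regardless of throughput. A minimal Python sketch, assuming a Linux host with /proc/net/dev, samples the byte counters twice and expresses the difference as a fraction of the 1 Gbps link. The function names are hypothetical, and LINK_SPEED_BPS is an assumption taken from the link speed stated above.

```python
import time

LINK_SPEED_BPS = 1_000_000_000  # assumption: the 1 Gbps LAN link mentioned above


def read_counters(dev_text: str, iface: str) -> tuple[int, int]:
    """Return (rx_bytes, tx_bytes) for iface from /proc/net/dev-formatted text."""
    for line in dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        if name.strip() == iface:
            fields = data.split()
            # field 0 is received bytes, field 8 is transmitted bytes
            return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {iface!r} not found")


def utilization(before: tuple[int, int], after: tuple[int, int],
                seconds: float) -> tuple[float, float]:
    """Return (rx, tx) utilization as a fraction of LINK_SPEED_BPS."""
    rx_bps = (after[0] - before[0]) * 8 / seconds
    tx_bps = (after[1] - before[1]) * 8 / seconds
    return rx_bps / LINK_SPEED_BPS, tx_bps / LINK_SPEED_BPS


def sample(iface: str, seconds: float = 5.0) -> tuple[float, float]:
    """Read /proc/net/dev twice, `seconds` apart, and return (rx, tx) utilization."""
    with open("/proc/net/dev") as f:
        before = read_counters(f.read(), iface)
    time.sleep(seconds)
    with open("/proc/net/dev") as f:
        after = read_counters(f.read(), iface)
    return utilization(before, after, seconds)
```

For example, `sample("eth0")` (where eth0 is an assumed interface name) returns two fractions; values that stay close to 1.0 while users see slowness would point at the link rather than at nfsd.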
I then checked the network with the tcpdump command and got this error:
Read ERROR: Stale NFS file handle
This error occurs every time the server communicates with one particular node: node 29, to be exact.
Could node 29 be the cause of the slow connectivity?
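A stale file handle shows up on the client as the ESTALE error, so one way to test node 29 directly is to stat each NFS mount point on that node and see which ones fail. This is a minimal Python sketch, assuming a Linux client with /proc/mounts; the function names are hypothetical. It only checks the mount points themselves, so a stale handle on a deeper file would need that file's path passed to check_path. On a hard mount whose server is not answering, the stat call can block rather than fail.

```python
import errno
import os


def nfs_mount_points(mounts_text: str) -> list[str]:
    """Return mount points of nfs/nfs4 entries in /proc/mounts-formatted text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def check_path(path: str) -> str:
    """Stat one path and return 'ok', 'stale' (ESTALE) or the error message."""
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return "stale"
        return exc.strerror or str(exc)
    return "ok"


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for mount_point in nfs_mount_points(f.read()):
                # Caution: on a hard mount with an unresponsive server this can block.
                print(mount_point, check_path(mount_point))
```

If a mount point reports "stale", unmounting and remounting it on node 29 is the usual way to obtain a fresh handle.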
I searched online for the NFS stale file handle error and found this information:
A stale file handle error is returned when:
1) The NFS server is heavily loaded and cannot respond to NFS requests.
2) The NFS server has crashed (this can usually be detected by numerous RPC timeouts on the client).
3) The destination mount-point file or directory has been compromised, making it inaccessible to the NFS client (the NFS client points to something that does not exist).
How can I tell whether my NFS server is heavily loaded or has crashed (how do I detect RPC timeouts?), or whether some files or directories have been compromised?
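On a Linux NFS client, RPC timeouts show up as retransmissions: the "rpc" line of /proc/net/rpc/nfs holds the call count, the retransmission count and the auth-refresh count (the same numbers `nfsstat -rc` prints), and the kernel log records "nfs: server ... not responding" messages when requests go unanswered. This is a minimal Python sketch that computes the retransmission ratio on one client such as node 29; the function name is hypothetical, and the 5% warning threshold is an illustrative assumption, not a standard value. Note that the server-side file /proc/net/rpc/nfsd also has an "rpc" line, but its fields mean something different.

```python
import os

NFS_CLIENT_STATS = "/proc/net/rpc/nfs"  # client-side RPC statistics on Linux
RETRANS_WARN_RATIO = 0.05  # assumption: an illustrative threshold, not a standard value


def rpc_retrans(stats_text: str) -> tuple[int, int, float]:
    """Return (calls, retransmissions, ratio) from the client 'rpc' line."""
    for line in stats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "rpc":
            # client 'rpc' line: calls, retransmissions, auth refreshes
            calls, retrans = int(fields[1]), int(fields[2])
            return calls, retrans, (retrans / calls if calls else 0.0)
    raise ValueError("no 'rpc' line found in NFS client statistics")


if __name__ == "__main__":
    if os.path.exists(NFS_CLIENT_STATS):
        with open(NFS_CLIENT_STATS) as f:
            calls, retrans, ratio = rpc_retrans(f.read())
        status = "check for RPC timeouts" if ratio >= RETRANS_WARN_RATIO else "looks normal"
        print(f"calls={calls} retrans={retrans} ratio={ratio:.2%} -> {status}")
```

Comparing the ratio on node 29 with the ratio on a node that works normally is more telling than any fixed threshold.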
Sorry for all the questions, but I'd appreciate your response.
With kind regards,
Jr. System Engineer