
cram869 11-30-2011 02:11 PM

Debugging an NFS Share with tcpdump
I have an NFS issue that currently seems to occur only on my Fedora 14 machine. I was actually having a similar issue with a RHEL 6.1 box too, but that seemed to go away after I disabled the fcoe and lldpad services. I don't really know why those were an issue, but it seemed to work.

The issue I see is that I will get into a shell and try to cd to a directory inside the NFS share (which is auto-mounted, by the way), and my prompt will sit there anywhere from 1 to 6 minutes before returning. After that first command, the NFS share will be responsive for a while. Then, if I stop for a few minutes, probably less than 5, and go back to that same shell, I'll get another delay of a few minutes. I thought this would be a good use of tcpdump, but I'm not sure how to get it to catch just NFS traffic.

Any ideas on command syntax, port numbers, and such to try with tcpdump or ideas on a fix for this particular type of nfs issue are appreciated.
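As a starting point for the tcpdump syntax asked about above, here is a minimal sketch. It assumes the server listens on the standard NFS port 2049 and that rpcbind/portmapper is on port 111; the helper name nfs_filter, the server name nfs.example.com, the interface eth0, and the output path are placeholders, not details from this thread.

```shell
#!/bin/sh
# Build a tcpdump filter limited to NFS-related traffic for one server.
# 2049 is the standard NFS port and 111 is rpcbind/portmapper. NFSv3 helper
# services such as mountd and lockd may use other ports, so this filter can
# miss part of the conversation.
nfs_filter() {
    server="$1"
    printf 'host %s and (port 2049 or port 111)\n' "$server"
}

# Hypothetical server name and interface; substitute your own.
# Run as root, trigger the slow cd, then stop the capture with Ctrl-C:
#   tcpdump -n -i eth0 -s 0 -w /tmp/nfs.pcap "$(nfs_filter nfs.example.com)"
# Read the capture back with each timestamp shown as a delta from the
# previous packet, which makes long gaps easy to spot:
#   tcpdump -n -ttt -r /tmp/nfs.pcap
```

Writing to a file with -w and reading it back later keeps the capture itself cheap, and the same file can be opened in Wireshark if the text output is too dense.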


hvulin 12-01-2011 12:24 PM

I would first check your routing table and name resolution if other protocols work fine (can you scp normally to and from the machine?).

Look at the output of netstat -rn
and the contents of the /etc/resolv.conf and /etc/hosts files.

cram869 12-09-2011 07:37 PM

That's the funny thing. This share can be accessed by sftp, samba, scp, or https, and I have no issues with any of those from this computer. The name resolution doesn't appear to be a problem either.

Before I had thought that ridding myself of fcoe and lldpad helped my one box, but I think that was just a coincidence. Both of the computers that I was having issues with are on a router with the NFS share located beyond the router. I think the router must have been the majority of the issue. Once I unplugged the laptop and switched over to wireless and plugged the desktop directly into the network without a router, the NFS share was responsive.

For now, it is not really a big deal to run without the router, but I am curious why the firewall was letting enough traffic through for the share to eventually work. The lag is either approximately 3 minutes 40 seconds or around 6 minutes 30 seconds. It's consistent enough that I suspect my computer and the remote machine are running through a predefined pattern, and they hit a working combination at one of those time intervals. Once the connection gets through, it's responsive (<<1 second) until I leave it alone for a while. After that I'll see the delay again.
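To check whether the stall really falls on the same two durations every time, the first access can be timed rather than estimated. This is only a sketch: the helper name stall_seconds and the automount path in the comment are made up for illustration.

```shell
#!/bin/sh
# Print how many whole seconds a command takes to return.
# The command's own output is discarded so only the duration is printed.
stall_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical automounted path; run this after the share has been idle:
#   stall_seconds ls /net/share/somedir
```

Running it while a capture is in progress gives a duration to line up against the gaps between packets in the capture.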

I've heard that newer NFS protocols use dynamic port assignments. Is there any chance those would not get marked "RELATED" or "ESTABLISHED" in a way that the router would block it? Sounds like a good job for tcpdump if I had the patience to wade through the information or could give it a detailed enough expression to limit the data.
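On the dynamic ports: as far as I know it is the NFSv3 helper services (mountd, lockd, statd) that get ports assigned through rpcbind, while NFSv4 normally uses only port 2049. One way to narrow the tcpdump expression is to ask the server which ports it has registered and build the filter from that. The sketch below assumes rpcinfo is installed and the server answers rpcinfo -p; the function name rpc_port_filter and the server name are placeholders.

```shell
#!/bin/sh
# Read "rpcinfo -p" output on stdin and print a tcpdump port expression
# covering every distinct port listed, e.g. "port 111 or port 2049".
# The port number is the fourth column; the header line is skipped because
# its fourth field is not numeric.
rpc_port_filter() {
    awk '$4 ~ /^[0-9]+$/ && !seen[$4]++ {
             sep = (filter == "" ? "" : " or ")
             filter = filter sep "port " $4
         }
         END { print filter }'
}

# Hypothetical server name and interface; substitute your own:
#   ports=$(rpcinfo -p nfs.example.com | rpc_port_filter)
#   tcpdump -n -i eth0 "host nfs.example.com and ($ports)"
```

If the router is dropping one of these services, the capture should show requests to that port going out with no reply coming back.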

I can't remember the router model at the moment, but it looks identical to my old Linksys WRT54G minus the wireless capability.


WizadNoNext 12-19-2011 08:07 PM

Some routers have really bad performance with SMB and NFS (especially NFS, because it tries to use packets as big as possible). Some routers are so slow that I can't believe it: 80 MiB/s over IEEE 802.3ab between desktop and server (connected directly), but 0.6 MiB/s between either of them and a laptop on wireless. NFS is fine on fully wired connections, but IEEE 802.11 somehow isn't a good choice for NFS.
You can try changing the IEEE 802.11 settings, such as CTS, fragmentation, and CTS-less packets (packets smaller than the threshold go through without CTS). I strongly advise you to turn CTS on, set the CTS threshold to 1460, and set the fragmentation threshold to 1500, but this could clash with hosts that don't use the same settings.
