Yet Another NFS Stale File Handle Question
I recently started getting the dreaded "Stale NFS file handle" error. What's interesting, though, is that the errors come from files that NFS should never have touched, i.e. files on my root partition, which is not exported. In particular, they come from various files under /var/lib. For example:
blanks started # pwd
blanks started # ls -lah
ls: cannot access hald: Stale NFS file handle
ls: cannot access syslog-ng: Stale NFS file handle
ls: cannot access net.eth0: Stale NFS file handle
ls: cannot access portmap: Stale NFS file handle
ls: cannot access dbus: Stale NFS file handle
drwxr-xr-x 2 root root 4.0K Sep 11 22:19 .
drwxr-xr-x 17 root root 4.0K Sep 11 22:45 ..
lrwxrwxrwx 1 root root 21 Sep 11 22:19 alsasound -> /etc/init.d/alsasound
lrwxrwxrwx 1 root root 20 Sep 11 22:19 bootmisc -> /etc/init.d/bootmisc
lrwxrwxrwx 1 root root 19 Sep 11 22:19 checkfs -> /etc/init.d/checkfs
lrwxrwxrwx 1 root root 21 Sep 11 22:19 checkroot -> /etc/init.d/checkroot
lrwxrwxrwx 1 root root 17 Sep 11 22:19 clock -> /etc/init.d/clock
lrwxrwxrwx 1 root root 23 Sep 11 22:19 consolefont -> /etc/init.d/consolefont
l????????? ? ? ? ? ? dbus
l????????? ? ? ? ? ? hald
lrwxrwxrwx 1 root root 20 Sep 11 22:19 hostname -> /etc/init.d/hostname
lrwxrwxrwx 1 root root 19 Sep 11 22:19 keymaps -> /etc/init.d/keymaps
lrwxrwxrwx 1 root root 17 Sep 11 22:19 local -> /etc/init.d/local
--- snip ---
Because of which files are affected, I'm unable to start important daemons such as hald, portmap, nfsd, mt-daapd, ... (mt-daapd being the most important :) )
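The `l?????????` lines in that listing are entries whose names `readdir()` still returns but whose `lstat()` fails; `ls` prints question marks for the fields it couldn't fetch. A minimal Python sketch, assuming you just want to list which entries in one directory fail that way with ESTALE (the function name `find_stale_entries` is hypothetical, not part of any tool):

```python
import errno
import os


def find_stale_entries(directory):
    """Return the names in `directory` whose lstat() fails with ESTALE."""
    stale = []
    for name in sorted(os.listdir(directory)):
        try:
            # lstat, not stat: these entries are symlinks, and we want the
            # link itself rather than the init script it points to.
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                stale.append(name)
            else:
                raise  # any other error is a different problem; surface it
    return stale
```

Run it against the directory where `ls` showed the question marks; with the listing above it would be expected to report at least `dbus` and `hald`.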
Here is my /etc/exports:
blanks started # cat /etc/exports
# /etc/exports: NFS file systems being exported. See exports(5).
The share partition is a separate physical drive.
Does anyone know what causes this, or how to fix it? I've browsed through the 40-odd posts on this site reporting similar issues, but none of them helped. Google doesn't turn up a solution either.
I'm running Gentoo with a 2.6.25 kernel. I've got NFS file system support, NFSv3 support, NFS server support, and NFS over TCP support compiled into the kernel (i.e. not modules). I've got nfs-utils 1.1.0-r1 installed. Everything used to work...
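To double-check which NFS options the running kernel was actually built with (rather than what the .config in /usr/src says), you can read `/proc/config.gz`, which only exists if the kernel was built with `CONFIG_IKCONFIG_PROC`. A small sketch, assuming that file is available; the option names in the usage note (`CONFIG_NFS_FS`, `CONFIG_NFS_V3`, `CONFIG_NFSD`) are standard, but verify any others against your kernel's Kconfig:

```python
import gzip


def parse_kconfig(text):
    """Map CONFIG_* names to 'y', 'm', or 'n' from kernel .config text."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Disabled options are written as comments: "# CONFIG_X is not set"
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def read_running_config(path="/proc/config.gz"):
    """Parse the config of the running kernel (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as fh:
        return parse_kconfig(fh.read())
```

For example, `read_running_config().get("CONFIG_NFSD")` should be `"y"` for an NFS server built into the kernel, as described above.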
Are you running LVM? This error occurs with LVM as well.
Nope, just 2 regular SATA drives. The first drive has 3 partitions: boot, root, and swap. The second has just one big partition. Both use ext3. No hardware or software RAID. Here's my /etc/fstab:
/dev/sda1 /boot ext2 noauto,noatime 1 2
/dev/sda3 / ext3 noatime 0 1
/dev/sda2 none swap sw 0 0
/dev/sdb1 /share ext3 noatime 0 2
/dev/cdrom /mnt/cdrom auto noauto,ro 0 0
/dev/ipod /mnt/ipod vfat async,nodev,nosuid,user,rw,noauto 0 0
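The sixth fstab column is the fsck pass number: 0 means "don't check", 1 is meant for the root filesystem, and 2 for the rest. With this fstab, / is already in the boot-time fsck run, but e2fsck normally skips a full check of an ext3 filesystem that is marked clean, which is why a *forced* check can find problems a normal boot doesn't. A sketch that lists the boot-time check order (the function name is hypothetical):

```python
def fsck_order(fstab_text):
    """Return sorted (pass, device, mountpoint) tuples for entries with pass > 0."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 6:
            continue  # fstab(5): a missing pass field defaults to 0 (not checked)
        device, mountpoint, passno = fields[0], fields[1], int(fields[5])
        if passno > 0:
            entries.append((passno, device, mountpoint))
    return sorted(entries)
```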
Sorry, I don't have an answer then. Have you rebooted, and does the issue continue with the same files after reboot?
Yes, the problem persists across reboots.
It seems that I can get some of my services back up by first restarting portmap, then starting nfs and mt-daapd. I cannot get hald or dbus back up. Even when those network services are back up, I still have the stale NFS handles.
OK, the first thing I would do is make sure all the file systems are clean. Run a forced, thorough fsck on all of them and examine the situation after that.
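As a quick triage before the forced fsck, the ext2/ext3 superblock records a state flag and mount counters that `tune2fs -l` (or `dumpe2fs -h`) prints as `Key: value` lines. The sketch below parses that text and roughly approximates when e2fsck would check on its own; it ignores the time-based check interval, and it is only a hint: a superblock marked "clean" can still hide corruption, which is exactly why a forced check (`-f`) is worth running anyway. Function names are hypothetical.

```python
def parse_superblock(text):
    """Parse `tune2fs -l` / `dumpe2fs -h` output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def needs_full_check(info):
    """Rough guess at whether e2fsck would check without -f."""
    if info.get("Filesystem state", "") != "clean":
        return True  # e.g. "not clean" or "clean with errors"
    max_count = int(info.get("Maximum mount count", "-1"))
    count = int(info.get("Mount count", "0"))
    # A non-positive maximum disables the mount-count check.
    return max_count > 0 and count >= max_count
```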
If the file systems are clean, the situation may be a little more troubling. I have never been terribly impressed with Linux's NFS stability, reliability, or performance across releases, and while configuring and managing some large NFS installations I encountered numerous problems, kernel bugs, etc. I often had to build custom kernels with the latest NFS patches, sometimes just to keep a system from hanging. Compared against Solaris-based NFS systems, Linux's NFS implementation paled. That was several years ago, with 2.4 kernels, and the situation may have changed since.
Let us know what you find.
Thanks for the suggestion. fsck appears to have fixed the problem. As the problems were on my root fs, here's what I did:
1) touch /forcefsck
2) try (and fail) to avoid making obvious poor-taste joke about previous line
3) at the GRUB menu, choose the kernel
4) fsck starts cranking away, so go into the other room
5) come back, and the computer is back at the GRUB menu (rebooted somehow...)
6) choose the kernel again
7) fsck starts cranking away and reports that it repaired fs errors
8) everything seems back to normal now
Some of the previous steps may not have been necessary in fixing the problem.
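The unexpected reboot in step 5 is consistent with how fsck reports results: per fsck(8), the exit status is a bitwise OR of flags (1 = errors corrected, 2 = system should be rebooted, 4 = errors left uncorrected, and so on), and boot scripts commonly reboot when the root filesystem check sets bit 2. I can't confirm what Gentoo's checkroot script did here, so treat that as a plausible explanation rather than a diagnosis. A small decoder for those documented codes:

```python
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Split an fsck exit status into the meanings of its set bits."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(FSCK_EXIT_BITS.items()) if status & bit]
```

For instance, a status of 3 decodes to "filesystem errors corrected" plus "system should be rebooted", which matches repairing the root fs and then restarting.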
If the problem reappears, I'll probably re-evaluate which NFS-related kernel options I chose, get a newer kernel, or switch to something like (ugh) CIFS. In any case, the problem seems to have gone away, but I'll post again if it comes back. Thanks for the help.