LinuxQuestions.org

Linux - Server: Old NFS Server hangs when modern client connects (https://www.linuxquestions.org/questions/linux-server-73/old-nfs-server-hangs-when-modern-client-connects-4175505378/)

davewithheld 05-18-2014 11:20 AM

Old NFS Server hangs when modern client connects
 
This new problem cropped up when I upgraded one of my clients from Fedora 17 to Fedora 20. When I reboot the server (running Fedora 17) and boot the client under Fedora 17, all works fine. If I boot the dual-boot client under Fedora 20, the server hangs (which also hangs the client).

Technically, the server doesn't exactly hang. I can still log in as root and look at processes, run top, etc. I just can't list /. The root filesystem seems inaccessible, which seems odd since I'm able to log in, either as a normal user or as root, either on the console or through SSH. Of course, not much else can be done. If I try to reboot, it starts to go down, then hangs. I've let it sit for an hour and nothing more happens. It responds to CTRL/ALT/DEL, but just repeats the same set of messages (that I can't read because there's too much text and too little screen). I end up doing a hardware reset. All comes back up fine, until one of the "modern" clients tries to connect.

My system is a home network, used for MythTV, sharing family photos, and sharing and playing music. It is extremely insecure, with all five systems auto-logging in as the same user (with different passwords) and all Apache daemons running as that same user. I think of it as a single computer distributed throughout the house, with one main file server exporting all of its drives via NFS. Except for one laptop running Fedora 19, they were all running Fedora 17, including the NFS server, which exports its root (/) filesystem as well as two other filesystems on other disks. /boot, /home, /var, etc. are just subdirectories of /, not separate mounts or exports. I like to keep things simple. I know these are all bad designs, especially insecure, and I would NEVER do any of this at work (where I administer a high-availability server for our department), but for home entertainment it all seems fine and works great!

Until I tried to upgrade one of my clients to Fedora 20. Under Fedora 20, when the upgraded client boots, it hangs trying to mount the exports from the NFS server (still F17). At that point, none of the other clients can read their existing mounts, nor can the server read its own root! I can log into the client and shut it down, then (try to) reboot the server (eventually forcing it with a reset), and everything comes back up normally (the other clients resume their I/O). The client still dual-boots, so I can reboot it under F17 and all is well again, until the next boot into F20.

Since I had trouble with F20, I thought maybe it was time to try Ubuntu again (I try it every few years and always end up going back to Fedora). Ubuntu had the exact same issue.

If I set the fstab entries to noauto, the client boots fine. I can manually mount the exports after the client boots and use the mounts indefinitely, but eventually, it will hang again. If I leave the mounts as auto, it will hang on boot every time.

As I mentioned above, my systems are EXTREMELY insecure, but it is simply a home network. The root users of all systems can freely access all files on the server, and the one user can access all files owned by that user on any of the systems. Here is my server's exports file:

/ 192.168.0.0/16(rw,no_root_squash,no_subtree_check)
/m1 192.168.0.0/16(rw,no_root_squash,no_subtree_check)
/m2 192.168.0.0/16(rw,no_root_squash,no_subtree_check)

An example of a client's fstab entry:
server:/ /mntpoint nfs soft,intr,noatime,defaults 0 0
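For reference on the v3-vs-v4 question: a line like the one above does not pin an NFS version. Its filesystem type is `nfs` and it carries no `vers=` or `nfsvers=` option, so the version is left to negotiation between client and server (newer clients generally try NFSv4 first). The nfs(5) man page also notes that `intr` is ignored on kernels after 2.6.25. Below is a minimal Python sketch, assuming the standard six-field fstab layout, that pulls the options apart; the helper names are hypothetical and not part of any NFS tool.

```python
# Minimal sketch: inspect the NFS options of one /etc/fstab line.
# Assumption: standard six whitespace-separated fstab fields; helper names are hypothetical.

def parse_fstab_line(line):
    """Split an fstab line into its fields plus a dict of mount options."""
    spec, mountpoint, fstype, opts, dump, passno = line.split()
    options = {}
    for item in opts.split(","):
        key, _, value = item.partition("=")
        options[key] = value or None  # flag-style options (soft, intr, ...) map to None
    return {"spec": spec, "mountpoint": mountpoint, "fstype": fstype,
            "options": options, "dump": int(dump), "passno": int(passno)}


def pinned_nfs_version(entry):
    """Return the NFS version the entry forces, or None if it is left to negotiation."""
    if entry["fstype"] == "nfs4":
        return "4"
    opts = entry["options"]
    # nfs(5) accepts both vers= and nfsvers= for the same purpose.
    return opts.get("nfsvers") or opts.get("vers")


entry = parse_fstab_line("server:/ /mntpoint nfs soft,intr,noatime,defaults 0 0")
print(pinned_nfs_version(entry))   # None: nothing pins the version
print("intr" in entry["options"])  # True, though kernels after 2.6.25 ignore intr
```

To force a version, the entry would need an explicit option such as `nfsvers=3`, or the `nfs4` filesystem type.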

Clients that work are running kernels 3.9.10, 3.9.8 and 3.10.7. Kernels that don't work (F20 and Ubuntu) are 3.14.4 and 3.13.0.

This has worked for years, since the days of Red Hat Linux 9 (before it forked into Fedora). I have upgraded all my systems and built new ones over the years and never had any trouble. I'm guessing it has to do with security and some new version of NFS that has made what I do incompatible with systems designed to be secure. Have there been recent changes in NFS that would break my system and cause an insecure server to (sort of) hang? Could this be related to IPv6 (which I know nothing about and told the clients to "ignore")?

lleb 05-19-2014 09:01 PM

Try taking it out of fstab and using autofs with the bg option. That should help.

Also, using Fedora XX for a server is a bad idea. Fedora is a bleeding-edge distro; you are much better off using CentOS as your server. I'd wait until later this year when v7 is released: RHELv7 is now in public beta, which is a good indicator that CentOSv7 will be out soon too.

Keep in mind that with Fedora 19 and 20 the older SysV init is being replaced with systemd. You also appear to be mixing NFSv3 and NFSv4 in your exports and fstab entries.

As both F17 and F20 use NFSv4, I'd stick with that format and forget about NFSv3 unless you have OS X somewhere on your LAN that also needs to connect.

When you upgraded to F20 on the "server", did you also reconfigure SELinux and iptables, as well as the NFS configuration to use static ports?

davewithheld 05-20-2014 10:38 PM

I didn't upgrade the server. I upgraded the client (twice, once with Ubuntu). All of my systems (including the new installation) have SELinux disabled, and none of the clients have firewalls. The interface on the server has a restrictive firewall that allows everything from my clients but nothing from any other IP.

What is it about the exports and fstab entries that implies v3 vs. v4? I certainly had no intention of using v3. I guess I should read man exports. Thanks for the reply.

davewithheld 05-21-2014 10:55 PM

Aha, upgrading to NFSv4 is not simply a matter of upgrading the OS to a version that knows about it; one has to restructure the exports to fit the NFSv4 model. https://help.ubuntu.com/community/NFSv4Howto and http://www.citi.umich.edu/projects/n...ing-nfsv4.html were both very good. Thanks, lleb, for the eye-opener. I haven't finished reconfiguring yet, but I'll post back if it does any good.

lleb 05-22-2014 06:59 AM

Yes, sorry, I should have been clearer.

voleg 05-22-2014 07:06 AM

Please check whether autofs is running on the server.
If it is, please uninstall it.

lleb 05-23-2014 06:49 AM

Why uninstall autofs?

davewithheld 05-31-2014 08:44 AM

Configuring for NFSv4 did the trick, but I'm not sure I did it right. /export was already there, probably created by the nfs package or the first time the service was started; I didn't create it and wasn't using it. I created subdirectories under /export with the same names as the filesystems I was previously exporting directly, and added bind mounts to fstab, binding each /mountname onto /export/mountname. Then I added fsid=0 to the exports entry for the / directory. What I'm not sure of is the following fstab entry:

/ /export none bind 0 0

Now when I ls /export, I see my root filesystem, not the contents of /export. But the following fstab entries also work:

/mountpoint1 /export/mountpoint1 none bind 0 0
/mountpoint2 /export/mountpoint2 none bind 0 0
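The layout described above follows the NFSv4 "pseudo-root" pattern from the Ubuntu NFSv4Howto linked earlier in the thread: real filesystems are bind-mounted under one export root, that root gets fsid=0, and each child export gets nohide. Below is a minimal Python sketch that only prints the fstab and /etc/exports lines for such a layout. It assumes /export as the root, the 192.168.0.0/16 network, and the /m1 and /m2 filesystems from the original exports file; the helper names are hypothetical, and it does not cover exporting the server's whole / as done in this post.

```python
# Minimal sketch: print fstab bind-mount lines and /etc/exports lines for an
# NFSv4 pseudo-root. Assumptions: /export as the root, 192.168.0.0/16 clients,
# and the /m1 and /m2 filesystems from the original exports file.

EXPORT_ROOT = "/export"
CLIENTS = "192.168.0.0/16"
BASE_OPTS = "rw,no_root_squash,no_subtree_check"


def bind_lines(mounts, root=EXPORT_ROOT):
    """fstab lines that bind each real mount point onto a directory under the root."""
    return [f"{m} {root}/{m.strip('/')} none bind 0 0" for m in mounts]


def export_lines(mounts, root=EXPORT_ROOT, clients=CLIENTS, opts=BASE_OPTS):
    """exports lines: fsid=0 marks the NFSv4 root, nohide exposes each bound child."""
    lines = [f"{root} {clients}({opts},fsid=0)"]
    lines += [f"{root}/{m.strip('/')} {clients}({opts},nohide)" for m in mounts]
    return lines


mounts = ["/m1", "/m2"]
for line in bind_lines(mounts) + export_lines(mounts):
    print(line)
```

The target directories (here /export/m1 and /export/m2) have to exist before the bind mounts can be made. With an fsid=0 root, NFSv4 clients name paths relative to that root, so /export/m1 is mounted from the client as server:/m1.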

Anyway, I've rebooted into Fedora 20 several times now and have yet to have a problem. Then again, I was able to boot it several times before with earlier tweaks, and it always failed eventually. It has yet to fail with this configuration, which is much more NFSv4-based, and seems to work fine. On with the upgrade!!

Thanks, lleb, for the heads-up!

davewithheld 05-31-2014 10:20 AM

Nope. As soon as I posted that it worked, I rebooted in F20 and hung the server. Bummer. What I'm thinking might be a problem is a symlink I added to the server to refer to its own root. It was a lazy way of keeping my paths from breaking as I changed the system over the years. I have removed that symlink and will see how many times I can boot in F20 without hanging the server.

davewithheld 06-13-2014 10:30 AM

Alas, I'm giving up (I wish I could mark this "UNSOLVED"). I have upgraded another client to F19 and it, too, causes the F17 server to stop responding (I choose not to use the term "hang" because I can still issue a reboot command on the server; it just doesn't go down cleanly). The reason I upgraded a client in the first place was so I could start migrating the functionality to another machine and eventually retire the old one. It's a real PITA, though, trying to get all the stuff copied over when each time I mount a share I end up hardware-resetting the server. It then works for a while. I'm thinking it might have something to do with two versions of Linux on the client and some sort of connection resumption on the server that causes an authentication issue and hangs the NFS server. Once NFS hangs, that's all she wrote (literally).

Anyway, I'm moving ahead with developing the new client into the replacement server and giving up. I would love to hear of a solution, though, as it would make the transition much simpler.

P.S. It just occurred to me that maybe I should uninstall and reinstall NFS on the server. That may clean out some leftover client information that could be confusing things. If I get time...

davewithheld 06-13-2014 10:31 AM

Something caused a double-post (I used the quick post). Is there any way to delete this post? (Edited duplicate.)

lleb 06-13-2014 08:12 PM

Here is my exports file from my CentOS 6.x server. Again, using Fedora as any kind of server is a bad idea:

Code:

$ cat /etc/exports
#
#        /etc/exports

#        NFS4
/exports *(rw,insecure,subtree_check,crossmnt,fsid=0)

#        NFSv3
/exports/centos/public *(rw,insecure,no_subtree_check,fsid=3010)
/exports/NFS_TV_Shows *(rw,insecure,no_subtree_check,fsid=3020)

Note the difference between the NFSv3 and NFSv4 lines. Also note that each share has a unique fsid, and that I use the insecure option because I have OS X on my network.
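A quick way to check an exports file for the two things pointed out here: an NFSv4 root (fsid=0, which exports(5) treats as equivalent to fsid=root) and fsid values reused by more than one path. This is a minimal Python sketch with hypothetical helper names; it ignores "\" line continuations and other less common exports syntax.

```python
# Minimal sketch: report the NFSv4 root and any reused fsid values in exports text.
# Assumption: one export per line, no "\" continuations; helper names are hypothetical.
import re
from collections import Counter


def parse_exports(text):
    """Return (path, client, options) tuples, one per client spec on each line."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path, *specs = line.split()
        for spec in specs:
            m = re.fullmatch(r"([^()]*)\((.*)\)", spec)  # client(opt,opt,...)
            if m:
                entries.append((path, m.group(1), m.group(2).split(",")))
            else:
                entries.append((path, spec, []))  # client with default options
    return entries


def fsid_report(text):
    """Return (paths exported as the NFSv4 root, fsid values used by several paths)."""
    fsids = {}
    for path, _client, opts in parse_exports(text):
        for opt in opts:
            if opt.startswith("fsid="):
                fsids[path] = opt[len("fsid="):]
    roots = sorted(p for p, f in fsids.items() if f in ("0", "root"))
    counts = Counter(fsids.values())
    dupes = sorted(f for f, n in counts.items() if n > 1)
    return roots, dupes


sample = """
#        NFS4
/exports *(rw,insecure,subtree_check,crossmnt,fsid=0)
#        NFSv3
/exports/centos/public *(rw,insecure,no_subtree_check,fsid=3010)
/exports/NFS_TV_Shows *(rw,insecure,no_subtree_check,fsid=3020)
"""
print(fsid_report(sample))  # (['/exports'], [])
```

Fed the three-line exports file from the first post instead, it reports ([], []): no fsid=0 root at all.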

And here is the autofs map on one of my Fedora laptops:

Code:

$ cat /etc/auto.NFS
Shares        -rw,soft,intr,bg,rsize=8192,wsize=8192        server:/exports/

Hope that helps point you in a positive direction.
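For reference, each line of an indirect autofs map like the one above has a key (the directory name created under the map's mount point), an optional option list starting with "-", and the location to mount. The mount point itself comes from the matching /etc/auto.master entry, which isn't shown in the thread. Below is a minimal Python sketch that splits such an entry; the helper name is hypothetical, and multi-mount and wildcard entries are not handled.

```python
# Minimal sketch: split one indirect autofs map entry into key, options, location.
# Assumption: simple single-mount entries; multi-mount and wildcard syntax is ignored.

def parse_map_entry(line):
    """Return (key, options, location) for an entry such as the auto.NFS line above."""
    fields = line.split()
    key, rest = fields[0], fields[1:]
    options = []
    if rest and rest[0].startswith("-"):
        options = rest[0][1:].split(",")  # strip the leading "-" before splitting
        rest = rest[1:]
    return key, options, " ".join(rest)


key, options, location = parse_map_entry(
    "Shares        -rw,soft,intr,bg,rsize=8192,wsize=8192        server:/exports/")
print(key, location)  # Shares server:/exports/
print(options)
```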


All times are GMT -5.