NFS networked cluster, computers hang on shutdown

j.sudol · 08-06-2013, 07:43 PM

Help.

Here's the situation. I have one computer with a complete Slackware Linux 14.0 install. This computer will eventually serve as the command center for a cluster of thirty computers. I have a second computer, a prototype for the thirty computers in the cluster, with a slim set of packages installed to reduce the overhead and optimize performance. Eventually, I will clone the second computer 29 times to create a cluster of 30 machines designed to carry out some extensive N-body integrations. I'm modeling extra-solar planetary systems.

At the moment, the hard drives on the command computer and the prototype are cross-mounted so that I can move files back and forth easily. The cross-mounting is working quite well - the only problem is that one computer will hang on shutdown if the other computer is already shut down. Granted, I don't expect to shut down these computers often, so this is more an annoyance than a real problem, but I'd like it to go away.

In detail... After shutting down one computer, if I shut down the second (using shutdown -h now), I get the following notices:

killing process holding NFS mount /proc/fs/nfs open...
killing process holding NFS mount /proc/fs/nfsd open...
killing process holding NFS mount /home/mercury6/c01

[The last directory is the mount point on the machine that is shutting down, where the export from the machine that is already shut down is mounted.]

nfsd has been unmounted
umount: /proc/fs/nfs device is busy

Remounting root filesystem read-only
/dev/sda1 on / type ext2 (ro)

[HANG]

I'm not sure why there is a /proc/fs/nfs and a /proc/fs/nfsd. I didn't see that when I had the computers running Slackware 9.1. Why are there two NFS processes running, nfs and nfsd, and what's the difference between them? Is this a significant clue?

jjs

lleb · 08-07-2013, 01:21 PM

if you are not mounting them properly this will ALWAYS happen. as you are mounting these over NFS (I'm hoping NFSv4), you can use the bg mount option, and if the mounts go through autofs, the --ghost and --timeout=foo options as well.

https://www.centos.org/docs/5/html/D...g-options.html

check that link out.

and here

http://linux.die.net/man/5/nfs

there are ways of dealing with the situation you have. it all boils down to how you mount the NFS shares.
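
As a rough illustration of what "how you mount the NFS shares" can mean in practice, here is a minimal /etc/fstab sketch. Only the local mount point /home/mercury6/c01 comes from the first post; the server name c01, the exported path, and the timeout values are assumptions made up for the example. The options themselves (bg, soft, timeo, retrans) are documented in the nfs(5) man page linked above.

```
# /etc/fstab entry on the command computer (hypothetical server name and export path)
# bg       - if the first mount attempt fails at boot, keep retrying in the background
# soft     - return an error to the caller after the retries run out instead of blocking forever
# timeo=50 - wait 5 seconds (the unit is tenths of a second) before retransmitting a request
# retrans=3 - number of retransmissions before a soft mount reports a failure
c01:/home/mercury6   /home/mercury6/c01   nfs   bg,soft,timeo=50,retrans=3   0   0
```

With a hard mount (the default), operations on a share whose server has gone away can block indefinitely, which is one plausible cause of a hang like the one described. A soft mount with a short timeout may let the unmount fail quickly instead, but this is not guaranteed to cure this particular hang, and nfs(5) warns that soft mounts can cause data corruption if a write times out, so treat it as something to test on the prototype rather than a definitive fix.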