File locking issues with clustering, GFS, and NFS
We are attempting to implement high-availability NFS services using Red Hat Cluster Suite and GFS file systems, shared out to a number of NFS clients. This is one of the patterns described in Red Hat documentation, wherein a pair of servers act as the NFS servers in an active/passive configuration. A virtual IP address is created as a secondary logical network interface and moved from server to server by cluster failover scripts.
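For concreteness, the floating address in our setup is defined as a cluster-managed resource along these lines (a minimal sketch; the service name and address below are placeholders, not our real values):

    <rm>
      <service autostart="1" name="nfs-svc">
        <!-- floating address, brought up as a secondary logical interface
             on the active node and moved by the failover scripts -->
        <ip address="192.168.1.100" monitor_link="1"/>
      </service>
    </rm>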
The current configuration defines the NFS shares using the traditional NFS configuration file /etc/exports on the currently-active NFS server and /etc/fstab on the NFS clients. Some versions of the Red Hat clustering documentation call this out as an invalid configuration, stating that the proper way is to define the HA services entirely within the cluster.conf file (or to use the cluster UI to do the same). Later versions of the GFS and clustering documentation no longer mention this constraint.
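For reference, the hybrid configuration amounts to something like the following (paths and hostnames are illustrative placeholders):

    # /etc/exports on the currently-active NFS server
    /mnt/gfs-share    *(rw,sync)

    # /etc/fstab on each NFS client, mounting via the virtual IP's hostname
    nfs-vip:/mnt/gfs-share   /mnt/share   nfs   rw,hard,intr   0 0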
When an NFS client mounts the NFS share, which, again, is a GFS file system mounted directly via a SAN on the two clustered GFS/NFS servers, most file I/O operations seem to be fine. Problems are observed only when an NFS client attempts to obtain an exclusive lock on a file using the fcntl system call (a sketch of the test program appears after the two scenarios below). We have seen two different behaviors on two different test systems, despite an attempt to make them identical at the outset:
Scenario 1) The first NFS client to attempt a lock succeeds and can release the lock explicitly via a system call or implicitly at exit. As long as only this first client runs the test program, it can be executed any number of times in a row with no errors. As soon as any other client, or one of the GFS server nodes, attempts to get a lock, the fcntl call hangs forever. The first client, recall, has already released the lock, and we waited 30 seconds or so to rule out any caching delays. Once it hangs, the second process is non-interruptible, and the attempt leaves a stuck process behind: on the GFS server itself if the second machine was a GFS/NFS server node, or on the active GFS/NFS server if the second machine was an NFS client. In either case the process cannot be killed by any user, including root, and a reboot is required to remove the PID.
Scenario 2) In our second test bed, no NFS client is able to obtain a lock at all. Even the failed attempt, however, leaves the same unkillable process on the GFS server as described above.
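For reference, the failing operation in both scenarios boils down to the following fcntl sequence (a minimal sketch equivalent to what our test program does; the file path is a placeholder):

    /* locktest.c: take an exclusive (write) lock on a file over NFS,
       then release it.  Build with: cc -o locktest locktest.c */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        struct flock fl;
        int fd = open("/mnt/share/lockfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        fl.l_type   = F_WRLCK;   /* exclusive lock */
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;         /* lock the whole file */

        /* F_SETLKW waits for the lock; this is the call that hangs
           uninterruptibly in the scenarios above. */
        if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl(F_SETLKW)"); return 1; }
        puts("lock acquired");

        fl.l_type = F_UNLCK;     /* explicit release */
        if (fcntl(fd, F_SETLK, &fl) < 0) { perror("fcntl(F_UNLCK)"); return 1; }
        puts("lock released");

        close(fd);
        return 0;
    }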
When we mount the NFS file systems using the primary (physical) address of the active NFS server, the locking issues go away. The use of a virtual IP address, which is necessary to achieve transparent failover, appears to be the critical factor.
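That is, with hostnames standing in for our real ones:

    # mounting via the virtual IP: fcntl locking misbehaves as described
    mount -t nfs nfs-vip:/mnt/gfs-share /mnt/share

    # mounting via the active server's physical address: locking works
    mount -t nfs nfs-node1:/mnt/gfs-share /mnt/share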
NOTE: This has been misinterpreted in some forums as a Java issue relating to the java.nio.channels.FileChannel locking interface. However, running the Java program under strace shows that any reasonably modern JVM implements FileChannel locking as a wrapper around the fcntl system call, so the problem generalizes to fcntl itself. The BSD flock() system call may not reliably reproduce the problem.
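For comparison, the BSD-style call is the one sketched below rather than fcntl (again with a placeholder path). flock() takes whole-file advisory locks and is handled differently from POSIX fcntl locks by many NFS client implementations, which may be why it does not reliably trigger the hang:

    #include <sys/file.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/share/lockfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* whole-file exclusive advisory lock via the BSD interface */
        if (flock(fd, LOCK_EX) < 0) { perror("flock"); return 1; }
        puts("flock acquired");

        flock(fd, LOCK_UN);      /* explicit release */
        close(fd);
        return 0;
    }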
My first question is whether this hybrid approach to GFS/NFS high availability is supported and should be expected to work. We have not yet tried configuring the NFS services through the cluster configuration tool, but plan to do so; the difficulty is that we have already deployed the hybrid solution into a system that cannot easily be taken offline and reconfigured.
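For what it is worth, my reading of the documentation is that the cluster-managed alternative would define the whole service in cluster.conf along these lines (a sketch only; the resource names, device, and paths are placeholders, and I have not verified this against a running cluster):

    <rm>
      <service autostart="1" name="nfs-svc">
        <clusterfs name="gfs-share" mountpoint="/mnt/gfs-share"
                   device="/dev/vg0/gfslv" fstype="gfs">
          <nfsexport name="exports">
            <nfsclient name="all-clients" target="*" options="rw"/>
          </nfsexport>
        </clusterfs>
        <ip address="192.168.1.100" monitor_link="1"/>
      </service>
    </rm>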
The second question is whether this particular combination of clustering, GFS distributed locking, NFS locking, and virtual IP addressing is valid for high availability at all.