NFS locking problem when using IP alias or secondary IP (Heartbeat)
I am trying to set up a highly available NFS server using Heartbeat and DRBD. Everything works quite well, but I run into a file locking problem when I mount the NFS shares through the Heartbeat-controlled IP alias (set up by the Heartbeat IPaddr script) or secondary IP address (set up by the Heartbeat IPaddr2 script).
How Heartbeat (or DRBD) works is really not important for this problem. The problem comes down to this:
- I have a NFS server on an internal network, IP address 192.168.1.2.
- This server has an IP alias 192.168.1.11.
- I have two NFS clients, IP addresses 192.168.1.107 (client1) and 192.168.1.108 (client2).
- When I mount an NFS share on the clients using the 192.168.1.2 address everything works, including file locking.
- When I mount the NFS share using the 192.168.1.11 address file locking also works, but I experience this specific problem:
I lock (flock) a file on the NFS share on client1 for 5 seconds, and during that interval I try to lock the same file on client2. Client2 has to wait until the lock is released. This works, except that client2 is not notified by the NFS server when the lock is released. It keeps waiting until it asks the NFS server for the lock again (after 30 seconds). This time it is granted the lock, and client2 can update the file. I want to get rid of this half-minute delay.
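For anyone who wants to reproduce this, here is a minimal sketch of the kind of test I mean. It is not my exact script: the file name locktest.py, the function names, and the path /mnt/nfs/testfile are only assumptions for illustration. It uses Python's fcntl.flock and measures how long the second client has to wait.

```python
import fcntl
import sys
import time


def hold_lock(path, seconds):
    """Take an exclusive flock on path, hold it for `seconds`, then release it."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        time.sleep(seconds)
        fcntl.flock(f, fcntl.LOCK_UN)


def timed_lock(path):
    """Block until an exclusive flock on path is granted; return the wait in seconds."""
    start = time.monotonic()
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks while another holder has the lock
        waited = time.monotonic() - start
        fcntl.flock(f, fcntl.LOCK_UN)
    return waited


# Hypothetical usage (path is an assumption):
#   client1: python3 locktest.py hold /mnt/nfs/testfile
#   client2: python3 locktest.py wait /mnt/nfs/testfile
if __name__ == "__main__" and len(sys.argv) >= 3:
    mode, path = sys.argv[1], sys.argv[2]
    if mode == "hold":
        hold_lock(path, 5)
    elif mode == "wait":
        print("waited %.1f seconds for the lock" % timed_lock(path))
```

Start "hold" on client1 and, within the 5 seconds, "wait" on client2. With the share mounted through 192.168.1.2 I would expect the reported wait to be at most about 5 seconds; through 192.168.1.11 it is where the extra delay of roughly 30 seconds shows up.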
Normally, when client1 releases the lock on the file, the NFS server should immediately inform client2 that the lock is available. This is the case when I use the 'normal' IP address 192.168.1.2, but this mechanism does not work when I use the IP alias (or a secondary IP address). It looks like the portmapper (or statd? or lockd?) forgets that client2 is waiting for a status update when it is connected through 192.168.1.11.
When I mount the NFS share using 192.168.1.11 on client1 and 192.168.1.2 on client2, client2 is informed as it should be.
I use Debian Etch with the standard Debian 2.6.18 kernel.
Is this a known problem? And above all, what can I do about it? I have searched the internet for days now (really), but I can't find any information about this specific problem. Not a single 'NFS with Heartbeat' tutorial/howto mentions it either.
I have to use an IP alias or secondary IP because Heartbeat works this way. I could add an extra NIC to the NFS server(s) and create a Heartbeat script so it uses a primary IP on that extra NIC instead, but that's a workaround I'd like to avoid.