Old 02-14-2010, 05:32 PM   #1
hedrick
LQ Newbie
 
Registered: Feb 2010
Posts: 2

Rep: Reputation: 0
NFS failover


What is the current state of NFS failover (i.e., setting up two servers with shared storage, with automatic failover if one fails)? I've seen a cookbook, but no details that would let me assess how well it works. There are lots of complex issues with data consistency, but the detailed information on that is years old.

Our needs are fairly simple: two servers, a shared array, and I'm reasonably sure that we don't use locking. However, we'd like failover to work reliably without loss of data.

I'm most interested in Red Hat, although if some other distribution is better, that would be useful to know.
 
Old 02-14-2010, 06:53 PM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,636

Rep: Reputation: 7965
Quote:
Originally Posted by hedrick
What is the current state of NFS failover (i.e., setting up two servers with shared storage, with automatic failover if one fails)? I've seen a cookbook, but no details that would let me assess how well it works. There are lots of complex issues with data consistency, but the detailed information on that is years old.

Our needs are fairly simple: two servers, a shared array, and I'm reasonably sure that we don't use locking. However, we'd like failover to work reliably without loss of data.

I'm most interested in Red Hat, although if some other distribution is better, that would be useful to know.
Nothing simple about what you've just spelled out. The key sticking points:
  • "Shared Array"
  • Without loss of data
You can't mount the same (non-cluster) file system on two different servers at the same time. There is work on ZFS, but the whole 'reliable' question comes up there. Oracle has such a file system, but it's only for Oracle databases.

You can set up something using a DBMS (I've had mixed results), or lowball it with rsync, to make sure the data on the two arrays stays consistent. Use Heartbeat to monitor the two NFS servers, and if one goes down, have a script kick off to modify the IP and MAC addresses. Which approach makes sense will depend on how often the data changes on the NFS shares, how critical the data is, and how much downtime is acceptable. If you can live with a few minutes of downtime, you can go REALLY low-tech and just get two identical RAID cards, then move the cable to the second server in the event of failure.
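
As a rough illustration of that Heartbeat-plus-script approach, a takeover script on the standby might look something like the sketch below. The interface name, floating IP, device, and export path are made-up examples, and a real setup would also need fencing and error handling:
Code:
#!/bin/sh
# Hypothetical takeover script, run on the standby node when Heartbeat
# declares the primary dead. All names and addresses are examples only.

SERVICE_IP=192.168.1.50       # floating IP the NFS clients mount from
IFACE=eth0
DEVICE=/dev/sdb1              # LUN on the shared array
MOUNTPOINT=/export/data

# Bring the shared storage up on this node
fsck -p "$DEVICE"
mount "$DEVICE" "$MOUNTPOINT"

# Start the NFS service and (re)publish the exports
service nfs start
exportfs -ra

# Take over the service IP; gratuitous ARP updates the clients' and
# switches' idea of which MAC now owns that address
ip addr add "$SERVICE_IP/24" dev "$IFACE"
arping -U -c 3 -I "$IFACE" "$SERVICE_IP"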

If you want better, and money is no object, go with a real SAN solution, use BCVs in the SAN cabinet to make sure there's no data lost, and use a dedicated hardware failover system (like Radware) to present one address to the world. Heartbeat can be used to mount the SAN volumes in the event of failure.
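
For the Heartbeat part, a v1-style /etc/ha.d/haresources entry for that kind of setup would look roughly like this (the node name, device, mount point, and IP below are placeholders):
Code:
# /etc/ha.d/haresources -- one logical line; node1 is the preferred owner.
# Heartbeat mounts the shared LUN, starts the NFS service, and brings up
# the floating IP on whichever node is currently active.
node1 Filesystem::/dev/sdb1::/export/data::ext3 nfs IPaddr::192.168.1.50/24/eth0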
 
Old 02-26-2010, 11:01 AM   #3
hedrick
LQ Newbie
 
Registered: Feb 2010
Posts: 2

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by TB0ne
Nothing simple about what you've just spelled out. The key sticking points:
  • "Shared Array"
  • Without loss of data
Thanks. I wouldn't think the shared array would be an issue; just mount it on the other system when it takes over. I was more concerned about whether there are subtle problems with moving the NFS service itself. Historically there has been a tendency to get hung mounts, lost data, and locking problems.

Whether remounting loses data depends on how well the NFS server and the file system work together. This works with Solaris Cluster because the NFS server doesn't acknowledge operations until the data is at least in the ZIL (the ZFS intent log). If the system crashes or the array is moved to the other system, the transactions in the ZIL are replayed and things are fine. A journaling file system under Linux should in principle support the same approach, as long as all the pieces fit together properly. (Incidentally, with Solaris Cluster we use NFS v4.)
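
For what it's worth, the Linux-side export options that seem relevant here are sync (the server commits data to stable storage before acknowledging a write, which is the rough analogue of waiting on the ZIL) and a fixed fsid so the clients' file handles stay valid when the service moves to the other server. A sketch of /etc/exports, with a placeholder path and network:
Code:
# /etc/exports -- kept identical on both cluster nodes (example values)
# 'sync'   : don't acknowledge writes until they are on stable storage
# 'fsid=1' : pin the exported filesystem ID so NFS file handles remain
#            valid after the export moves to the other node
/export/data  192.168.1.0/24(rw,sync,no_subtree_check,fsid=1)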

https://bugzilla.redhat.com/show_bug.cgi?id=132823 suggests that at one time things didn't quite fit together properly, but the data there is old. I was really hoping for an update.

Last edited by hedrick; 02-26-2010 at 11:03 AM.
 
Old 02-26-2010, 01:26 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,636

Rep: Reputation: 7965
Quote:
Originally Posted by hedrick
Thanks. I wouldn't think the shared array would be an issue; just mount it on the other system when it takes over. I was more concerned about whether there are subtle problems with moving the NFS service itself. Historically there has been a tendency to get hung mounts, lost data, and locking problems.
That kinda hits it on the head there. The hung mounts, etc., are the issues you've got to worry about when the file system doesn't get unmounted cleanly. The remount on another box is trivial once those issues are out of the way. And I thought you were talking about having it mounted on BOTH systems at the same time. Now THAT'S problematic.
Quote:
Whether remounting loses data depends on how well the NFS server and the file system work together. This works with Solaris Cluster because the NFS server doesn't acknowledge operations until the data is at least in the ZIL (the ZFS intent log). If the system crashes or the array is moved to the other system, the transactions in the ZIL are replayed and things are fine. A journaling file system under Linux should in principle support the same approach, as long as all the pieces fit together properly. (Incidentally, with Solaris Cluster we use NFS v4.)

https://bugzilla.redhat.com/show_bug.cgi?id=132823 suggests that at one time things didn't quite fit together properly, but the data there is old. I was really hoping for an update.
It's a tough nut to crack. Even if you go the SAN route, chances are you'll have to fsck the drive(s) before you can remount them on another system. If you enable BCVs behind the scenes, you can snapshot the data, copy it on the SAN frame to another LUN, and have it ready to mount, but that's $$$. And I've seen the BCV copies have to be fsck'ed before they'll mount, too.
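
A generic stand-in for that array-side snapshot idea, using LVM instead of a vendor's BCV tools, shows why the fsck step keeps turning up: a snapshot of a live file system is effectively a crash image, so it needs journal replay or repair before you trust it. The volume group, size, and paths below are made-up examples:
Code:
# Snapshot a live logical volume (stand-in for an array-side BCV copy);
# vendor SAN tools would do the equivalent on the frame itself.
lvcreate --snapshot --size 10G --name data_snap /dev/vg0/data

# The snapshot is a crash-consistent image, so check it before mounting
fsck -y /dev/vg0/data_snap
mount /dev/vg0/data_snap /mnt/snapcheck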
 
  

