NFS failover
What is the current state of NFS failover, i.e. setting up two servers with shared storage and automatic failover if one fails? I've seen a cookbook, but no details that would let me assess how well it works. There are lots of complex issues with data consistency, but the detailed information on that is years old.
Our needs are fairly simple: 2 servers, a shared array, and I'm reasonably sure that we don't use locking. However, we'd like failover to work reliably without loss of data. I'm most interested in Red Hat, although if some other distribution is better, that would be useful to know.
You can set up something using DBMS (I have had mixed results), or lowball it with rsync, to make sure the data on the two arrays stays in sync. Use Heartbeat to monitor the two NFS servers, and if one goes down, have a script kick off to move the IP and MAC addresses to the surviving server. Which approach fits depends on how often the data on the NFS shares changes, how critical the data is, and how much downtime is acceptable. If you can live with a few minutes, you can go REALLY low-tech: just get two identical RAID cards and move the cable to the second server in the event of failure. :) If you want better, and money is no object, go with a real SAN solution, use BCVs in the SAN cabinet to make sure no data is lost, and use a dedicated hardware failover system (like Radware) to present one address to the world. Heartbeat can be used to mount the SAN volumes in the event of failure.
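The "monitor, then kick off a script" idea above can be sketched roughly as follows. This is only an illustration, not how Heartbeat itself is implemented: `nfs_port_open`, `FailureDetector`, and the threshold of three missed probes are hypothetical names and values chosen for the example. It probes the standard NFS TCP port (2049) and reports that failover should start only after several consecutive misses, so a single dropped probe does not trigger a takeover.

```python
import socket
from dataclasses import dataclass


def nfs_port_open(host: str, port: int = 2049, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the NFS port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


@dataclass
class FailureDetector:
    """Treat the peer as dead only after `threshold` consecutive missed probes."""
    threshold: int = 3  # assumed value; tune to the downtime you can accept
    misses: int = 0

    def record(self, alive: bool) -> bool:
        """Record one probe result; return True when failover should start."""
        self.misses = 0 if alive else self.misses + 1
        return self.misses >= self.threshold
```

In a polling loop the standby would call something like `detector.record(nfs_port_open("nfs-primary.example"))` every few seconds (the hostname is a placeholder) and run its takeover script when the result is true. With a shared array, that script would also need to make sure the failed server can no longer write to the storage before the standby mounts it.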
Whether remounting loses data depends upon how well the NFS server and file system work together. This works with Solaris Cluster, because the NFS server doesn't acknowledge operations until the data is at least in the ZIL. If the system crashes or the array is moved to the other system, the transactions in the ZIL are replayed, and things are fine. A journaling file system under Linux should in principle support the same approach, as long as all the pieces fit together properly. (Incidentally, with Solaris Cluster we use NFS v4.) https://bugzilla.redhat.com/show_bug.cgi?id=132823 suggests that at one time things didn't quite fit together properly, but the data there is old. I was really hoping for an update.
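To make the "don't acknowledge until the data is safe" ordering concrete, here is a minimal sketch. It is not the Solaris or Linux NFS server code; `durable_write` is a hypothetical helper that only shows the order of operations: write, force the data to stable storage with `fsync`, and only then return (which stands in for sending the reply to the client).

```python
import os


def durable_write(path: str, offset: int, data: bytes) -> int:
    """Write `data` at `offset` and return only after fsync has completed.

    Returning from this function plays the role of acknowledging the
    client's write; nothing is acknowledged before the fsync succeeds.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        done = 0
        # pwrite may write fewer bytes than requested, so loop until finished.
        while done < len(view):
            done += os.pwrite(fd, view[done:], offset + done)
        # Ask the kernel to push data (and the journal entries covering it)
        # to the device before we report success.
        os.fsync(fd)
        return done
    finally:
        os.close(fd)
```

If the server dies after this returns, the client's data should already be on the shared array, so the second server can mount the file system, replay the journal, and carry on. If it dies before the return, the client never got an acknowledgement and is expected to retry. On Linux the `sync` export option in `/etc/exports` asks the NFS server for this reply-after-commit behaviour; whether every layer underneath honours it is the open question in this thread.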