Old 02-14-2010, 05:32 PM   #1
hedrick
LQ Newbie
 
Registered: Feb 2010
Posts: 2

Rep: Reputation: 0
NFS failover


What is the current state of NFS failover (i.e., setting up two servers with shared storage, with automatic failover if one fails)? I've seen a cookbook, but no details that would let me assess how well it works. There are lots of complex issues with data consistency, but the detailed information on that is years old.

Our needs are fairly simple: two servers, a shared array, and I'm reasonably sure that we don't use locking. However, we'd like failover to work reliably without loss of data.

I'm most interested in Red Hat, although if some other distribution is better, that would be useful to know.
 
Old 02-14-2010, 06:53 PM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,636

Rep: Reputation: 7965
Quote:
Originally Posted by hedrick
What is the current state of NFS failover (i.e., setting up two servers with shared storage, with automatic failover if one fails)? I've seen a cookbook, but no details that would let me assess how well it works. There are lots of complex issues with data consistency, but the detailed information on that is years old.

Our needs are fairly simple: two servers, a shared array, and I'm reasonably sure that we don't use locking. However, we'd like failover to work reliably without loss of data.

I'm most interested in Red Hat, although if some other distribution is better, that would be useful to know.
Nothing simple about what you've just spelled out. The key sticking points:
  • "Shared Array"
  • Without loss of data
You can't mount the same (non-cluster) file system on two different servers at the same time. There is work on ZFS, but the whole 'reliable' question comes up there. Oracle has such a file system, but it's only for Oracle databases.

You can set up something using a DBMS (I've had mixed results), or lowball it with rsync, to make sure the data on the two arrays stays consistent. Use Heartbeat to monitor the two NFS servers, and if one goes down, have a script kick off to modify the IP and MAC addresses. Which approach makes sense will depend on how often the data changes on the NFS shares, how critical the data is, and how much downtime is acceptable. If you can live with a few minutes of downtime, you can go REALLY low-tech and just get two identical RAID cards, then move the cable to the second server in the event of failure.
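
As a rough illustration of that Heartbeat-plus-script approach, a takeover script on the standby might look something like the sketch below. The interface name, floating IP, device, and export path are made-up examples, and a real setup would also need fencing and error handling:
Code:
#!/bin/sh
# Hypothetical takeover script, run on the standby node when Heartbeat
# declares the primary dead. All names and addresses are examples only.

SERVICE_IP=192.168.1.50       # floating IP the NFS clients mount from
IFACE=eth0
DEVICE=/dev/sdb1              # LUN on the shared array
MOUNTPOINT=/export/data

# Bring the shared storage up on this node
fsck -p "$DEVICE"
mount "$DEVICE" "$MOUNTPOINT"

# Start the NFS service and (re)publish the exports
service nfs start
exportfs -ra

# Take over the service IP; gratuitous ARP updates the clients' and
# switches' idea of which MAC now owns that address
ip addr add "$SERVICE_IP/24" dev "$IFACE"
arping -U -c 3 -I "$IFACE" "$SERVICE_IP"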

If you want better, and money is no object, go with a real SAN solution, use BCVs in the SAN cabinet to make sure there's no data lost, and use a dedicated hardware failover system (like Radware) to present one address to the world. Heartbeat can be used to mount the SAN volumes in the event of failure.
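
For the Heartbeat part, a v1-style /etc/ha.d/haresources entry for that kind of setup would look roughly like this (the node name, device, mount point, and IP below are placeholders):
Code:
# /etc/ha.d/haresources -- one logical line; node1 is the preferred owner.
# Heartbeat mounts the shared LUN, starts the NFS service, and brings up
# the floating IP on whichever node is currently active.
node1 Filesystem::/dev/sdb1::/export/data::ext3 nfs IPaddr::192.168.1.50/24/eth0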
 
Old 02-26-2010, 11:01 AM   #3
hedrick
LQ Newbie
 
Registered: Feb 2010
Posts: 2

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by TB0ne
Nothing simple about what you've just spelled out. The key sticking points:
  • "Shared Array"
  • Without loss of data
Thanks. I wouldn't think the shared array would be an issue; just mount it on the other system when it takes over. I was more concerned about whether there are subtle problems with moving the NFS service itself. Historically there has been a tendency to get hung mounts, lost data, and locking problems.

Whether remounting loses data depends on how well the NFS server and the file system work together. This works with Solaris Cluster because the NFS server doesn't acknowledge operations until the data is at least in the ZIL (the ZFS intent log). If the system crashes or the array is moved to the other system, the transactions in the ZIL are replayed and things are fine. A journaling file system under Linux should in principle support the same approach, as long as all the pieces fit together properly. (Incidentally, with Solaris Cluster we use NFS v4.)
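
For what it's worth, the Linux-side export options that seem relevant here are sync (the server commits data to stable storage before acknowledging a write, which is the rough analogue of waiting on the ZIL) and a fixed fsid so the clients' file handles stay valid when the service moves to the other server. A sketch of /etc/exports, with a placeholder path and network:
Code:
# /etc/exports -- kept identical on both cluster nodes (example values)
# 'sync'   : don't acknowledge writes until they are on stable storage
# 'fsid=1' : pin the exported filesystem ID so NFS file handles remain
#            valid after the export moves to the other node
/export/data  192.168.1.0/24(rw,sync,no_subtree_check,fsid=1)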

https://bugzilla.redhat.com/show_bug.cgi?id=132823 suggests that at one time things didn't quite fit together properly, but the data there is old. I was really hoping for an update.

Last edited by hedrick; 02-26-2010 at 11:03 AM.
 
Old 02-26-2010, 01:26 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,636

Rep: Reputation: 7965
Quote:
Originally Posted by hedrick
Thanks. I wouldn't think the shared array would be an issue; just mount it on the other system when it takes over. I was more concerned about whether there are subtle problems with moving the NFS service itself. Historically there has been a tendency to get hung mounts, lost data, and locking problems.
That kinda hits it on the head there. The hung mounts, etc., are the issues you've got to worry about when the file system doesn't get unmounted cleanly. The remount on another box is trivial once those issues are out of the way. And I thought you were talking about having it mounted on BOTH systems at the same time. Now THAT'S problematic.
Quote:
Whether remounting loses data depends on how well the NFS server and the file system work together. This works with Solaris Cluster because the NFS server doesn't acknowledge operations until the data is at least in the ZIL (the ZFS intent log). If the system crashes or the array is moved to the other system, the transactions in the ZIL are replayed and things are fine. A journaling file system under Linux should in principle support the same approach, as long as all the pieces fit together properly. (Incidentally, with Solaris Cluster we use NFS v4.)

https://bugzilla.redhat.com/show_bug.cgi?id=132823 suggests that at one time things didn't quite fit together properly, but the data there is old. I was really hoping for an update.
It's a tough nut to crack. Even if you go the SAN route, chances are you'll have to fsck the drive(s) before you can remount them on another system. If you enable BCVs behind the scenes, you can snapshot the data, copy it on the SAN frame to another LUN, and have it ready to mount, but that's $$$. And I've seen the BCV copies have to be fsck'ed before they'll mount, too.
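
A generic stand-in for that array-side snapshot idea, using LVM instead of a vendor's BCV tools, shows why the fsck step keeps turning up: a snapshot of a live file system is effectively a crash image, so it needs journal replay or repair before you trust it. The volume group, size, and paths below are made-up examples:
Code:
# Snapshot a live logical volume (stand-in for an array-side BCV copy);
# vendor SAN tools would do the equivalent on the frame itself.
lvcreate --snapshot --size 10G --name data_snap /dev/vg0/data

# The snapshot is a crash-consistent image, so check it before mounting
fsck -y /dev/vg0/data_snap
mount /dev/vg0/data_snap /mnt/snapcheck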
 
  

