LinuxQuestions.org
Forums > Linux Forums > Linux - Server
Old 03-24-2011, 10:26 AM   #1
Phaethar
Member
 
Registered: Oct 2003
Location: MN
Distribution: CentOS, Fedora
Posts: 182

Rep: Reputation: 30
Question 2-node clustering setup issues with RHCS


Hey all,

I've been trying to set up a 2-node cluster using the Red Hat Cluster Suite (on CentOS 5.5), and I just haven't been having much luck.

First, the basic setup. As I said, 2 nodes, identical hardware, connected to a storage array (/dev/sdb). I partitioned the array to use almost all of it as sdb1, which holds the data. The second, small partition, sdb2, is set up as the quorum disk. Fencing is configured using an APC PDU.

Here's how the cluster.conf file looks:

Code:
<?xml version="1.0"?>
<cluster alias="TestCluster" config_version="18" name="TestCluster">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="192.168.108.212" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="RackPDU" port="2"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="192.168.108.211" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="RackPDU" port="1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="4"/>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.108.215" login="apc" name="RackPDU" passwd="********"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="Node1FailOver" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="192.168.108.212" priority="2"/>
                                <failoverdomainnode name="192.168.108.211" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="Node2FailOver" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="192.168.108.212" priority="1"/>
                                <failoverdomainnode name="192.168.108.211" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="192.168.108.220" monitor_link="1"/>
                        <clusterfs device="/dev/mapper/Vol_SAN-LVSAN" force_unmount="0" fsid="27588" fstype="gfs2" mountpoint="/TestVol1" name="GFSShare" options="rw" self_fence="0"/>
                        <nfsexport name="NFSExport"/>
                        <nfsclient allow_recover="0" name="ClusterManager" options="rw,sync,anonuid=509,anongid=511" target="192.168.108.214"/>                         
                        <smb name="Testserver" workgroup="Workgroup"/>
                </resources>
                <service autostart="1" domain="Node1FailOver" exclusive="0" name="NFS" nfslock="1" recovery="relocate">
                        <ip ref="192.168.108.220">
                                <clusterfs fstype="gfs" ref="GFSShare">
                                        <nfsexport ref="NFSExport">
                                                <nfsclient name=" " ref="ClusterManager"/>
                                        </nfsexport>
                                </clusterfs>
                        </ip>
                </service>
                <service autostart="1" domain="Node1FailOver" exclusive="0" name="GFS" recovery="relocate">
                        <clusterfs fstype="gfs" ref="GFSShare"/>
                </service>
                <service autostart="1" domain="Node1FailOver" exclusive="0" name="IP_Address" recovery="relocate">
                        <ip ref="192.168.108.220"/>
                </service>
                <service autostart="1" domain="Node1FailOver" exclusive="0" name="Samba" nfslock="1" recovery="relocate">
                        <smb ref="Testserver"/>
                </service>
        </rm>
        <quorumd interval="3" label="testQdisk" min_score="1" tko="6" votes="2"/>
</cluster>
So, Node1 is 192.168.108.211, and Node2 is 192.168.108.212. The virtual IP for the cluster is 192.168.108.220. NFS and Samba services are set up (and yes, I know Samba isn't cluster-aware, but we still need it configured, even if the service needs a manual restart in the event of a failover).

Up until this point, everything works. Clients can be added for NFS and they can connect, Samba works, etc. The problem is that as soon as I try to test the failover capabilities, the entire thing comes crashing down. I unplugged the network cable from Node1 to see what would happen. It showed as down via Conga, and Node2 displayed a message that the quorum was dissolved, at which point it lost the mount on the storage array. Bringing Node1 back up had the same issue. So now, after unplugging Node1 from the network, both nodes are down, and I can't get either of them reconnected to the array to bring services back up. No amount of rebooting will make the cluster quorate again. So, I'm stuck.

Now, I'm just learning how to set up a cluster, so I'm sure some of my settings are not correct. Specifically, I'm not sure about the failover domains and how those work, or about the quorum disk. Most documentation says the number of votes for the quorum disk should be (nodes - 1), which is just 1 in this case. Others have setups where the qdisk itself counts as a vote, meaning I could use 2 as the minimum with 3 total. And if the quorum disk is so fragile that a single node going down nukes the entire cluster... what's the point? How do I go about recovering the cluster in this case? It's completely dead at the moment.
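For reference, the (nodes - 1) arrangement from the docs would look something like this. This is only a hedged sketch with illustrative values, not something I've tested on this cluster:

```xml
<!-- Sketch of the (nodes - 1) arrangement: two 1-vote nodes plus a
     1-vote qdisk gives 3 total votes, so a surviving node plus the
     qdisk (2 votes) can outlive a dead peer. Values illustrative only. -->
<cman expected_votes="3"/>
<quorumd interval="3" label="testQdisk" min_score="1" tko="6" votes="1"/>
```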

Just looking for a little advice on what I'm doing wrong.

Thanks!

Last edited by Phaethar; 03-24-2011 at 10:27 AM.
 
Old 04-21-2011, 03:44 PM   #2
nipples
LQ Newbie
 
Registered: Apr 2011
Posts: 1

Rep: Reputation: 0
expected votes

Hey,
I think your expected votes setting is the culprit.
The way I see it, you're saying in the conf that you need 4 votes to stay quorate... but that's all the votes you have, so there's no slack for losing anything.
I'd try lowering the value to 3.
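To sketch the arithmetic (this is a rough model of a simple-majority quorum rule, floor(expected_votes / 2) + 1, which is my assumption about how CMAN counts it, not its real implementation):

```python
# Rough model of a majority-quorum rule: a partition is quorate when
# the votes it can see reach floor(expected_votes / 2) + 1.

def quorum_needed(expected_votes: int) -> int:
    """Votes required for quorum under a simple-majority rule."""
    return expected_votes // 2 + 1

def is_quorate(visible_votes: int, expected_votes: int) -> bool:
    """True when the visible votes meet or exceed the quorum threshold."""
    return visible_votes >= quorum_needed(expected_votes)

# The posted config: two 1-vote nodes plus a 2-vote qdisk, expected_votes=4.
print(quorum_needed(4))        # -> 3
print(is_quorate(1, 4))        # node alone, qdisk not contributing -> False
print(is_quorate(1 + 2, 4))    # node plus a working qdisk -> True
```

Under this model a lone node without the qdisk can never reach quorum, which matches the "quorum dissolved" symptom; whether the qdisk votes actually register is the thing to check.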

--Good luck
 
  

