LinuxQuestions.org - How long a node failover and another node take over resources on HA cluster?

How long a node failover and another node take over resources on HA cluster?

I don't have much experience with clustering, and I'm deploying a cluster on CentOS. When a node fails, how long should it take for another node to take over its resources and keep the service running? What counts as good, fast, or slow: 1s, 10s, or something else?

Hello and Welcome to LinuxQuestions,

You can easily configure how long it takes before a node is declared unavailable (dead) in your cluster setup. Since you didn't specify which clustering/high-availability software you're using, it's hard to tell you what to put in the config. In Heartbeat, for example, you can use:

Code:

#

#      keepalive: how many seconds between heartbeats

#

keepalive 2

#

#      deadtime: seconds-to-declare-host-dead

#

deadtime 10

to indicate when to move services and resources to the other node(s).

What values you set for these parameters is up to you. If your services are business critical, set them low to get higher availability: a failed node is then declared dead after only that short time, and its resources/services are moved to the other node. You can also opt out of failback, meaning that resources/services that were moved to another node will not be moved back when the failed node becomes available again.
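To put rough numbers on the settings above, here is a minimal sketch of the arithmetic. It assumes the peer is declared dead once no heartbeat has arrived for deadtime seconds and that heartbeats are sent every keepalive seconds. The function name failover_window and the resource_start_s parameter (how long your services take to start on the surviving node, which you would have to measure yourself) are made up for illustration and are not part of Heartbeat.

```python
def failover_window(keepalive_s, deadtime_s, resource_start_s):
    """Return (best_case, worst_case) seconds from node failure to service restored.

    Assumption: the peer is declared dead once no heartbeat has been
    received for deadtime_s seconds; heartbeats go out every keepalive_s.
    resource_start_s is a hypothetical, separately measured start-up time.
    """
    if keepalive_s <= 0 or deadtime_s <= keepalive_s:
        raise ValueError("deadtime must be larger than keepalive")
    # Node dies just before its next heartbeat was due: the last heartbeat
    # received is already keepalive_s old, so less waiting remains.
    best = deadtime_s - keepalive_s + resource_start_s
    # Node dies right after sending a heartbeat: the full deadtime elapses.
    worst = deadtime_s + resource_start_s
    return best, worst


# keepalive 2, deadtime 10 as in the config above; 5s start-up is an assumed value
print(failover_window(2, 10, 5))  # (13, 15)
```

With the values shown, detection alone accounts for 8 to 10 seconds, so lowering deadtime is what shortens the outage most. Setting it too close to keepalive risks declaring a healthy but briefly slow node dead.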

Kind regards,

Eric

Sorry, that was my fault, and thanks for your help. I used luci and ricci (from the Clustering software group on CentOS) to set up the HA system. Could you tell me what the standard is for evaluating an HA system? Here is my /etc/cluster/cluster.conf.

Code:

<?xml version="1.0"?>

<cluster alias="Testcluster" config_version="44" name="Testcluster">

        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="29"/>

        <clusternodes>

                <clusternode name="node1-ic" nodeid="1" votes="1">

                        <fence>

                                <method name="1">

                                        <device name="Manual1" nodename="node1-ic"/>

                                </method>

                        </fence>

                </clusternode>

                <clusternode name="node2-ic" nodeid="2" votes="1">

                        <fence>

                                <method name="1">

                                        <device name="Manual2" nodename="node2-ic"/>

                                </method>

                        </fence>

                </clusternode>

        </clusternodes>

        <cman expected_votes="1" two_node="1"/>

        <fencedevices>

                <fencedevice agent="fence_manual" name="Manual1"/>

                <fencedevice agent="fence_manual" name="Manual2"/>

        </fencedevices>

        <rm>

                <failoverdomains>

                        <failoverdomain name="Test_fo_domain" nofailback="0" ordered="0" restricted="0">

                                <failoverdomainnode name="node1-ic" priority="1"/>

                                <failoverdomainnode name="node2-ic" priority="1"/>

                        </failoverdomain>

                </failoverdomains>

                <resources>

                        <ip address="222.255.239.152" monitor_link="1"/>

                        <clusterfs device="/dev/OBS_DATA/USER_DATA" force_unmount="0" fsid="7895" fstype="gfs2" mountpoint="/data" name="DATA" options="rw" self_fence="0"/>

                        <script file="/etc/init.d/obsrcluster" name="obsr"/>

                        <script file="/etc/rc.d/init.d/httpd" name="httpd"/>

                </resources>

                <service autostart="1" domain="Test_fo_domain" exclusive="0" name="testservice" recovery="relocate">

                        <ip ref="222.255.239.152"/>

                        <clusterfs ref="DATA"/>

                        <script ref="httpd"/>

                </service>

        </rm>

</cluster>