LinuxQuestions.org (/questions/)
-   Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)
-   -   Cluster Suite - No Failover (https://www.linuxquestions.org/questions/linux-server-73/cluster-suite-no-failover-740413/)

averync 07-15-2009 11:04 PM

Cluster Suite - No Failover
 
Hi,

I have a two-node cluster with the cluster.conf shown below.

(I'm aware that I don't have any fence devices, and that I need them.
I'm just trying to get something to work in test for now)

1. If I move the service with "clusvcadm -r MQ_HA -m gateway-ifdev-mq2", it fails over fine.

2. If I "shutdown -h" one of the nodes, it fails over to the other node.

3. But if I power off one of the nodes, nothing happens; the cluster never even attempts the failover.

The messages from /var/log/messages are below:

I'd be very grateful to anyone who can spot why I don't get a failover
with this cluster.conf.

Avery

----- cluster.conf -------------
<?xml version="1.0"?>
<cluster alias="MQ_HA_IFDEV" config_version="57" name="MQ_HA_IFDEV">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="GATEWAY-IFDEV-MQ1" nodeid="1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="GATEWAY-IFDEV-MQ2" nodeid="2" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
  <rm>
    <failoverdomains>
      <failoverdomain name="MQ_HA_Fail_Domain" ordered="0" restricted="1">
        <failoverdomainnode name="GATEWAY-IFDEV-MQ1" priority="1"/>
        <failoverdomainnode name="GATEWAY-IFDEV-MQ2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="172.16.8.196" monitor_link="1"/>
      <netfs export="/MQHA/QM1/data" force_unmount="1" fstype="nfs" host="GATEWAY-IFDEV-WAS1" mountpoint="/MQHA/QM1/data" name="MQ_HA_Mount_data" options=""/>
      <netfs export="/MQHA/QM1/log" force_unmount="1" fstype="nfs" host="GATEWAY-IFDEV-WAS1" mountpoint="/MQHA/QM1/log" name="MQ_HA_Mount_log" options=""/>
      <script file="/MQHA/bin/mqOCF_Script_QM1.sh" name="mqOCF_Script_QM1"/>
    </resources>
    <service autostart="0" name="MQ_HA" recovery="relocate">
      <netfs ref="MQ_HA_Mount_data">
        <netfs ref="MQ_HA_Mount_log">
          <script ref="mqOCF_Script_QM1">
            <ip ref="172.16.8.196"/>
          </script>
        </netfs>
      </netfs>
    </service>
  </rm>
</cluster>
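
For comparison, here is roughly what the empty <fence/> and <fencedevices/> sections would contain once real fencing is configured. This is only a sketch assuming HP iLO management boards (the hardware averync later ends up pretending to have); the device names, iLO hostnames, and credentials are placeholders, not values from the original post.

```xml
<!-- Per-node fencing method (inside each <clusternode>, replacing the empty <fence/>): -->
<fence>
  <method name="1">
    <device name="ilo-mq1"/>
  </method>
</fence>

<!-- Matching device definitions (replacing the empty <fencedevices/>): -->
<fencedevices>
  <fencedevice agent="fence_ilo" name="ilo-mq1" hostname="ilo-gateway-ifdev-mq1" login="fence" passwd="secret"/>
  <fencedevice agent="fence_ilo" name="ilo-mq2" hostname="ilo-gateway-ifdev-mq2" login="fence" passwd="secret"/>
</fencedevices>
```

This matters for the symptom above: rgmanager will not recover a service from a dead node until that node has been successfully fenced, so a cluster with no usable fence agent can sit forever after a hard power-off, which is what the thread's resolution ultimately confirms.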




---------- /var/log/messages --------------
Jul 14 11:29:59 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] The token was lost in the OPERATIONAL state.
Jul 14 11:29:59 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Jul 14 11:29:59 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jul 14 11:29:59 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] entering GATHER state from 2.
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] entering GATHER state from 0.
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] Creating commit token because I am the rep.
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] Saving state aru 1ac high seq received 1ac
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] Storing new sequence id for ring 78
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] entering COMMIT state.
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] entering RECOVERY state.
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] position [0] member 172.16.8.149:
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] previous ring seq 116 rep 172.16.8.148
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] aru 1ac high delivered 1ac received flag 1
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] Did not need to originate any messages in recovery.
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] Sending initial ORF token
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 kernel: dlm: closing connection to node 1
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] CLM CONFIGURATION CHANGE
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] New Configuration:
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] r(0) ip(172.16.8.149)
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] Members Left:
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] r(0) ip(172.16.8.148)
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] Members Joined:
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] CLM CONFIGURATION CHANGE
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] New Configuration:
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] r(0) ip(172.16.8.149)
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] Members Left:
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] Members Joined:
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [SYNC ] This node is within the primary component and will provide service.
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [TOTEM] entering OPERATIONAL state.
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CLM ] got nodejoin message 172.16.8.149
Jul 14 11:30:04 GATEWAY-IFDEV-MQ2 openais[3274]: [CPG ] got joinlist message from node 2

aquaregia 07-16-2009 04:59 AM

It looks like your inter-cluster communication is not working, so the nodes cannot 'sense' each other.

Did you enable multicasting on your switch?
OpenAIS requires multicast for node-to-node cluster communication.
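
A quick way to rule out the local stack (separate from the switch question) is a loopback multicast probe. The sketch below is illustrative and not from the thread; the group and port mimic typical cman/openais defaults (a 239.192.0.0/16 address, UDP 5405). It only exercises the local host, so proving the switch actually passes multicast still needs a sender on one node and a listener on the other.

```python
import socket

GROUP = "239.192.8.88"   # placeholder group in the 239.192.0.0/16 range cman uses by default
PORT = 5405              # default openais/cman multicast port

def multicast_self_test(group=GROUP, port=PORT, timeout=2.0):
    """Send a datagram to a multicast group over loopback and try to
    receive it back.  Returns True if the local stack delivered it,
    False on timeout or any socket error."""
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        recv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        recv.bind(("", port))
        # Join the group on the loopback interface so the probe does not
        # depend on the host's routing table or NICs.
        mreq = socket.inet_aton(group) + socket.inet_aton("127.0.0.1")
        recv.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        recv.settimeout(timeout)
        # Pin the sender to loopback as well; IP_MULTICAST_LOOP is on by
        # default, so our own packet should be looped back to us.
        send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton("127.0.0.1"))
        send.sendto(b"cluster-mcast-probe", (group, port))
        data, _ = recv.recvfrom(1024)
        return data == b"cluster-mcast-probe"
    except OSError:          # socket.timeout is an OSError subclass
        return False
    finally:
        recv.close()
        send.close()

if __name__ == "__main__":
    print("local multicast OK:", multicast_self_test())
```

A False here points at the local host; a True here with nodes that still cannot see each other points at the switch (IGMP snooping without a querier is a common culprit).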

averync 07-17-2009 02:09 PM

Thanks aquaregia, it is working now. I created a hack of the fence_ilo script to 'pretend' that I have HP iLO devices.
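
For anyone else testing without fencing hardware, that 'pretend' approach boils down to a tiny dummy agent. The sketch below is in the spirit of averync's fence_ilo hack, not his actual script: real agents read key=value options on stdin, and this one parses them the same way and then claims success. Never use something like this in production; an agent that lies about fencing lets both nodes run the service (and mount shared storage) at once.

```shell
# fence_dummy: a test-only stand-in for a real fence agent (hypothetical,
# not the script from the thread).
fence_dummy() {
    action=""
    while IFS= read -r line; do
        case "$line" in
            action=*|option=*) action="${line#*=}" ;;  # 'option' is the older key name
        esac
    done
    case "$action" in
        status) echo "Status: ON" ;;                   # report the node as powered on
    esac
    return 0                                           # always claim success: test use only!
}
```

Pointing a <fencedevice agent="..."/> entry at a script like this belongs only in a throwaway test cluster, but it does unblock rgmanager's fence-then-recover sequence so failover can be exercised.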

