Got stuck with CMAN - unable to establish quorum
Hi,
I spent most of today trying to get a demo installation of Red Hat's cman up and running, as I rely on clvm afterwards. I set it up as a 2-node cluster, with each node running a recent CentOS 5.5 and both connected to the same local network. Basically I followed the instructions on this link (plus surrounding chapters).
This is my /etc/cluster/cluster.conf:
Code:
<?xml version="1.0" ?>
<cluster alias="wavecloud" config_version="5" name="wavecloud">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="centos1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="hands_on" nodename="centos1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="centos2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="hands_on" nodename="centos2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="hands_on"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
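The two-node settings in a file like this can be sanity-checked with a short script. What follows is only a sketch: the function name `check_two_node` is made up for illustration, and it encodes just the rule as I understand it from the documentation, namely that `two_node="1"` goes together with exactly two nodes of one vote each and `expected_votes="1"`.

```python
import xml.etree.ElementTree as ET


def check_two_node(conf_xml):
    """Return a list of problems with the two-node settings in a cluster.conf string.

    Hypothetical helper for illustration; it is not part of cman.
    """
    root = ET.fromstring(conf_xml)
    cman = root.find("cman")
    if cman is None:
        return ["no <cman> element found"]

    problems = []
    nodes = root.findall("./clusternodes/clusternode")
    if cman.get("two_node") == "1":
        # The two-node special case only makes sense with exactly two nodes.
        if len(nodes) != 2:
            problems.append("two_node=1 but %d nodes defined" % len(nodes))
        # With two_node=1 a single vote must be enough for quorum.
        if cman.get("expected_votes") != "1":
            problems.append("two_node=1 requires expected_votes=1")
        # Each node is expected to carry exactly one vote.
        for node in nodes:
            if node.get("votes", "1") != "1":
                problems.append("node %s has votes != 1" % node.get("name"))
    return problems
```

Called on the contents of /etc/cluster/cluster.conf, an empty list means the two-node settings are at least internally consistent. It says nothing about whether the nodes can actually talk to each other.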
Each node can reach the other by hostname:
Code:
[root@centos2 ~]# ping centos1
PING centos1 (X.X.X.X) 56(84) bytes of data.
64 bytes from centos1 (X.X.X.X): icmp_seq=1 ttl=64 time=0.286 ms
64 bytes from centos1 (X.X.X.X): icmp_seq=2 ttl=64 time=0.500 ms
--- centos1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.286/0.393/0.500/0.107 ms
[root@centos1 ~]# ping centos2
PING centos2 (X.X.X.X) 56(84) bytes of data.
64 bytes from centos2 (X.X.X.X): icmp_seq=1 ttl=64 time=1.83 ms
--- centos2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.838/1.838/1.838/0.000 ms
I synchronized the cluster.conf on both nodes. If I now try to start the cluster, I run into the following problem:
Code:
[root@centos1 ssmping-0.9.1]# /etc/init.d/cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... failed
Timed-out waiting for cluster
[FAILED]
Interestingly, neither node even joins the cluster (the result is the same on both nodes):
Code:
[root@centos2 ~]# cman_tool nodes
cman_tool: cman_get_node_count failed: Node is not yet a cluster member
The log states:
Code:
Jul 31 01:34:37 centos1 ccsd[4432]: Starting ccsd 2.0.115:
Jul 31 01:34:37 centos1 ccsd[4432]: Built: Apr 26 2010 13:46:08
Jul 31 01:34:37 centos1 ccsd[4432]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jul 31 01:34:37 centos1 ccsd[4432]: cluster.conf (cluster name = wavecloud, version = 6) found.
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] join (60 ms) send_join (0 ms) consensus (20000 ms) merge (200 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1402
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] send threads (0 threads)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] RRP token expired timeout (495 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] RRP token problem counter (2000 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] RRP threshold (10 problem count)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] RRP mode set to none.
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] heartbeat_failures_allowed (0)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] max_network_delay (50 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Jul 31 01:34:40 centos1 ccsd[4432]: Initial status:: Inquorate
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] The network interface [X.X.X.X] is now up.
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] Created or loaded sequence id 0.X.X.X.X for this ring.
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] entering GATHER state from 15.
Jul 31 01:34:40 centos1 openais[4438]: [CMAN ] CMAN 2.0.115 (built Apr 26 2010 13:46:11) started
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais event service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais message service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais configuration service'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Jul 31 01:34:40 centos1 openais[4438]: [SYNC ] Not using a virtual synchrony filter.
Jul 31 01:35:10 centos1 openais[4438]: [TOTEM] The consensus timeout expired.
Jul 31 01:35:10 centos1 openais[4438]: [TOTEM] entering GATHER state from 3.
...
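To see at a glance whether a node is looping in this state, the TOTEM lines can be counted with a small script. This is a rough sketch that assumes the syslog line format shown above; the names `LINE_RE` and `summarize_totem` are invented for illustration. My reading, which may be wrong, is that a node which keeps re-entering GATHER after consensus timeouts is not receiving any membership traffic from its peer.

```python
import re

# Captures the openais subsystem tag and the message text,
# e.g. "openais[4438]: [TOTEM] entering GATHER state from 3."
LINE_RE = re.compile(r"openais\[\d+\]: \[(\w+)\s*\] (.*)")


def summarize_totem(lines):
    """Count GATHER-state entries and consensus timeouts among openais log lines."""
    counts = {"gather": 0, "consensus_timeout": 0}
    for line in lines:
        match = LINE_RE.search(line)
        # Skip lines from other daemons and from other openais subsystems.
        if not match or match.group(1) != "TOTEM":
            continue
        message = match.group(2)
        if message.startswith("entering GATHER state"):
            counts["gather"] += 1
        elif message.startswith("The consensus timeout expired"):
            counts["consensus_timeout"] += 1
    return counts
```

Feeding it the lines of the system log (for example from `open("/var/log/messages")`) gives the two counters. Rising values on both nodes would suggest that each node is waiting for the other without ever hearing from it.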
Additionally:
Code:
[root@centos1 ssmping-0.9.1]# cman_tool join
cman_tool: Node is already active
[root@centos1 ssmping-0.9.1]# fence_tool join
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
I really don't see what I'm doing wrong, as I can also verify that multicast is working:
Code:
[root@centos1 ssmping-0.9.1]# ping -t 1 -c 2 224.0.0.1
PING 224.0.0.1 (224.0.0.1) 56(84) bytes of data.
64 bytes from centos1: icmp_seq=1 ttl=64 time=0.257 ms
64 bytes from centos2: icmp_seq=1 ttl=64 time=1.16 ms (DUP!)
...
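One caveat about this check: pinging 224.0.0.1 only exercises ICMP to the all-hosts group, whereas openais, as far as I understand, sends UDP to its own multicast group and port. A closer test would be a small UDP sender and receiver like the sketch below. The group `239.192.0.1` is a placeholder and port `5405` is what I believe to be the openais default, so both are assumptions; the real values should be taken from the running setup (for example from `cman_tool status`). The function names are made up for illustration.

```python
import socket
import struct

GROUP = "239.192.0.1"  # assumption: placeholder, replace with the cluster's multicast address
PORT = 5405            # assumption: believed to be the openais default port


def make_membership_request(group, local_ip="0.0.0.0"):
    """Pack the 8-byte ip_mreq structure: group address followed by local interface."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def open_receiver(group=GROUP, port=PORT):
    """Return a UDP socket that has joined the multicast group on the given port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))
    return sock


def send_probe(message=b"probe", group=GROUP, port=PORT, ttl=1):
    """Send one UDP datagram to the multicast group; ttl=1 keeps it on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
    try:
        sock.sendto(message, (group, port))
    finally:
        sock.close()
```

With cman stopped on both nodes, one node would run `open_receiver().recvfrom(1024)` while the other calls `send_probe()`. If the receiver never returns, UDP multicast on that group and port is being dropped somewhere (a host firewall or the switch, for instance), even though the ICMP ping succeeds.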