LinuxQuestions.org

lxf 07-30-2010 06:39 PM

Got stuck with CMAN - unable to establish quorum
 
Hi,
I spent most of today trying to get a demo installation of Red Hat's cman up and running, since I need clvm on top of it afterwards. I set it up on a two-node cluster, with each node running a recent CentOS 5.5 and both connected over a local network. I basically followed the instructions on this link (plus the surrounding chapters).

This is my /etc/cluster/cluster.conf:
Code:

<?xml version="1.0" ?>
<cluster alias="wavecloud" config_version="5" name="wavecloud">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="centos1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="hands_on" nodename="centos1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="centos2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="hands_on" nodename="centos2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="hands_on"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>
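Since the symptom is an inquorate cluster, it helps to check what this configuration implies for quorum. Below is a minimal Python sketch (standard library `xml.etree.ElementTree` only; the function name `quorum_report` is hypothetical) that reads a cluster.conf and reports the node votes and the quorum threshold. The arithmetic is a simplifying assumption: a plain majority of `expected_votes`, except that `two_node="1"`, which should be paired with `expected_votes="1"` and exactly two nodes, lets a single vote count as quorate.

```python
import xml.etree.ElementTree as ET


def quorum_report(conf_text: str) -> dict:
    """Summarise vote and quorum settings from a cluster.conf document (a sketch)."""
    root = ET.fromstring(conf_text)
    nodes = root.findall("./clusternodes/clusternode")
    # Each <clusternode> contributes its votes attribute (assumed default: 1).
    node_votes = sum(int(n.get("votes", "1")) for n in nodes)

    cman = root.find("cman")
    attrs = cman.attrib if cman is not None else {}
    two_node = attrs.get("two_node") == "1"
    expected = int(attrs.get("expected_votes", node_votes))

    problems = []
    if two_node:
        # two_node mode only makes sense with exactly two nodes and expected_votes=1;
        # in that mode one vote is treated as enough for quorum.
        if len(nodes) != 2:
            problems.append(f"two_node=1 but {len(nodes)} clusternode entries")
        if expected != 1:
            problems.append(f"two_node=1 requires expected_votes=1, got {expected}")
        quorum = 1
    else:
        # Simplified majority rule: more than half of the expected votes.
        quorum = expected // 2 + 1

    return {
        "nodes": [n.get("name") for n in nodes],
        "node_votes": node_votes,
        "expected_votes": expected,
        "quorum": quorum,
        "problems": problems,
    }
```

Run against the file above (for example `print(quorum_report(open("/etc/cluster/cluster.conf").read()))`), it should report a quorum of 1 and no problems, which hints that the vote settings themselves are not what keeps the nodes inquorate.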

Each node can reach the other by hostname:

Code:

[root@centos2 ~]# ping centos1
PING centos1 (X.X.X.X) 56(84) bytes of data.
64 bytes from centos1 (X.X.X.X): icmp_seq=1 ttl=64 time=0.286 ms
64 bytes from centos1 (X.X.X.X): icmp_seq=2 ttl=64 time=0.500 ms

--- centos1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.286/0.393/0.500/0.107 ms


[root@centos1 ~]# ping centos2
PING centos2 (X.X.X.X) 56(84) bytes of data.
64 bytes from centos2 (X.X.X.X): icmp_seq=1 ttl=64 time=1.83 ms

--- centos2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.838/1.838/1.838/0.000 ms
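A successful ping by name shows that each node resolves the other's name. A related pitfall that is worth ruling out on each node is the node's own name (as written in cluster.conf) resolving to a loopback address, for example through a `127.0.0.1` line in /etc/hosts, because cman typically picks its interface from that address. A minimal Python sketch (the function name `resolve_node` is hypothetical) that resolves a name and flags loopback results:

```python
import ipaddress
import socket


def resolve_node(name: str) -> tuple:
    """Resolve a cluster node name to IPv4 and report whether it is loopback."""
    # gethostbyname goes through the system resolver (/etc/hosts, DNS per nsswitch.conf).
    addr = socket.gethostbyname(name)
    return addr, ipaddress.ip_address(addr).is_loopback
```

Usage on each node: `for node in ("centos1", "centos2"): print(node, resolve_node(node))`. A `True` in the second field for the local node's name would be suspicious.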


I synchronized cluster.conf on both nodes. When I now try to start the cluster, I run into the following problem:

Code:

[root@centos1 ssmping-0.9.1]# /etc/init.d/cman start
Starting cluster:
  Loading modules... done
  Mounting configfs... done
  Starting ccsd... done
  Starting cman... failed
Timed-out waiting for cluster
                                                          [FAILED]

which turns out to be:


Interestingly, neither node even becomes a cluster member (same result on both):

Code:

[root@centos2 ~]# cman_tool nodes
cman_tool: cman_get_node_count failed: Node is not yet a cluster member

The log states:

Code:

Jul 31 01:34:37 centos1 ccsd[4432]: Starting ccsd 2.0.115:
Jul 31 01:34:37 centos1 ccsd[4432]:  Built: Apr 26 2010 13:46:08
Jul 31 01:34:37 centos1 ccsd[4432]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Jul 31 01:34:37 centos1 ccsd[4432]: cluster.conf (cluster name = wavecloud, version = 6) found.
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] join (60 ms) send_join (0 ms) consensus (20000 ms) merge (200 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1402
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] send threads (0 threads)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] RRP token expired timeout (495 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] RRP token problem counter (2000 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] RRP threshold (10 problem count)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] RRP mode set to none.
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] heartbeat_failures_allowed (0)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] max_network_delay (50 ms)
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Jul 31 01:34:40 centos1 ccsd[4432]: Initial status:: Inquorate
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] The network interface [X.X.X.X] is now up.
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] Created or loaded sequence id 0.X.X.X.X for this ring.
Jul 31 01:34:40 centos1 openais[4438]: [TOTEM] entering GATHER state from 15.
Jul 31 01:34:40 centos1 openais[4438]: [CMAN ] CMAN 2.0.115 (built Apr 26 2010 13:46:11) started
Jul 31 01:34:40 centos1 openais[4438]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais event service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais message service B.01.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais configuration service'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Jul 31 01:34:40 centos1 openais[4438]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Jul 31 01:34:40 centos1 openais[4438]: [SYNC ] Not using a virtual synchrony filter.
Jul 31 01:35:10 centos1 openais[4438]: [TOTEM] The consensus timeout expired.
Jul 31 01:35:10 centos1 openais[4438]: [TOTEM] entering GATHER state from 3.
...
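The telling part of this log is that totem keeps returning to GATHER after "The consensus timeout expired." and, as far as the excerpt shows, never reaches OPERATIONAL, i.e. this node never hears from its peer. A minimal Python sketch (hypothetical function name `summarise_totem`; it only matches the message formats visible above plus the openais "entering OPERATIONAL state." line) that tallies these transitions from syslog:

```python
import re

# Matches lines such as "[TOTEM] entering GATHER state from 15."
TOTEM_STATE = re.compile(r"\[TOTEM\] entering (\w+) state")
CONSENSUS_EXPIRED = "[TOTEM] The consensus timeout expired."


def summarise_totem(lines):
    """Count totem state transitions and consensus timeouts in openais log lines."""
    states = []
    timeouts = 0
    for line in lines:
        match = TOTEM_STATE.search(line)
        if match:
            states.append(match.group(1))
        if CONSENSUS_EXPIRED in line:
            timeouts += 1
    return {
        "states": states,
        "consensus_timeouts": timeouts,
        # A ring that never reaches OPERATIONAL never formed a membership.
        "reached_operational": "OPERATIONAL" in states,
    }
```

Usage: `with open("/var/log/messages") as fh: print(summarise_totem(fh))`. Many consensus timeouts with `reached_operational` still `False` points at totem traffic between the nodes being lost.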

Additionally:

Code:

[root@centos1 ssmping-0.9.1]# cman_tool join
cman_tool: Node is already active
[root@centos1 ssmping-0.9.1]# fence_tool join
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum

I really don't see what I'm doing wrong, since I can also verify that multicast is working:

Code:

[root@centos1 ssmping-0.9.1]# ping -t 1 -c 2 224.0.0.1
PING 224.0.0.1 (224.0.0.1) 56(84) bytes of data.
64 bytes from centos1: icmp_seq=1 ttl=64 time=0.257 ms
64 bytes from centos2: icmp_seq=1 ttl=64 time=1.16 ms (DUP!)
...
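Note that an ICMP echo to the all-hosts group 224.0.0.1 does not exercise the UDP port that the cman/openais totem protocol uses (5405 by default on RHEL/CentOS 5), so a host firewall can let this ping through while still dropping cluster traffic. A minimal Python probe for that path is sketched below. The group address `239.192.0.1` is a hypothetical placeholder: substitute the multicast address your cluster actually uses (a running node lists it in `cman_tool status`). Stop cman first, since it holds the port.

```python
import socket
import sys
from typing import Optional

# Hypothetical placeholder group; replace with your cluster's multicast address.
GROUP = "239.192.0.1"
PORT = 5405  # default totem port used by cman/openais on RHEL/CentOS 5


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq struct (group address + local interface) for IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def receive(group: str = GROUP, port: int = PORT, timeout: float = 10.0) -> Optional[bytes]:
    """Join the group, wait for one datagram, and return it (None on timeout)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    sock.settimeout(timeout)
    try:
        data, sender = sock.recvfrom(1024)
        print(f"received {data!r} from {sender[0]}")
        return data
    except socket.timeout:
        print("no datagram received: UDP to this port may be filtered")
        return None
    finally:
        sock.close()


def send(message: bytes, group: str = GROUP, port: int = PORT, ttl: int = 1) -> None:
    """Send one datagram to the multicast group with a link-local TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    try:
        sock.sendto(message, (group, port))
    finally:
        sock.close()


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "recv":
        receive()
    elif mode == "send":
        send(b"totem-probe")
    else:
        print("usage: mcast_probe.py recv|send", file=sys.stderr)
```

Start `recv` on centos2, then `send` on centos1 (and vice versa). A receiver that times out while the ICMP multicast ping still succeeds is consistent with a filter on the UDP port.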


lxf 08-02-2010 03:38 AM

Shit, I forgot that CentOS enables a firewall (iptables) by default. Now everything is up and working.
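The fix above does not say how the firewall was handled. If you would rather open the cluster ports than turn iptables off, Red Hat's RHEL 5 Cluster Administration guide lists the ports to enable; the subset below (cman/openais totem, ccsd, and dlm, which clvm relies on) is my reading of that table and should be checked against your release. A small Python helper (hypothetical name `iptables_commands`) that prints matching iptables commands for a trusted cluster subnet:

```python
# Port subset assumed from the RHEL 5 Cluster Administration guide; verify for your release.
CLUSTER_PORTS = [
    ("udp", "5404:5405", "cman / openais totem"),
    ("tcp", "21064", "dlm (used by clvmd)"),
    ("tcp", "50006", "ccsd"),
    ("udp", "50007", "ccsd"),
    ("tcp", "50008:50009", "ccsd"),
]


def iptables_commands(source_cidr: str) -> list:
    """Return iptables commands accepting cluster traffic from source_cidr."""
    cmds = [
        # -I INPUT inserts ahead of the default RH-Firewall-1-INPUT jump.
        f"iptables -I INPUT -s {source_cidr} -p {proto} -m {proto} --dport {ports} -j ACCEPT"
        for proto, ports, _service in CLUSTER_PORTS
    ]
    cmds.append("service iptables save")  # persist rules to /etc/sysconfig/iptables
    return cmds
```

Usage with a placeholder subnet (the real addresses are masked in this thread): `print("\n".join(iptables_commands("192.168.1.0/24")))`, then review and run the output as root on both nodes.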


All times are GMT -5.