Trouble setting up a 2-server cluster (aisexec daemon didn't start)
Hello,
First of all, I hope this forum is the right place for this topic.
I'm trying to set up a 2-server cluster, but openais doesn't seem to want to start up (when started from cman). Below are the logs I get and the setup I have done so far.
Syslog:
Oct 14 13:02:21 korzel ccsd[7823]: Starting ccsd 2.0.115:
Oct 14 13:02:21 korzel ccsd[7823]: Built: Sep 3 2009 23:26:21
Oct 14 13:02:21 korzel ccsd[7823]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Oct 14 13:02:21 korzel ccsd[7823]: cluster.conf (cluster name = cl_vpanel, version = 1) found.
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] AIS Executive Service RELEASE 'subrev 1358 version 0.80.3'
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] AIS Executive Service: started and ready to provide service.
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] Using default multicast address of 239.192.204.67
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] openais component openais_cpg loaded.
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] openais component openais_cfg loaded.
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] Registering service handler 'openais configuration service'
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] openais component openais_msg loaded.
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] Registering service handler 'openais message service B.01.01'
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] openais component openais_lck loaded.
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] openais component openais_evt loaded.
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] Registering service handler 'openais event service B.01.01'
Oct 14 13:02:24 korzel openais[7830]: [MAIN ] openais component openais_ckpt loaded.
Oct 14 13:02:25 korzel openais[7830]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Oct 14 13:02:25 korzel openais[7830]: [MAIN ] openais component openais_amf loaded.
Oct 14 13:02:25 korzel openais[7830]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Oct 14 13:02:25 korzel openais[7830]: [MAIN ] openais component openais_clm loaded.
Oct 14 13:02:25 korzel openais[7830]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Oct 14 13:02:25 korzel openais[7830]: [MAIN ] openais component openais_evs loaded.
Oct 14 13:02:25 korzel openais[7830]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Oct 14 13:02:25 korzel openais[7830]: [MAIN ] openais component openais_cman loaded.
Oct 14 13:02:50 korzel ccsd[7823]: Unable to connect to cluster infrastructure after 30 seconds.
Oct 14 13:03:21 korzel ccsd[7823]: Unable to connect to cluster infrastructure after 60 seconds.
When I try to start cman by hand, I get the following output on the console:
[root@korzel ~]# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... failed
/usr/sbin/cman_tool: aisexec daemon didn't start
[FAILED]
I've googled just about everything I could find on this error, but none of the fixes described seem to fit my case. Additional information:
openais: openais-0.80.3-22.4 (this is a version I built from the source RPM. It's the third or so version I have installed; all give the same error)
My cluster.conf file:
<cluster alias="cl_vpanel" config_version="1" name="cl_vpanel">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="kwast.int.traserv.com" nodeid="1" votes="1"/>
<clusternode name="korzel.int.traserv.com" nodeid="2" votes="1"/>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices/>
<rm/>
</cluster>
That one is pretty basic now. I've had it set up with the proper dell_drac fencing devices etc., but that doesn't seem to make a difference.
At this point the cluster is "built" with luci/ricci; I've also done the same with the manual tools. All give the same result.
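As a sanity check on the config above, here is a minimal Python sketch that parses the same cluster.conf text and confirms the two-node settings are consistent. It is not part of the cluster tools. The function name `summarize` and the rule it applies (two_node="1" together with expected_votes="1" and exactly two nodes) are my own reading of how two-node mode is meant to be configured, so treat it as an assumption rather than an official validator.

```python
import xml.etree.ElementTree as ET

# The cluster.conf from this post, embedded so the sketch is self-contained.
CLUSTER_CONF = """<cluster alias="cl_vpanel" config_version="1" name="cl_vpanel">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="kwast.int.traserv.com" nodeid="1" votes="1"/>
<clusternode name="korzel.int.traserv.com" nodeid="2" votes="1"/>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices/>
<rm/>
</cluster>"""


def summarize(conf_text):
    """Hypothetical helper: extract node names and check two-node settings."""
    root = ET.fromstring(conf_text)
    nodes = list(root.iter("clusternode"))
    node_names = [n.get("name") for n in nodes]
    node_ids = [n.get("nodeid") for n in nodes]

    cman = root.find("cman")
    two_node = cman is not None and cman.get("two_node") == "1"
    expected_votes = cman.get("expected_votes") if cman is not None else None

    # Assumption: two-node mode wants exactly two nodes and expected_votes="1".
    two_node_ok = two_node and len(nodes) == 2 and expected_votes == "1"

    return {
        "cluster_name": root.get("name"),
        "node_names": node_names,
        "unique_node_ids": len(set(node_ids)) == len(node_ids),
        "two_node_ok": two_node_ok,
    }


if __name__ == "__main__":
    for key, value in summarize(CLUSTER_CONF).items():
        print(key, "=", value)
```

For the file as posted, this reports both node names, unique node IDs, and consistent two-node settings, so the config itself does not look like the obvious culprit.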
The hosts file on both servers:
[root@korzel ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
217.119.229.214 korzel.traserv.com korzel
217.119.229.215 kwast.traserv.com kwast
192.168.1.147 korzel.int.traserv.com
192.168.1.148 kwast.int.traserv.com
I also tried with the "basic" hosts file, i.e. with only the localhost lines in it.
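One thing worth ruling out is whether the node names from cluster.conf map to a real (non-loopback) address in the hosts file. The following is only a rough Python sketch of that check, using the hosts entries and node names from this post. The helpers `parse_hosts` and `check_nodes` are hypothetical names of mine. It only looks at /etc/hosts-style text, so it says nothing about what DNS returns when the "basic" hosts file is in use.

```python
import ipaddress

# Hosts entries as posted above (comment lines omitted).
HOSTS = """127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
217.119.229.214 korzel.traserv.com korzel
217.119.229.215 kwast.traserv.com kwast
192.168.1.147 korzel.int.traserv.com
192.168.1.148 kwast.int.traserv.com
"""

# Node names exactly as they appear in cluster.conf.
NODE_NAMES = ["kwast.int.traserv.com", "korzel.int.traserv.com"]


def parse_hosts(text):
    """Map each hostname or alias to the first address listed for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            mapping.setdefault(name, addr)
    return mapping


def check_nodes(mapping, node_names):
    """Return the address per node, or 'missing' / 'loopback' as a warning."""
    result = {}
    for name in node_names:
        addr = mapping.get(name)
        if addr is None:
            result[name] = "missing"
        elif ipaddress.ip_address(addr).is_loopback:
            result[name] = "loopback"
        else:
            result[name] = addr
    return result


if __name__ == "__main__":
    for name, status in check_nodes(parse_hosts(HOSTS), NODE_NAMES).items():
        print(name, "->", status)
```

With the full hosts file shown above, both node names map to their 192.168.1.x addresses. With only the localhost lines, both would come back as "missing" from this check, and name resolution would then depend entirely on DNS.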
Both servers run CentOS 5.3, fully up to date.
Does anyone have a clue as to what might cause this? There is no real error anywhere that explains WHY the aisexec daemon won't start. When I start openais manually (/etc/init.d/openais start), it starts without any errors. openais is set to NOT start at boot.
I am totally out of options at this point, and any help would be appreciated.