LinuxQuestions.org


ciorny 09-17-2013 06:35 AM

cluster on slack14 (corosync, pacemaker)
 
Hi everyone,

For a few days I've been trying to launch a cluster based on corosync and pacemaker under Slackware64-14.0. My setup follows this website: http://ieatbinary.com/2012/05/14/cor...-in-slackware/

The packages compiled successfully (using the scripts from slackbuilds.org), and rc.corosync launches too.
After corosync starts, pacemaker sees the other nodes on the subnet. After some time pacemaker tries to update information in the CIB, and at that moment the problems start. Changes to the CIB are not possible because of a problem with IPC communication between the processes started by corosync.

Code:

[..]
Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: send_ipc_message: IPC Channel to 2058 is not connected
Sep 16 07:52:03 exdns-n1 attrd: [2060]: info: cib_native_msgready: Lost connection to the CIB service [2058].
Sep 16 07:52:03 corosync [pcmk  ] info: pcmk_ipc_exit: Client cib (conn=0x15b04e0, async-conn=0x15b04e0) left
Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: cib_native_perform_op: Sending message to CIB service FAILED
Sep 16 07:52:03 exdns-n1 attrd: [2060]: CRIT: cib_native_dispatch: Lost connection to the CIB service [2058/callback].
Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: crm_element_value: Couldn't find ignore_dtd in NULL
Sep 16 07:52:03 exdns-n1 attrd: [2060]: CRIT: cib_native_dispatch: Lost connection to the CIB service [2058/command].
Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: crm_element_value: Couldn't find validate-with in NULL
Sep 16 07:52:03 exdns-n1 attrd: [2060]: ERROR: attrd_cib_connection_destroy: Connection to the CIB terminated...
Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: send_ipc_message: IPC Channel to 2058 is not connected
Sep 16 07:52:03 corosync [pcmk  ] info: pcmk_ipc_exit: Client attrd (conn=0x15af680, async-conn=0x15af680) left
Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: cib_native_perform_op: Sending message to CIB service FAILED
Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: update_attr: Error setting dc-version=1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e (section=crm_config, set=cib-bootstrap-options): send failed
Sep 16 07:52:03 exdns-n1 crmd: [2062]: info: log_data_element: update_attr: Update <crm_config >
Sep 16 07:52:03 exdns-n1 crmd: [2062]: info: log_data_element: update_attr: Update  <cluster_property_set id="cib-bootstrap-options" >
Sep 16 07:52:03 exdns-n1 crmd: [2062]: info: log_data_element: update_attr: Update    <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e" />
Sep 16 07:52:03 exdns-n1 crmd: [2062]: info: log_data_element: update_attr: Update  </cluster_property_set>
Sep 16 07:52:03 exdns-n1 crmd: [2062]: info: log_data_element: update_attr: Update </crm_config>
Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: send_ipc_message: IPC Channel to 2058 is not connected
Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: cib_native_perform_op: Sending message to CIB service FAILED
[..]
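
To get a quick overview of an excerpt like this one, the ERROR and CRIT lines can be tallied per daemon and function. This is only a minimal sketch in Python: the regular expression is an assumption based on the line format shown above, `summarize` and `SAMPLE` are made-up names, and the corosync `[pcmk  ]` lines are deliberately skipped because they use a different layout.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: send_ipc_message: <message>
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+"       # timestamp and host name
    r"(?P<daemon>\w+): \[(?P<pid>\d+)\]: "   # daemon name and its PID
    r"(?P<level>[A-Z]+): "                   # upper-case severity (ERROR, CRIT)
    r"(?P<func>\w+): (?P<msg>.*)$"           # reporting function and message
)


def summarize(lines):
    """Count ERROR/CRIT messages per (daemon, function) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lower-case "info" lines and corosync's own lines do not match.
        if match and match.group("level") in ("ERROR", "CRIT"):
            counts[(match.group("daemon"), match.group("func"))] += 1
    return counts


# A few lines copied from the excerpt above.
SAMPLE = [
    "Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: send_ipc_message: IPC Channel to 2058 is not connected",
    "Sep 16 07:52:03 exdns-n1 attrd: [2060]: info: cib_native_msgready: Lost connection to the CIB service [2058].",
    "Sep 16 07:52:03 corosync [pcmk  ] info: pcmk_ipc_exit: Client cib (conn=0x15b04e0, async-conn=0x15b04e0) left",
    "Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: cib_native_perform_op: Sending message to CIB service FAILED",
    "Sep 16 07:52:03 exdns-n1 attrd: [2060]: CRIT: cib_native_dispatch: Lost connection to the CIB service [2058/callback].",
    "Sep 16 07:52:03 exdns-n1 crmd: [2062]: ERROR: send_ipc_message: IPC Channel to 2058 is not connected",
]

for (daemon, func), count in summarize(SAMPLE).most_common():
    print(f"{count:3d}  {daemon}  {func}")
```

In the excerpt every failure points at PID 2058, the cib process, which fits the picture of crmd and attrd both losing their IPC channel to it.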

The cluster status shows no DC set:
Code:

root@exdns-n1:/var/log# crm status
============
Last updated: Tue Sep 17 11:24:40 2013
Current DC: NONE
0 Nodes configured, unknown expected votes
0 Resources configured.
============
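
A status like this can also be checked from a script, for example when restarting the stack repeatedly during testing. The following is a rough Python sketch that only understands the two header lines seen above; `parse_crm_status` and `cluster_formed` are hypothetical helper names, and other pacemaker versions may print the header differently.

```python
import re


def parse_crm_status(text):
    """Extract the DC name and the configured node count from `crm status` text."""
    dc = re.search(r"^Current DC:\s*(\S+)", text, re.MULTILINE)
    nodes = re.search(r"^(\d+) Nodes configured", text, re.MULTILINE)
    return {
        # "NONE" means no designated coordinator has been elected.
        "dc": None if dc is None or dc.group(1) == "NONE" else dc.group(1),
        "nodes": int(nodes.group(1)) if nodes else None,
    }


def cluster_formed(status):
    """True only if a DC is elected and at least one node is configured."""
    return status["dc"] is not None and bool(status["nodes"])


# The output shown above, pasted in as a string.
STATUS_TEXT = """\
============
Last updated: Tue Sep 17 11:24:40 2013
Current DC: NONE
0 Nodes configured, unknown expected votes
0 Resources configured.
============
"""

status = parse_crm_status(STATUS_TEXT)
print(status, "formed:", cluster_formed(status))
```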

Manual changes to the CIB are not possible either:
Code:

root@exdns-n1:/var/log# crm
crm(live)# configure
INFO: building help index
crm(live)configure# property stonith-enabled=false
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
Call cib_replace failed (-41): Remote node did not respond
<null>
ERROR: could not replace cib
INFO: offending xml: <configuration>
        <crm_config>
                <cluster_property_set id="cib-bootstrap-options">
                        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
                        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
                </cluster_property_set>
        </crm_config>
        <nodes/>
        <resources/>
        <constraints/>
        <op_defaults/>
        <rsc_defaults/>
</configuration>

crm(live)configure# quit

I've observed the same behavior on slackware64-14.0, slackware-14.0 and slackware-13.37, in each case on a fresh installation.

Besides the packages/scripts from SBo, I've tried other versions of the packages with the same result: the IPC problem.

Any ideas or suggestions as to what could be causing the above problem?

Thanks & regards,
ck

sardinha 09-18-2013 06:27 AM

It doesn't aim at exactly the same goal and it's a little out of date, but just as a reference to compare and comment on:
http://blog.tpa.me.uk/high-availabil...rbd-pacemaker/

ciorny 09-19-2013 02:17 AM

Thanks for the link. I know that website.

I solved the IPC problem by installing the newest versions of the packages, as follows:

clusterglue: 1.0.11 - doesn't need the patch included in the scripts from SBo
clusterresourceagents: 3.9.5
corosync: 1.4.6
openais: 1.1.4 (in the 2.x tree of corosync, openais is integrated into corosync)
libqb: 0.16 - needed by pacemaker 1.1.7
pacemaker: 1.1.7 - the last release with 'crm' included in the tarball (from 1.1.8 on, 'crm' is a separate project)

After upgrading to and installing the above packages, the whole HA cluster setup on Slackware64-14.0 operates with no problems.
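
For anyone who wants to compare a node against this version list, here is a small Python sketch. It assumes the installed package names match the names used in the list and follow the usual Slackware name-version-arch-build pattern, and that the package database is in /var/log/packages as on Slackware 14.0. `KNOWN_GOOD`, `parse_package` and `check_versions` are made-up names for illustration.

```python
import os

# Versions reported as working in this thread.
KNOWN_GOOD = {
    "clusterglue": "1.0.11",
    "clusterresourceagents": "3.9.5",
    "corosync": "1.4.6",
    "openais": "1.1.4",
    "libqb": "0.16",
    "pacemaker": "1.1.7",
}

PKG_DB = "/var/log/packages"  # assumed location of the package database


def parse_package(entry):
    """Split 'name-version-arch-build' into (name, version).

    Only the name part may contain extra hyphens, so split from the right.
    """
    parts = entry.rsplit("-", 3)
    if len(parts) != 4:
        return None
    return parts[0], parts[1]


def check_versions(entries, wanted=KNOWN_GOOD):
    """Return {name: (installed, wanted)} for missing or differing packages."""
    installed = dict(p for p in map(parse_package, entries) if p)
    return {
        name: (installed.get(name), version)
        for name, version in wanted.items()
        if installed.get(name) != version
    }


if os.path.isdir(PKG_DB):
    for name, (have, want) in sorted(check_versions(os.listdir(PKG_DB)).items()):
        print(f"{name}: installed={have} wanted={want}")
```

An empty result means every listed package is present in exactly the listed version; a newer version is reported as a difference too, since the sketch compares strings and does not order versions.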

regards,
ciorny


All times are GMT -5.