03-31-2010, 10:53 AM   #1
slinx
Member

Registered: Apr 2008 | Location: Cleveland, Ohio | Distribution: SuSE, CentOS, Fedora, Ubuntu | Posts: 106
Heartbeat cluster won't recognize other node, resource won't start.


Hello,

I followed the directions in this HowTo to a "T" (except for some modifications for my environment), but it's not quite working.

DRBD itself is working, but I can't get heartbeat to control it. I seem to have an error in my resource definitions. I'm just not finding the documentation very clear on what I need to change.

I want the node to mount the DRBD device, then assign the virtual IP, then start MySQL and Apache. And I want the active node to STONITH the other node if it stops receiving a heartbeat from it.
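(Apache isn't in my cluster config yet; I plan to add it to the end of the resource group once the MySQL side fails over cleanly. I'm assuming it ends up as just another LSB primitive, something like the sketch below, but I haven't tried it:)

Code:
<primitive class="lsb" type="httpd" provider="heartbeat" id="httpd"/>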

Here's crm_mon output:
Code:
admin-lab0 ~]$ sudo /usr/sbin/crm_mon 
============
Last updated: Wed Mar 31 11:30:56 2010
Current DC: admin-lab0 (de820ffb-dab9-446c-ab5b-9291e5409a69)
2 Nodes configured.
1 Resources configured.
============

Node: admin-lab1 (c07cf70b-865c-41fb-98f7-9a25163c0825): OFFLINE
Node: admin-lab0 (de820ffb-dab9-446c-ab5b-9291e5409a69): online


Failed actions:
    drbddisk_mysql_start_0 (node=admin-lab0, call=6, rc=1): Error
It appears each node can only see itself. I can ping each node through the general interface and through eth1, which is dedicated to DRBD:

Code:
admin-lab1 ~]$ sudo /usr/sbin/crm_mon
============
Last updated: Wed Mar 31 09:42:17 2010
Current DC: admin-lab1 (c07cf70b-865c-41fb-98f7-9a25163c0825)
2 Nodes configured.
1 Resources configured.
============

Node: admin-lab1 (c07cf70b-865c-41fb-98f7-9a25163c0825): online
Node: admin-lab0 (de820ffb-dab9-446c-ab5b-9291e5409a69): OFFLINE


Failed actions:
    drbddisk_mysql_start_0 (node=admin-lab1, call=6, rc=1): Error
I'm getting this error:

Code:
admin-lab0$ sudo /usr/sbin/crm_verify -L -VVV

crm_verify[13157]: 2010/03/31_10:58:16 info: main: =#=#=#=#= Getting XML =#=#=#=#=
crm_verify[13157]: 2010/03/31_10:58:16 info: main: Reading XML from: live cluster
crm_verify[13157]: 2010/03/31_10:58:16 notice: main: Required feature set: 2.0
crm_verify[13157]: 2010/03/31_10:58:16 info: determine_online_status: Node admin-lab0 is online
crm_verify[13157]: 2010/03/31_10:58:16 WARN: unpack_rsc_op: Processing failed op drbddisk_mysql_start_0 on admin-lab0: Error
crm_verify[13157]: 2010/03/31_10:58:16 WARN: unpack_rsc_op: Compatability handling for failed op drbddisk_mysql_start_0 on admin-lab0
crm_verify[13157]: 2010/03/31_10:58:16 notice: group_print: Resource Group: rg_mysql
crm_verify[13157]: 2010/03/31_10:58:16 notice: native_print:     drbddisk_mysql (heartbeat:drbddisk):   Stopped 
crm_verify[13157]: 2010/03/31_10:58:16 notice: native_print:     fs_mysql       (heartbeat::ocf:Filesystem):    Stopped 
crm_verify[13157]: 2010/03/31_10:58:16 notice: native_print:     ip_mysql       (heartbeat::ocf:IPaddr2):       Stopped 
crm_verify[13157]: 2010/03/31_10:58:16 notice: native_print:     mysqld (lsb:mysqld):   Stopped 
crm_verify[13157]: 2010/03/31_10:58:16 WARN: native_color: Resource drbddisk_mysql cannot run anywhere
crm_verify[13157]: 2010/03/31_10:58:16 WARN: native_color: Resource fs_mysql cannot run anywhere
crm_verify[13157]: 2010/03/31_10:58:16 WARN: native_color: Resource ip_mysql cannot run anywhere
crm_verify[13157]: 2010/03/31_10:58:16 WARN: native_color: Resource mysqld cannot run anywhere
Warnings found during check: config may not be valid
Additional debug output shows:

Code:
crm_verify[13156]: 2010/03/31_10:58:05 WARN: unpack_rsc_op: Processing failed op drbddisk_mysql_start_0 on admin-lab0: Error
crm_verify[13156]: 2010/03/31_10:58:05 WARN: unpack_rsc_op: Compatability handling for failed op drbddisk_mysql_start_0 on admin-lab0
crm_verify[13156]: 2010/03/31_10:58:05 notice: group_print: Resource Group: rg_mysql
crm_verify[13156]: 2010/03/31_10:58:05 notice: native_print:     drbddisk_mysql (heartbeat:drbddisk):   Stopped
crm_verify[13156]: 2010/03/31_10:58:05 notice: native_print:     fs_mysql       (heartbeat::ocf:Filesystem):    Stopped
crm_verify[13156]: 2010/03/31_10:58:05 notice: native_print:     ip_mysql       (heartbeat::ocf:IPaddr2):       Stopped
crm_verify[13156]: 2010/03/31_10:58:05 notice: native_print:     mysqld (lsb:mysqld):   Stopped
crm_verify[13156]: 2010/03/31_10:58:05 debug: native_print: Allocating: drbddisk_mysql  (heartbeat:drbddisk):   Stopped
crm_verify[13156]: 2010/03/31_10:58:05 debug: native_assign_node: Color drbddisk_mysql, Node[0] admin-lab1: 0
crm_verify[13156]: 2010/03/31_10:58:05 debug: native_assign_node: Color drbddisk_mysql, Node[1] admin-lab0: -1000000
crm_verify[13156]: 2010/03/31_10:58:05 debug: native_assign_node: All nodes for resource drbddisk_mysql are unavailable, unclean or shutting down
crm_verify[13156]: 2010/03/31_10:58:05 WARN: native_color: Resource drbddisk_mysql cannot run anywhere
Plus, it's not colocating the resources the way I want:

Code:
crm_verify[13156]: 2010/03/31_10:58:05 debug: unpack_config: Default action timeout: 20s
crm_verify[13156]: 2010/03/31_10:58:05 debug: unpack_config: Default stickiness: 0
crm_verify[13156]: 2010/03/31_10:58:05 debug: unpack_config: Default failure stickiness: 0
crm_verify[13156]: 2010/03/31_10:58:05 debug: unpack_config: STONITH of failed nodes is disabled
crm_verify[13156]: 2010/03/31_10:58:05 debug: unpack_config: Cluster is symmetric - resources can run anywhere by default
crm_verify[13156]: 2010/03/31_10:58:05 debug: unpack_config: On loss of CCM Quorum: Stop ALL resources
Then I have this error in /var/log/messages, so I know I have something wrong in my configuration:
Code:
Mar 30 23:55:42 admin-lab0 lrmd: [11898]: info: rsc:drbddisk_mysql: start
Mar 30 23:55:42 admin-lab0 lrmd: [11898]: info: RA output: (drbddisk_mysql:start:stderr) 'mysql' not defined in your config. 
Mar 30 23:55:47 admin-lab0 crmd: [11901]: ERROR: process_lrm_event: LRM operation drbddisk_mysql_start_0 (call=6, rc=1) Error unknown error
Mar 30 23:55:47 admin-lab0 crmd: [11901]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
Mar 30 23:55:47 admin-lab0 crmd: [11901]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Mar 30 23:55:47 admin-lab0 crmd: [11901]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Mar 30 23:55:47 admin-lab0 crmd: [11901]: info: do_lrm_rsc_op: Performing op=drbddisk_mysql_stop_0 key=1:1:c79820db-a2ad-4d41-8bcc-f3621d5a3414)
Mar 30 23:55:47 admin-lab0 lrmd: [11898]: info: rsc:drbddisk_mysql: stop
Mar 30 23:55:47 admin-lab0 lrmd: [11898]: info: RA output: (drbddisk_mysql:stop:stderr) 'mysql' not defined in your config. 
Mar 30 23:55:47 admin-lab0 lrmd: [11898]: info: RA output: (drbddisk_mysql:stop:stderr) /sbin/drbdadm secondary mysql: exit code 3, mapping to 0 
Mar 30 23:55:47 admin-lab0 crmd: [11901]: info: process_lrm_event: LRM operation drbddisk_mysql_stop_0 (call=7, rc=0) complete 
Mar 30 23:55:47 admin-lab0 tengine: [11907]: info: notify_crmd: Transition 1 status: te_complete - <null>
Mar 30 23:55:47 admin-lab0 crmd: [11901]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
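That "'mysql' not defined in your config" line makes me think drbddisk just calls drbdadm with "mysql" as the resource name, so the name I pass in the CIB has to match a resource stanza in /etc/drbd.conf on both nodes. For reference, this is roughly what I understand that stanza should look like (only a sketch; the backing disk /dev/sdb1 is a placeholder, and if my DRBD resource is actually named something else, that would explain the error):

Code:
resource mysql {
  protocol C;
  on admin-lab0 {
    device    /dev/drbd0;
    disk      /dev/sdb1;        # placeholder - use the real backing partition
    address   192.168.0.1:7788;
    meta-disk internal;
  }
  on admin-lab1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;        # placeholder
    address   192.168.0.2:7788;
    meta-disk internal;
  }
}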
What I'm trying to figure out is where I should define resources: in /etc/ha.d/ha.cf? In /etc/ha.d/haresources? In the cib.xml file with cibadmin? I'm stumped, and the documentation isn't very clear on any of them.
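(Since crm is enabled, I'm assuming the answer is the CIB rather than haresources. For what it's worth, this is how I've been poking at it; I believe these options are right for heartbeat 2.1's cibadmin, but I'm not 100% sure:)

Code:
# dump just the resources section so it can be edited offline
sudo /usr/sbin/cibadmin -Q -o resources > /tmp/resources.xml

# edit /tmp/resources.xml, then push the section back into the live CIB
sudo /usr/sbin/cibadmin -o resources -R -x /tmp/resources.xml

# sanity-check the live configuration afterwards
sudo /usr/sbin/crm_verify -L -V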

Here are my configs:

Linux admin-lab0 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:37:14 EDT 2010 i686 i686 i386 GNU/Linux
CentOS release 5.4 (Final)

drbd-8.3.7-1
drbd-bash-completion-8.3.7-1
drbd-heartbeat-8.3.7-1
drbd-km-2.6.18_164.15.1.el5-8.3.7-12
drbd-pacemaker-8.3.7-1
drbd-udev-8.3.7-1
drbd-utils-8.3.7-1
drbd-xen-8.3.7-1
heartbeat-2.1.3-3.el5.centos
heartbeat-pils-2.1.3-3.el5.centos
heartbeat-stonith-2.1.3-3.el5.centos

/etc/ha.d/ha.cf
Code:
keepalive       2
deadtime        30
warntime        10
initdead        120
bcast eth1
ucast eth0 10.98.4.90
ucast eth0 10.98.4.91
node            admin-lab0
node            admin-lab1
keepalive       2
stonith_host external/ipmi admin-lab0 10.98.5.76 root -----
stonith_host external/ipmi admin-lab1 10.98.6.224 root -----
crm             respawn
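To check whether the heartbeats are actually leaving and arriving on each interface, I've been watching UDP port 694 with tcpdump on both nodes, along these lines:

Code:
# ucast heartbeats on the general interface
sudo /usr/sbin/tcpdump -ni eth0 udp port 694

# bcast heartbeats on the dedicated DRBD interface
sudo /usr/sbin/tcpdump -ni eth1 udp port 694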
/etc/ha.d/haresources
Code:
admin-lab3 IPaddr::10.98.4.93/16/eth0 http mysql
Code:
admin-lab0 ~]$ sudo /usr/sbin/cibadmin -Q
 <cib generated="true" admin_epoch="0" epoch="3" num_updates="19" have_quorum="true" ignore_dtd="false" num_peers="1" cib_feature_revision="2.0" cib-last-written="Tue Mar 30 18:30:39 2010" ccm_transition="1" dc_uuid="de820ffb-dab9-446c-ab5b-9291e5409a69">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <attributes>
           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="c07cf70b-865c-41fb-98f7-9a25163c0825" uname="admin-lab1" type="normal"/>
       <node id="de820ffb-dab9-446c-ab5b-9291e5409a69" uname="admin-lab0" type="normal"/>
     </nodes>
     <resources>
       <group ordered="true" collocated="true" id="rg_mysql">
         <primitive class="heartbeat" type="drbddisk" provider="heartbeat" id="drbddisk_mysql">
           <meta_attributes id="7aa6d6e9-2ddc-4ea9-8298-0884e3e6f53f">
             <attributes>
               <nvpair name="target_role" value="started" id="29c914c0-42d3-47d1-be82-0349fdd8029a"/>
             </attributes>
           </meta_attributes>
           <instance_attributes id="69a45069-a2e2-4267-a8bd-a434b96c463d">
             <attributes>
               <nvpair name="1" value="mysql" id="fe2dd16a-b4bd-400f-8877-ec8002aa4333"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive class="ocf" type="Filesystem" provider="heartbeat" id="fs_mysql">
           <instance_attributes id="41e504aa-1452-4038-830a-edf0db211880">
             <attributes>
               <nvpair name="device" value="/dev/drbd0" id="88156574-9ab5-4806-b936-8d517abcfa8a"/>
               <nvpair name="directory" value="/var/lib/mysql" id="1941ba57-44f3-4791-9a99-474bd173ec25"/>
               <nvpair name="type" value="ext3" id="b2caf7bb-ab70-459c-9e91-7ecf2b64221c"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive class="ocf" type="IPaddr2" provider="heartbeat" id="ip_mysql">
           <instance_attributes id="4cffc9d1-ab51-45e4-a98f-4d3edf31dd2d">
             <attributes>
               <nvpair name="ip" value="10.98.4.93" id="b12c1d30-5164-4245-a426-a0fe3d14dd86"/>
               <nvpair name="cidr_netmask" value="16" id="83129818-9279-440b-97b8-d23bf72a8832"/>
               <nvpair name="nic" value="eth0" id="b3ca5ef5-e2a0-4f1f-b7cd-4f2c178552b3"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive class="lsb" type="mysqld" provider="heartbeat" id="mysqld"/>
       </group>
     </resources>
     <constraints/>
   </configuration>
   <status>
     <node_state id="de820ffb-dab9-446c-ab5b-9291e5409a69" uname="admin-lab0" crmd="online" crm-debug-origin="do_update_resource" shutdown="0" in_ccm="true" ha="active" join="member" expected="member">
       <lrm id="de820ffb-dab9-446c-ab5b-9291e5409a69">
         <lrm_resources>
           <lrm_resource id="drbddisk_mysql" type="drbddisk" class="heartbeat" provider="heartbeat">
             <lrm_rsc_op id="drbddisk_mysql_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" transition_key="3:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" transition_magic="0:7;3:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" call_id="2" crm_feature_set="2.0" rc_code="7" op_status="0" interval="0" op_digest="335708e636e88faff6fd969f5e0be283"/>
             <lrm_rsc_op id="drbddisk_mysql_start_0" operation="start" crm-debug-origin="do_update_resource" transition_key="8:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" transition_magic="4:1;8:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" call_id="6" crm_feature_set="2.0" rc_code="1" op_status="4" interval="0" op_digest="335708e636e88faff6fd969f5e0be283"/>
             <lrm_rsc_op id="drbddisk_mysql_stop_0" operation="stop" crm-debug-origin="do_update_resource" transition_key="1:1:c79820db-a2ad-4d41-8bcc-f3621d5a3414" transition_magic="0:0;1:1:c79820db-a2ad-4d41-8bcc-f3621d5a3414" call_id="7" crm_feature_set="2.0" rc_code="0" op_status="0" interval="0" op_digest="335708e636e88faff6fd969f5e0be283"/>
           </lrm_resource>
           <lrm_resource id="fs_mysql" type="Filesystem" class="ocf" provider="heartbeat">
             <lrm_rsc_op id="fs_mysql_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" transition_key="4:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" transition_magic="0:7;4:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" call_id="3" crm_feature_set="2.0" rc_code="7" op_status="0" interval="0" op_digest="a11cf3a35e6400332669268471abdea5"/>
           </lrm_resource>
           <lrm_resource id="ip_mysql" type="IPaddr2" class="ocf" provider="heartbeat">
             <lrm_rsc_op id="ip_mysql_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" transition_key="5:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" transition_magic="0:7;5:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" call_id="4" crm_feature_set="2.0" rc_code="7" op_status="0" interval="0" op_digest="4cc1f203e540e0dc8fc723f94e4d4a17"/>
           </lrm_resource>
           <lrm_resource id="mysqld" type="mysqld" class="lsb" provider="heartbeat">
             <lrm_rsc_op id="mysqld_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" transition_key="6:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" transition_magic="0:7;6:0:c79820db-a2ad-4d41-8bcc-f3621d5a3414" call_id="5" crm_feature_set="2.0" rc_code="7" op_status="0" interval="0" op_digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
           </lrm_resource>
         </lrm_resources>
       </lrm>
       <transient_attributes id="de820ffb-dab9-446c-ab5b-9291e5409a69">
         <instance_attributes id="status-de820ffb-dab9-446c-ab5b-9291e5409a69">
           <attributes>
             <nvpair id="status-de820ffb-dab9-446c-ab5b-9291e5409a69-probe_complete" name="probe_complete" value="true"/>
             <nvpair id="status-de820ffb-dab9-446c-ab5b-9291e5409a69-fail-count-drbddisk_mysql" name="fail-count-drbddisk_mysql" value="1"/>
           </attributes>
         </instance_attributes>
       </transient_attributes>
     </node_state>
   </status>
 </cib>
I'm sure I could figure it out if I knew where to go. Can someone please help me identify which configuration I need to adjust? It also looks like STONITH is not enabled; how do I do that? The documentation for these tools is just terrible, full of typos and errors.
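(From what I've read, with crm enabled the stonith_host lines in ha.cf are ignored just like haresources, so STONITH has to be switched on as a cluster property and then configured as stonith-class resources in the CIB. I think the property side looks something like the command below, though I haven't confirmed the exact option names on 2.1.3, and the external/ipmi stonith resources would still need to be added on top of it:)

Code:
# enable fencing cluster-wide (option names taken from the v2 CRM docs - unverified)
sudo /usr/sbin/crm_attribute --type crm_config --attr-name stonith-enabled --attr-value true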

OK, and it looks like the node is trying to talk to the other one, but not getting a connection:

Code:
Source                                   Destination                              Proto   State        TTL    
10.98.2.20:59155                         10.98.4.91:22                            tcp     ESTABLISHED  119:59:59
10.98.4.91:60001                         10.98.4.91:694                           udp                    0:00:29
10.98.4.91:37433                         10.98.4.90:694                           udp                    0:00:29
192.168.0.1:36230                        192.168.0.2:7788                         tcp     ESTABLISHED  119:59:52
192.168.0.2:40063                        192.168.0.1:7788                         tcp     ESTABLISHED  107:27:23
192.168.0.2:36522                        192.168.0.3:694                          udp                    0:00:29
Ahh... OK, I added a rule to allow the UDP traffic; let's see if that helps...
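(For the record, the rule was basically just opening Heartbeat's port, UDP 694, from the peer; something like this on both boxes:)

Code:
sudo /sbin/iptables -I INPUT -p udp --dport 694 -j ACCEPT
sudo /sbin/service iptables save   # persist across reboots on CentOS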
I also saw this when I reloaded the config, so I guess I don't need haresources:
Code:
heartbeat[9295]: 2010/03/31_11:46:30 WARN: File /etc/ha.d/haresources exists.
heartbeat[9295]: 2010/03/31_11:46:30 WARN: This file is not used because crm is enabled
Nope, still doesn't work:

Code:
                                               IPTables - State Top
Version: 1.4          Sort: SrcIP           s to change sorting
Source                                   Destination                              Proto   State        TTL    
10.98.2.20:41085                         10.98.4.90:22                            tcp     ESTABLISHED  119:59:59
10.98.4.90:58189                         10.98.4.91:694                           udp                    0:00:28
10.98.4.90:43626                         10.98.4.90:694                           udp                    0:00:28
10.98.4.91:37433                         10.98.4.90:694                           udp                    0:00:28
192.168.0.1:36230                        192.168.0.2:7788                         tcp     ESTABLISHED  119:59:51
192.168.0.1:52776                        192.168.0.3:694                          udp                    0:00:28
192.168.0.2:40063                        192.168.0.1:7788                         tcp     ESTABLISHED  107:09:01
192.168.0.2:36522                        192.168.0.3:694                          udp                    0:00:28

Last edited by slinx; 03-31-2010 at 10:58 AM.
 
03-31-2010, 11:19 AM   #2
unSpawn
Moderator

Registered: May 2001 | Posts: 29,415 | Blog Entries: 55
Please post your thread once and in only one forum. Posting a single thread in the most relevant forum will make it easier for members to help you and will keep the discussion in one place. This thread is being closed because it is a duplicate of http://www.linuxquestions.org/questi...-start-799140/.

- Making a first reply in your own thread removes the 0-reply status.
- Don't forget you can use the EDIT button to edit in details in your original post.
- There are no valid reasons for crossposting. Instead, use the REPORT button to ask a moderator to merge posts or threads, or to move your thread to a more appropriate forum (where applicable).
 
  



Tags
cluster, clustering, crm, heartbeat


