LinuxQuestions.org
Old 05-13-2009, 12:26 PM   #1
linuxexpress
LQ Newbie
 
Registered: Nov 2008
Posts: 22

Rep: Reputation: 15
DRBD and Heartbeat problem


I just installed DRBD and Heartbeat on two machines, server01 and server02, with server01 as the primary node.

DRBD tested OK, and the DRBD device can be mounted manually on both nodes. When I try to simulate a failure of server01 by stopping the heartbeat service there, heartbeat on server02 does not kick in: it does not take over and does not mount the file system on server02.

I have no clue what went wrong. Any help is appreciated.

Details of what I did are as follows:

- Both machines start the drbd service at boot time:
[root@server01] chkconfig drbd on
[root@server02] chkconfig drbd on
[root@server01] chkconfig heartbeat off
[root@server02] chkconfig heartbeat off

When both machines start up:
[root@server01~]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

[root@server02~]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
**********************************************************************
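For comparing the two nodes at a glance, the connection state, roles and disk states can be pulled out of the /proc/drbd text. This is only a sketch: the function name drbd_summary is made up for illustration, and it assumes the 8.3-style device line format shown above.

```shell
#!/bin/sh
# Print "device cs ro ds" for each DRBD device line in /proc/drbd-style text.
# Reads the file given as $1, or stdin when no argument is given.
# drbd_summary is a hypothetical helper, not part of DRBD.
drbd_summary() {
    awk '
        # Device lines start with a minor number, e.g. "0: cs:Connected ro:..."
        $1 ~ /^[0-9]+:$/ {
            cs = ro = ds = "?"
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)   # connection state
                if ($i ~ /^ro:/) ro = substr($i, 4)   # roles: local/peer
                if ($i ~ /^ds:/) ds = substr($i, 4)   # disk states: local/peer
            }
            print $1, cs, ro, ds
        }
    ' "$@"
}

# Sample copied from the output above; on a real node: drbd_summary /proc/drbd
drbd_summary <<'EOF'
version: 8.3.1 (api:88/proto:86-89)
0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
EOF
# prints: 0: Connected Secondary/Secondary UpToDate/UpToDate
```

Running it on both nodes before and after a failover test makes it easy to see whether either side ever became Primary.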
- Then I start heartbeat manually on both machines:
At server01

[root@server01] service heartbeat start
[root@server01] tail -f /var/log/messages

heartbeat[2851]: 2009/05/13_11:13:51 info: Version 2 support: false
heartbeat[2851]: 2009/05/13_11:13:51 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[2851]: 2009/05/13_11:13:51 info: **************************
heartbeat[2851]: 2009/05/13_11:13:51 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[2852]: 2009/05/13_11:13:51 info: heartbeat: version 2.1.3
heartbeat[2852]: 2009/05/13_11:13:51 info: Heartbeat generation: 1241706516
heartbeat[2852]: 2009/05/13_11:13:51 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[2852]: 2009/05/13_11:13:51 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[2852]: 2009/05/13_11:13:51 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[2852]: 2009/05/13_11:13:51 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[2852]: 2009/05/13_11:13:51 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[2852]: 2009/05/13_11:13:51 info: Local status now set to: 'up'
heartbeat[2852]: 2009/05/13_11:14:52 WARN: node server02: is dead
heartbeat[2852]: 2009/05/13_11:14:52 info: Comm_now_up(): updating status to active
heartbeat[2852]: 2009/05/13_11:14:52 info: Local status now set to: 'active'
heartbeat[2852]: 2009/05/13_11:14:52 WARN: No STONITH device configured.
heartbeat[2852]: 2009/05/13_11:14:52 WARN: Shared disks are not protected.
heartbeat[2852]: 2009/05/13_11:14:52 info: Resources being acquired from server02
harc[2860]: 2009/05/13_11:14:52 info: Running /etc/ha.d/rc.d/status status
heartbeat[2861]: 2009/05/13_11:14:52 info: Local Resource acquisition completed.
heartbeat[2852]: 2009/05/13_11:14:52 info: Initial resource acquisition complete (T_RESOURCES(us))
mach_down[2919]: 2009/05/13_11:14:52 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[2919]: 2009/05/13_11:14:52 info: mach_down takeover complete for node server02
heartbeat[2852]: 2009/05/13_11:14:52 info: mach_down takeover complete.
harc[2953]: 2009/05/13_11:14:52 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[2953]: 2009/05/13_11:14:52 received ip-request-resp drbddisk::r0 OK yes
ResourceManager[2974]: 2009/05/13_11:14:52 info: Acquiring resource group: server01 drbddisk::r0 Filesystem::/dev/drbd0::/mnt/drbd::ext3 10.0.0.100 smb
ResourceManager[2974]: 2009/05/13_11:14:52 info: Running /etc/ha.d/resource.d/drbddisk r0 start
Filesystem[3050]: 2009/05/13_11:14:52 INFO: Resource is stopped
ResourceManager[2974]: 2009/05/13_11:14:52 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/drbd ext3 start
Filesystem[3133]: 2009/05/13_11:14:52 INFO: Running start for /dev/drbd0 on /mnt/drbd
Filesystem[3122]: 2009/05/13_11:14:52 INFO: Success
IPaddr[3201]: 2009/05/13_11:14:52 INFO: Resource is stopped
ResourceManager[2974]: 2009/05/13_11:14:52 info: Running /etc/ha.d/resource.d/IPaddr 10.0.0.100 start
IPaddr[3277]: 2009/05/13_11:14:53 INFO: Using calculated nic for 10.0.0.100: eth0
IPaddr[3277]: 2009/05/13_11:14:53 INFO: Using calculated netmask for 10.0.0.100: 255.255.255.0
IPaddr[3277]: 2009/05/13_11:14:53 INFO: eval ifconfig eth0:0 10.0.0.100 netmask 255.255.255.0 broadcast 10.0.0.255
IPaddr[3260]: 2009/05/13_11:14:53 INFO: Success
ResourceManager[2974]: 2009/05/13_11:14:53 info: Running /etc/init.d/smb start
heartbeat[2852]: 2009/05/13_11:15:02 info: Local Resource acquisition completed. (none)
heartbeat[2852]: 2009/05/13_11:15:02 info: local resource transition completed
*********************************************************

At server02

[root@server02] service heartbeat start
[root@server02] tail -f /var/log/messages

May 13 11:14:28 server02logd: [2712]: info: logd started with default configuration.
May 13 11:14:28 server02logd: [2713]: info: G_main_add_SignalHandler: Added signal handler for signal 15
May 13 11:14:28 server02logd: [2712]: info: G_main_add_SignalHandler: Added signal handler for signal 15
May 13 11:14:28 server02heartbeat: [2761]: info: Version 2 support: false
May 13 11:14:28 server02heartbeat: [2761]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
May 13 11:14:28 server02heartbeat: [2761]: info: **************************
May 13 11:14:28 server02heartbeat: [2761]: info: Configuration validated. Starting heartbeat 2.1.3
May 13 11:14:28 server02heartbeat: [2762]: info: heartbeat: version 2.1.3
May 13 11:14:28 server02heartbeat: [2762]: info: Heartbeat generation: 1241707673
May 13 11:14:28 server02heartbeat: [2762]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
May 13 11:14:28 server02heartbeat: [2762]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
May 13 11:14:28 server02heartbeat: [2762]: info: G_main_add_TriggerHandler: Added signal manual handler
May 13 11:14:28 server02heartbeat: [2762]: info: G_main_add_TriggerHandler: Added signal manual handler
May 13 11:14:28 server02heartbeat: [2762]: info: G_main_add_SignalHandler: Added signal handler for signal 17
May 13 11:14:28 server02heartbeat: [2762]: info: Local status now set to: 'up'
May 13 11:15:22 server02kernel: drbd0: peer( Secondary -> Primary )
May 13 11:15:29 server02heartbeat: [2762]: WARN: node server01: is dead
May 13 11:15:29 server02heartbeat: [2762]: info: Comm_now_up(): updating status to active
May 13 11:15:29 server02heartbeat: [2762]: info: Local status now set to: 'active'
May 13 11:15:29 server02heartbeat: [2762]: WARN: No STONITH device configured.
May 13 11:15:29 server02heartbeat: [2762]: WARN: Shared disks are not protected.
May 13 11:15:29 server02heartbeat: [2762]: info: Resources being acquired from server01
May 13 11:15:29 server02heartbeat: [2770]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys server02] to acquire.
May 13 11:15:29 server02heartbeat: [2762]: info: Initial resource acquisition complete (T_RESOURCES(us))
May 13 11:15:29 server02harc[2769]: info: Running /etc/ha.d/rc.d/status status
May 13 11:15:29 server02mach_down[2798]: info: Taking over resource group drbddisk::r0
May 13 11:15:29 server02ResourceManager[2824]: info: Acquiring resource group: server01 drbddisk::r0 Filesystem::/dev/drbd0::/mnt/drbd::ext3 10.0.0.100 smb
May 13 11:15:29 server02ResourceManager[2824]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
May 13 11:15:40 server02heartbeat: [2762]: info: Local Resource acquisition completed. (none)
May 13 11:15:40 server02heartbeat: [2762]: info: local resource transition completed.
May 13 11:15:42 server02ResourceManager[2824]: ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
May 13 11:15:42 server02ResourceManager[2824]: CRIT: Giving up resources due to failure of drbddisk::r0
May 13 11:15:42 server02ResourceManager[2824]: info: Releasing resource group: server01 drbddisk::r0 Filesystem::/dev/drbd0::/mnt/drbd::ext3 10.0.0.100 smb
May 13 11:15:42 server02ResourceManager[2824]: info: Running /etc/init.d/smb stop
May 13 11:15:42 server02ResourceManager[2824]: info: Running /etc/ha.d/resource.d/IPaddr 10.0.0.100 stop
May 13 11:15:42 server02IPaddr[3018]: INFO: Success
May 13 11:15:42 server02ResourceManager[2824]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/drbd ext3 stop
May 13 11:15:42 server02Filesystem[3089]: INFO: Running stop for /dev/drbd0 on /mnt/drbd
May 13 11:15:42 server02Filesystem[3078]: INFO: Success
May 13 11:15:42 server02ResourceManager[2824]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
May 13 11:15:42 server02mach_down[2798]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
May 13 11:15:42 server02mach_down[2798]: info: mach_down takeover complete for node server01
May 13 11:15:42 server02heartbeat: [2762]: info: mach_down takeover complete.
May 13 11:16:12 server02hb_standby[3192]: Going standby [foreign].
May 13 11:16:12 server02heartbeat: [2762]: info: server02 wants to go standby [foreign]
May 13 11:16:22 server02heartbeat: [2762]: WARN: No reply to standby request. Standby request cancelled.
***************************************************
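In both logs above, each node reports its peer as dead ("WARN: node server02: is dead" on server01, and the reverse on server02), even though both machines were running and DRBD showed the link as Connected. That pattern suggests the heartbeat packets themselves were not arriving. A small sketch for pulling those warnings out of a log follows; the function name dead_peers is made up for illustration, and it assumes the message wording seen in these logs.

```shell
#!/bin/sh
# List the peers that a heartbeat log declared dead, one name per line.
# Usage: dead_peers LOGFILE   (or pipe log text on stdin)
# dead_peers is a hypothetical helper, not part of heartbeat.
dead_peers() {
    # Keep only the node name from lines like "... WARN: node NAME: is dead"
    sed -n 's/.*WARN: node \([^:]*\): is dead.*/\1/p' "$@" | sort -u
}

# Sample line copied from the server01 log above;
# on a real node: dead_peers /var/log/messages
dead_peers <<'EOF'
heartbeat[2852]: 2009/05/13_11:14:52 WARN: node server02: is dead
EOF
# prints: server02
```

If each node lists the other as dead while both are up, the heartbeat network path (interface, broadcast address, firewall) is the first thing to check.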

[root@server01]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.1 (api:88/proto:86-89)

m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /mnt/drbd ext3


[root@server02~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.1 (api:88/proto:86-89)

m:res cs ro ds p mounted fstype
0:r0 Connected Secondary/Primary UpToDate/UpToDate C

************************************************

To simulate the failover,

At server01

[root@server01] service heartbeat stop

[root@server01] tail -f /var/log/messages

May 13 11:25:50 server01heartbeat: [2852]: info: Heartbeat shutdown in progress. (2852)
May 13 11:25:50 server01heartbeat: [3512]: info: Giving up all HA resources.
May 13 11:25:50 server01ResourceManager[3525]: info: Releasing resource group: server01 drbddisk::r0 Filesystem::/dev/drbd0::/mnt/drbd::ext3 10.0.0.100 smb
May 13 11:25:50 server01ResourceManager[3525]: info: Running /etc/init.d/smb stop
May 13 11:25:50 server01ResourceManager[3525]: info: Running /etc/ha.d/resource.d/IPaddr 10.0.0.100 stop
May 13 11:25:50 server01IPaddr[3607]: INFO: ifconfig eth0:0 down
May 13 11:25:50 server01avahi-daemon[2364]: Withdrawing address record for 10.0.0.100 on eth0.
May 13 11:25:50 server01IPaddr[3590]: INFO: Success
May 13 11:25:50 server01ResourceManager[3525]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/drbd ext3 stop
May 13 11:25:50 server01Filesystem[3669]: INFO: Running stop for /dev/drbd0 on /mnt/drbd
May 13 11:25:50 server01Filesystem[3669]: INFO: Trying to unmount /mnt/drbd
May 13 11:25:50 server01Filesystem[3669]: INFO: unmounted /mnt/drbd successfully
May 13 11:25:50 server01Filesystem[3658]: INFO: Success
May 13 11:25:50 server01ResourceManager[3525]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
May 13 11:25:50 server01kernel: drbd0: role( Primary -> Secondary )
May 13 11:25:50 server01heartbeat: [3512]: info: All HA resources relinquished.
May 13 11:25:52 server01heartbeat: [2852]: info: killing HBFIFO process 2855 with signal 15
May 13 11:25:52 server01heartbeat: [2852]: info: killing HBWRITE process 2856 with signal 15
May 13 11:25:52 server01heartbeat: [2852]: info: killing HBREAD process 2857 with signal 15
May 13 11:25:52 server01heartbeat: [2852]: info: Core process 2855 exited. 3 remaining
May 13 11:25:52 server01heartbeat: [2852]: info: Core process 2856 exited. 2 remaining
May 13 11:25:52 server01heartbeat: [2852]: info: Core process 2857 exited. 1 remaining
May 13 11:25:52 server01heartbeat: [2852]: info: server01 Heartbeat shutdown complete.
May 13 11:25:53 server01logd: [3775]: info: Waiting for pid=2801 to exit
May 13 11:25:53 server01logd: [2810]: info: logd_term_write_action: received SIGTERM
May 13 11:25:53 server01logd: [2810]: info: Exiting write process
May 13 11:25:54 server01logd: [3775]: info: Pid 2801 exited

*******************************************************

There is no change in /var/log/messages on server02.

[root@server01]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.1 (api:88/proto:86-89)

m:res cs ro ds p mounted fstype
0:r0 Connected Secondary/Secondary UpToDate/UpToDate C


[root@server02]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.1 (api:88/proto:86-89)

m:res cs ro ds p mounted fstype
0:r0 Connected Secondary/Secondary UpToDate/UpToDate C
 
Old 05-14-2009, 12:29 PM   #2
linuxexpress
LQ Newbie
 
Registered: Nov 2008
Posts: 22

Original Poster
Rep: Reputation: 15
I fixed the problem by opening port 694, the port given by the 'udpport' setting in ha.cf.
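
For anyone hitting the same symptom, here is a rough sketch of what opening the port could look like. It assumes an iptables-based firewall and that ha.cf lives at /etc/ha.d/ha.cf; neither is stated in the post. The function names hb_udpport and hb_firewall_rule are made up for illustration. The script only prints the rule so it can be reviewed before being run as root on both nodes.

```shell
#!/bin/sh
# Read the udpport value from a ha.cf-style file. Falls back to 694,
# heartbeat's default port, when the directive or the file is missing.
# hb_udpport and hb_firewall_rule are hypothetical helpers.
hb_udpport() {
    port=""
    if [ -r "$1" ]; then
        # Commented-out lines ("#udpport ...") do not match; the last
        # udpport directive in the file is the one reported.
        port=$(awk '$1 == "udpport" { p = $2 } END { if (p != "") print p }' "$1")
    fi
    echo "${port:-694}"
}

# Print (not run) an iptables rule accepting heartbeat UDP traffic on that port.
hb_firewall_rule() {
    echo "iptables -I INPUT -p udp --dport $(hb_udpport "$1") -j ACCEPT"
}

# Assumed config path; adjust if ha.cf is elsewhere.
hb_firewall_rule /etc/ha.d/ha.cf
```

The logs in the first post show heartbeat using UDP broadcast on port 694 over eth0, which is why the rule matches UDP rather than TCP. A rule added this way does not survive a reboot unless it is also saved through the distribution's firewall configuration.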
 
  


All times are GMT -5.