LinuxQuestions.org
Old 03-03-2012, 12:02 AM   #1
Khaosmaker
LQ Newbie
 
Registered: Mar 2012
Posts: 8

Rep: Reputation: Disabled
Pacemaker restarts resources when node rejoins cluster after failover


Hi all,

I have two Debian nodes with Heartbeat and Pacemaker 1.1.6 installed, and almost everything is working fine. Only apache is configured, for testing. When a node goes down, the failover is done correctly, but there is a problem when the failed node comes back.

For example, say Node1 is running the apache resource and I reboot Node1. Pacemaker detects that it went down and moves apache to Node2, where it keeps running fine. But when Node1 recovers and rejoins the cluster, apache is restarted on Node2.

Does anyone know why resources are restarted when a node rejoins the cluster?

This is my pacemaker configuration:

node $id="2ac5f37d-cd54-4932-92dc-418b4fd0e6e6" nodo2 \
        attributes standby="off"
node $id="938594ef-839a-40bb-aa5e-5715622693b3" nodo1 \
        attributes standby="off"
primitive apache2 lsb:apache2 \
        meta migration-threshold="1" failure-timeout="2" \
        op monitor interval="5s" resource-stickiness="INFINITY"
primitive ip1 ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.38" nic="eth0:0"
primitive ip1arp ocf:heartbeat:SendArp \
        params ip="192.168.1.38" nic="eth0:0"
group WebServices ip1 ip1arp apache2
location cli-prefer-WebServices WebServices \
        rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2
colocation ip_with_arp inf: ip1 ip1arp
colocation web_with_ip inf: apache2 ip1
order arp_after_ip inf: ip1:start ip1arp:start
order web_after_ip inf: ip1arp:start apache2:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="Heartbeat" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY"


This is what I see on crm_mon:

1-. Node1 and Node2 OK:

Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node1
ip1arp (ocf::heartbeat:SendArp): Started node1
apache2 (lsb:apache2): Started node1


2-. I reboot Node1, so Pacemaker moves the resources to Node2:

Online: [ node2 ]
OFFLINE: [node1]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Started node2
apache2 (lsb:apache2): Started node2


3-. Node1 is online again and joins the cluster; resources are still on Node2:

Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Started node2
apache2 (lsb:apache2): Started node2

4-. But after a few seconds, resources are stopped on Node2 and started again on the same Node2:

Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Stopped
apache2 (lsb:apache2): Stopped


5-. Resources have restarted and are still on Node2:

Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Started node2
apache2 (lsb:apache2): Started node2



Why were the resources restarted on Node2 when they were running fine?

Last edited by Khaosmaker; 03-03-2012 at 10:49 PM.
 
Old 04-19-2012, 12:56 PM   #2
www_linuxquestions_org
LQ Newbie
 
Registered: Apr 2012
Posts: 1

Rep: Reputation: Disabled
Fix for this issue

Try removing the ocf::heartbeat:SendArp primitive (ip1arp) from the configuration. That should fix it, and you probably don't need SendArp anyway.
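
For reference, here is a rough sketch of what the resource section might look like with ip1arp removed. It is only an illustration based on the configuration posted above, not a tested setup. It assumes that IPaddr2 already announces the address with gratuitous ARP when it brings the IP up (which, as far as I know, is why a separate SendArp resource is usually unnecessary), and it relies on a group already implying colocation and start order for its members, so the explicit colocation and order constraints are left out. The two constraints that reference ip1arp (ip_with_arp and arp_after_ip) have to go together with the primitive in any case. The node, location and property sections are assumed to stay unchanged.

```
# Sketch only: SendArp primitive and the explicit colocation/order
# constraints are dropped; the group supplies both relationships.
primitive apache2 lsb:apache2 \
        meta migration-threshold="1" failure-timeout="2" \
        op monitor interval="5s"
primitive ip1 ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.38" nic="eth0:0"
# Members start in the listed order and always run on the same node.
group WebServices ip1 apache2
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY"
```

The resource-stickiness that was written on the apache2 monitor line is omitted here because rsc_defaults already sets it for every resource. The other values are copied from the original post as they were.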
All times are GMT -5.