Folks,
I am deploying a two-node cluster using Pacemaker. I have created a resource group containing LVM, Filesystem, IPaddr2, and apache resources. Once configured, all the resources start on the DC node (node1). However, when I put node1 in standby, all the resources except apache start on node2.
Below are my resources:
Quote:
Group: apachegroup
 Resource: my_lvm (class=ocf provider=heartbeat type=LVM)
  Attributes: volgrpname=pradeep-vg exclusive=true
  Operations: start interval=0s timeout=30 (my_lvm-start-interval-0s)
              stop interval=0s timeout=30 (my_lvm-stop-interval-0s)
              monitor interval=10 timeout=30 (my_lvm-monitor-interval-10)
 Resource: my_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/pradeep-vg/pradeep-lv directory=/var/www fstype=ext4
  Operations: start interval=0s timeout=60 (my_fs-start-interval-0s)
              stop interval=0s timeout=60 (my_fs-stop-interval-0s)
              monitor interval=20 timeout=40 (my_fs-monitor-interval-20)
 Resource: Virtual-IP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.199.1.84 cidr_netmask=24
  Operations: start interval=0s timeout=20s (Virtual-IP-start-interval-0s)
              stop interval=0s timeout=20s (Virtual-IP-stop-interval-0s)
              monitor interval=10s timeout=20s (Virtual-IP-monitor-interval-10s)
 Resource: Website (class=ocf provider=heartbeat type=apache)
  Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://127.0.0.1/server-status
  Operations: monitor interval=10 timeout=20s (Website-monitor-interval-10)
              start interval=0s timeout=240s (Website-start-interval-0s)
              stop interval=0s timeout=300s (Website-stop-interval-0s)
Error messages in the logs:
Quote:
Jun 28 03:19:06 node2.cluster.com pengine[1376]: warning: Processing failed op start for Website on node2.cluster.com: unknown error (1)
Jun 28 03:19:06 node2.cluster.com pengine[1376]: warning: Forcing Website away from node2.cluster.com after 1000000 failures (max=1000000)
Quote:
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node2.cluster.com (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Wed Jun 28 07:23:42 2017 Last change: Tue Jun 27 23:03:36 2017 by root via crm_attribute on node1.cluster.com
2 nodes and 6 resources configured
Node node1.cluster.com: standby
Online: [ node2.cluster.com ]
Full list of resources:
fence-2 (stonith:fence_vmware_soap): Started node2.cluster.com
fence-1 (stonith:fence_vmware_soap): Started node2.cluster.com
Resource Group: apachegroup
my_lvm (ocf::heartbeat:LVM): Started node2.cluster.com
my_fs (ocf::heartbeat:Filesystem): Started node2.cluster.com
Virtual-IP (ocf::heartbeat:IPaddr2): Started node2.cluster.com
Website (ocf::heartbeat:apache): Stopped
Failed Actions:
* Website_start_0 on node2.cluster.com 'unknown error' (1): call=98, status=Timed Out, exitreason='none',
last-rc-change='Tue Jun 27 13:06:24 2017', queued=0ms, exec=40003ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
I have verified the configuration, and it looks good to me. Since I am new to this, I am not sure what else to check for further troubleshooting. Could someone shed some light on this, please?
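One detail in the output above may be worth checking: the failed start reports status=Timed Out with exec=40003ms, while the Website start operation is configured with timeout=240s. The following is a minimal sketch, not a Pacemaker tool, that pulls those fields out of the Failed Actions entry and compares them. The function name `parse_failed_action` and the constant names are hypothetical, and the regular expressions assume the entry looks like the one pasted above.

```python
import re

# Failed-action entry copied from the `pcs status` output in this post,
# joined into a single line.
FAILED_ACTION = (
    "* Website_start_0 on node2.cluster.com 'unknown error' (1): call=98, "
    "status=Timed Out, exitreason='none', "
    "last-rc-change='Tue Jun 27 13:06:24 2017', queued=0ms, exec=40003ms"
)

# Start timeout currently configured for Website (timeout=240s), in ms.
CONFIGURED_START_TIMEOUT_MS = 240 * 1000


def parse_failed_action(line):
    """Extract resource, operation, node, status, and exec time from one entry.

    Hypothetical helper: it assumes the '<resource>_<operation>_<interval> on
    <node>' layout shown in the entry above.
    """
    head = re.search(r"\*\s+(\S+)_([a-z]+)_(\d+) on (\S+)", line)
    status = re.search(r"status=([^,]+)", line)
    exec_ms = re.search(r"exec=(\d+)ms", line)
    if not (head and status and exec_ms):
        raise ValueError("unrecognised failed-action entry")
    return {
        "resource": head.group(1),
        "operation": head.group(2),
        "node": head.group(4),
        "status": status.group(1).strip(),
        "exec_ms": int(exec_ms.group(1)),
    }


if __name__ == "__main__":
    info = parse_failed_action(FAILED_ACTION)
    print(info)
    if info["status"] == "Timed Out" and info["exec_ms"] < CONFIGURED_START_TIMEOUT_MS:
        # The operation was stopped well before the configured timeout,
        # so a shorter timeout was probably in effect when it failed.
        print(
            "exec time %d ms is below the configured %d ms start timeout"
            % (info["exec_ms"], CONFIGURED_START_TIMEOUT_MS)
        )
```

If the comparison holds, the start was cut off at roughly 40 seconds rather than 240, which suggests the failure was recorded while a shorter start timeout was in effect. The entry is also dated Jun 27 13:06:24, earlier than the last configuration change, so it may be stale. In that case, clearing it with `pcs resource cleanup Website` and then watching a fresh start attempt might be a reasonable next step.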