LinuxQuestions.org

pradeepspa 06-28-2017 06:23 AM

Apache resource not starting - Pacemaker Cluster
 
Folks,

I am deploying a two-node cluster with Pacemaker. I have created a resource group containing LVM, Filesystem, IPaddr2, and apache resources. Once implemented, all the resources start on the DS node (node1). However, when I put node1 in standby, every resource except apache starts on node2.

Below are my resources:

Quote:

Group: apachegroup
 Resource: my_lvm (class=ocf provider=heartbeat type=LVM)
  Attributes: volgrpname=pradeep-vg exclusive=true
  Operations: start interval=0s timeout=30 (my_lvm-start-interval-0s)
              stop interval=0s timeout=30 (my_lvm-stop-interval-0s)
              monitor interval=10 timeout=30 (my_lvm-monitor-interval-10)
 Resource: my_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/pradeep-vg/pradeep-lv directory=/var/www fstype=ext4
  Operations: start interval=0s timeout=60 (my_fs-start-interval-0s)
              stop interval=0s timeout=60 (my_fs-stop-interval-0s)
              monitor interval=20 timeout=40 (my_fs-monitor-interval-20)
 Resource: Virtual-IP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.199.1.84 cidr_netmask=24
  Operations: start interval=0s timeout=20s (Virtual-IP-start-interval-0s)
              stop interval=0s timeout=20s (Virtual-IP-stop-interval-0s)
              monitor interval=10s timeout=20s (Virtual-IP-monitor-interval-10s)
 Resource: Website (class=ocf provider=heartbeat type=apache)
  Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://127.0.0.1/server-status
  Operations: monitor interval=10 timeout=20s (Website-monitor-interval-10)
              start interval=0s timeout=240s (Website-start-interval-0s)
              stop interval=0s timeout=300s (Website-stop-interval-0s)

Error message in the logs:

Quote:

Jun 28 03:19:06 node2.cluster.com pengine[1376]: warning: Processing failed op start for Website on node2.cluster.com: unknown error (1)
Jun 28 03:19:06 node2.cluster.com pengine[1376]: warning: Forcing Website away from node2.cluster.com after 1000000 failures (max=1000000)

Quote:

[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node2.cluster.com (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Wed Jun 28 07:23:42 2017 Last change: Tue Jun 27 23:03:36 2017 by root via crm_attribute on node1.cluster.com

2 nodes and 6 resources configured

Node node1.cluster.com: standby
Online: [ node2.cluster.com ]

Full list of resources:

fence-2 (stonith:fence_vmware_soap): Started node2.cluster.com
fence-1 (stonith:fence_vmware_soap): Started node2.cluster.com
Resource Group: apachegroup
    my_lvm (ocf::heartbeat:LVM): Started node2.cluster.com
    my_fs (ocf::heartbeat:Filesystem): Started node2.cluster.com
    Virtual-IP (ocf::heartbeat:IPaddr2): Started node2.cluster.com
    Website (ocf::heartbeat:apache): Stopped

Failed Actions:
* Website_start_0 on node2.cluster.com 'unknown error' (1): call=98, status=Timed Out, exitreason='none',
    last-rc-change='Tue Jun 27 13:06:24 2017', queued=0ms, exec=40003ms


Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled


I have verified the configs, which look good to me. But since I am new to this, I am not sure what else to check for further troubleshooting. Could someone shed light on this, please?

AwesomeMachine 07-01-2017 07:40 PM

I noticed this has been here a while. The only thing I can think of is that when you put one machine in standby, the services that were running on it start on the other machine.
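
One more thing stands out in the posted log: "Forcing Website away from node2.cluster.com after 1000000 failures" means the fail count for Website on node2 has reached INFINITY, so Pacemaker will not retry the start there until that failure history is cleared. The failed action is also dated Jun 27 13:06 and timed out after about 40 seconds, which does not match the 240s start timeout now configured, so it may predate the current settings. Below is a rough sketch of how one might inspect and clear it, assuming the stock pcs subcommands. The function name and the RUN dry-run switch are made up for illustration and are not part of pcs.

```shell
#!/bin/sh
# Sketch only. The resource name is taken from the pcs output in the first post.
RESOURCE=Website

# RUN defaults to "echo", so the commands are only printed (dry run).
# Invoke the script with RUN= (empty) on node2 to actually execute them.
RUN=${RUN-echo}

# Hypothetical helper: lists (or runs) the three troubleshooting steps in order.
website_recovery_steps() {
    # 1. Show the recorded fail count that keeps the resource off the node.
    $RUN pcs resource failcount show "$RESOURCE"
    # 2. Run the start action by hand, outside cluster control, with
    #    verbose output from the resource agent.
    $RUN pcs resource debug-start "$RESOURCE" --full
    # 3. After fixing whatever step 2 reveals, clear the failure history
    #    so Pacemaker tries to start the resource again.
    $RUN pcs resource cleanup "$RESOURCE"
}

website_recovery_steps
```

If debug-start fails, its output should say why. The statusurl in the resource attributes, for instance, only answers if the server-status handler is enabled in node2's own /etc/httpd/conf/httpd.conf, since that file is not on the shared /var/www filesystem.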

