I've made a little progress.
Instead of httpd, I now use apache2 as the service name in haresources, since that is the name of the script in /etc/init.d. That same file is also where I specify the virtual address heartbeat uses, so I was wrong in my first post when I said that I hadn't set the virtual address anywhere.
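For reference, the resource group heartbeat reports in the logs below ("apache1 192.168.200.103 apache2") suggests a haresources line of `apache1 192.168.200.103 apache2`: the preferred node, then the resources in start order. The Python sketch below is a simplified model, not heartbeat's actual code. `resource_commands` and `looks_like_ip` are hypothetical helpers; the sketch assumes a bare IP address means the IPaddr script and that any name not listed in `resource_d_scripts` is an init script in /etc/init.d (as far as I know, heartbeat itself checks /etc/ha.d/resource.d first and then /etc/init.d). It shows why the name in haresources has to match the init script's file name exactly.

```python
import ipaddress

RESOURCE_D = "/etc/ha.d/resource.d"
INIT_D = "/etc/init.d"


def looks_like_ip(token):
    """True if token is a bare address such as 192.168.200.103 or 192.168.200.103/24."""
    try:
        ipaddress.ip_address(token.split("/")[0])
        return True
    except ValueError:
        return False


def resource_commands(line, action, resource_d_scripts=frozenset({"IPaddr"})):
    """Map one haresources line to the commands run for `action` (start/stop).

    Simplified model (assumption): a bare IP means the IPaddr script,
    "Name::arg1::arg2" passes arguments to script Name, names in
    resource_d_scripts live in /etc/ha.d/resource.d, and everything else
    is assumed to be an init script in /etc/init.d.
    """
    _node, *resources = line.split()  # first field is the preferred node
    commands = []
    for res in resources:
        name, *args = res.split("::")
        if looks_like_ip(name):
            name, args = "IPaddr", [name]
        directory = RESOURCE_D if name in resource_d_scripts else INIT_D
        commands.append([f"{directory}/{name}", *args, action])
    # Resources are started left to right and stopped in reverse order.
    return commands if action == "start" else commands[::-1]
```

For `resource_commands("apache1 192.168.200.103 apache2", "start")` this gives `/etc/ha.d/resource.d/IPaddr 192.168.200.103 start` followed by `/etc/init.d/apache2 start`, and the reverse order for `stop`, which is the same order that appears in node1's log.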
Even with that correction, I still can't get Apache to run.
Node2 is active and node1 is down (I stopped the heartbeat service on it); Apache is not running on node2. In the logs, node1 is the host apache1 and node2 is apache2. When I start heartbeat on node1, the heartbeat logs show the following:
On node1:
heartbeat[6853]: 2009/08/24_09:37:22 WARN: Core dumps could be lost if multiple dumps occur.
heartbeat[6853]: 2009/08/24_09:37:22 WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
heartbeat[6853]: 2009/08/24_09:37:22 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
heartbeat[6853]: 2009/08/24_09:37:22 info: Version 2 support: false
heartbeat[6853]: 2009/08/24_09:37:22 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[6853]: 2009/08/24_09:37:22 info: **************************
heartbeat[6853]: 2009/08/24_09:37:22 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[6854]: 2009/08/24_09:37:23 info: heartbeat: version 2.1.3
heartbeat[6854]: 2009/08/24_09:37:23 info: Heartbeat generation: 1250979590
heartbeat[6854]: 2009/08/24_09:37:23 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[6854]: 2009/08/24_09:37:23 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[6854]: 2009/08/24_09:37:23 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[6854]: 2009/08/24_09:37:23 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[6854]: 2009/08/24_09:37:23 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[6854]: 2009/08/24_09:37:23 info: Local status now set to: 'up'
heartbeat[6854]: 2009/08/24_09:37:24 info: Link apache2:eth0 up.
heartbeat[6854]: 2009/08/24_09:37:24 info: Status update for node apache2: status active
heartbeat[6854]: 2009/08/24_09:37:24 info: Link apache1:eth0 up.
heartbeat[6854]: 2009/08/24_09:37:25 info: Comm_now_up(): updating status to active
heartbeat[6854]: 2009/08/24_09:37:25 info: Local status now set to: 'active'
heartbeat[6854]: 2009/08/24_09:37:25 WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 80 ms (> 50 ms) (GSource: 0x9e0cdd0)
heartbeat[6854]: 2009/08/24_09:37:25 WARN: standby message [other] from apache2 ignored. Other side is in flux.
heartbeat[6854]: 2009/08/24_09:37:25 info: remote resource transition completed.
heartbeat[6854]: 2009/08/24_09:37:25 info: remote resource transition completed.
heartbeat[6854]: 2009/08/24_09:37:25 info: Local Resource acquisition completed. (none)
harc[6861]: 2009/08/24_09:37:26 info: Running /etc/ha.d/rc.d/status status
On node2:
heartbeat[2585]: 2009/08/24_09:37:25 info: Heartbeat restart on node apache1
heartbeat[2585]: 2009/08/24_09:37:25 info: Link apache1:eth0 up.
heartbeat[2585]: 2009/08/24_09:37:25 info: Status update for node apache1: status init
heartbeat[2585]: 2009/08/24_09:37:25 info: Status update for node apache1: status up
heartbeat[2585]: 2009/08/24_09:37:25 info: apache1 wants to go standby [foreign]
heartbeat[2585]: 2009/08/24_09:37:26 info: Status update for node apache1: status active
heartbeat[2585]: 2009/08/24_09:37:27 info: remote resource transition completed.
harc[3749]: 2009/08/24_09:37:27 info: Running /etc/ha.d/rc.d/status status
harc[3766]: 2009/08/24_09:37:28 info: Running /etc/ha.d/rc.d/status status
harc[3782]: 2009/08/24_09:37:29 info: Running /etc/ha.d/rc.d/status status
Neither host's log mentions any network configuration: there are no IPaddr or ifconfig entries at all.
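To check a heartbeat log for this kind of activity without reading it line by line, a small filter over the ResourceManager and IPaddr entries is enough. This is a sketch assuming the `prog[pid]: timestamp level: message` layout seen in these logs; `resource_events` is a hypothetical helper.

```python
import re

# One heartbeat log line: prog[pid]: timestamp level: message
LOG_LINE = re.compile(
    r"^(?P<prog>\w+)\[(?P<pid>\d+)\]: (?P<stamp>\S+) (?P<level>\w+): (?P<msg>.*)$"
)


def resource_events(log_text, programs=("ResourceManager", "IPaddr")):
    """Return (program, level, message) for log lines written by `programs`."""
    events = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("prog") in programs:
            events.append((match.group("prog"), match.group("level"), match.group("msg")))
    return events
```

Fed the text of the logs above, it returns an empty list, while node1's restart log yields the IPaddr and ResourceManager lines, including `ERROR: Return code 1 from /etc/init.d/apache2`. The log file to read is whatever `logfile` is set to in ha.cf (for example /var/log/ha-log, but that path is only an assumption here).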
Now, when I restart heartbeat on node1, this is what node1's log shows:
heartbeat[6854]: 2009/08/24_09:40:02 info: Heartbeat shutdown in progress. (6854)
heartbeat[6893]: 2009/08/24_09:40:02 info: Giving up all HA resources.
heartbeat[6854]: 2009/08/24_09:40:04 WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 70 ms (> 50 ms) (GSource: 0x9e10948)
ResourceManager[6907]: 2009/08/24_09:40:04 info: Releasing resource group: apache1 192.168.200.103 apache2
ResourceManager[6907]: 2009/08/24_09:40:05 info: Running /etc/init.d/apache2 stop
ResourceManager[6907]: 2009/08/24_09:40:09 info: Running /etc/ha.d/resource.d/IPaddr 192.168.200.103 stop
IPaddr[6980]: 2009/08/24_09:40:11 INFO: Success
heartbeat[6893]: 2009/08/24_09:40:11 info: All HA resources relinquished.
heartbeat[6854]: 2009/08/24_09:40:13 info: killing HBFIFO process 6857 with signal 15
heartbeat[6854]: 2009/08/24_09:40:13 info: killing HBWRITE process 6858 with signal 15
heartbeat[6854]: 2009/08/24_09:40:14 info: killing HBREAD process 6859 with signal 15
heartbeat[6854]: 2009/08/24_09:40:14 info: Core process 6857 exited. 3 remaining
heartbeat[6854]: 2009/08/24_09:40:14 info: Core process 6858 exited. 2 remaining
heartbeat[6854]: 2009/08/24_09:40:14 info: Core process 6859 exited. 1 remaining
heartbeat[6854]: 2009/08/24_09:40:14 info: apache1 Heartbeat shutdown complete.
heartbeat[7099]: 2009/08/24_09:40:59 WARN: Core dumps could be lost if multiple dumps occur.
heartbeat[7099]: 2009/08/24_09:40:59 WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
heartbeat[7099]: 2009/08/24_09:40:59 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
heartbeat[7099]: 2009/08/24_09:40:59 info: Version 2 support: false
heartbeat[7099]: 2009/08/24_09:40:59 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[7099]: 2009/08/24_09:40:59 info: **************************
heartbeat[7099]: 2009/08/24_09:40:59 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[7100]: 2009/08/24_09:40:59 info: heartbeat: version 2.1.3
heartbeat[7100]: 2009/08/24_09:40:59 info: Heartbeat generation: 1250979591
heartbeat[7100]: 2009/08/24_09:40:59 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[7100]: 2009/08/24_09:40:59 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[7100]: 2009/08/24_09:40:59 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[7100]: 2009/08/24_09:40:59 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[7100]: 2009/08/24_09:40:59 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[7100]: 2009/08/24_09:41:00 info: Local status now set to: 'up'
heartbeat[7100]: 2009/08/24_09:41:01 info: Link apache2:eth0 up.
heartbeat[7100]: 2009/08/24_09:41:01 info: Status update for node apache2: status active
heartbeat[7100]: 2009/08/24_09:41:01 info: Link apache1:eth0 up.
heartbeat[7100]: 2009/08/24_09:41:01 info: Comm_now_up(): updating status to active
heartbeat[7100]: 2009/08/24_09:41:01 info: Local status now set to: 'active'
heartbeat[7100]: 2009/08/24_09:41:02 info: remote resource transition completed.
heartbeat[7100]: 2009/08/24_09:41:02 info: remote resource transition completed.
heartbeat[7100]: 2009/08/24_09:41:02 info: Local Resource acquisition completed. (none)
harc[7106]: 2009/08/24_09:41:02 info: Running /etc/ha.d/rc.d/status status
heartbeat[7100]: 2009/08/24_09:41:06 info: apache2 wants to go standby [foreign]
heartbeat[7100]: 2009/08/24_09:41:13 info: standby: acquire [foreign] resources from apache2
heartbeat[7125]: 2009/08/24_09:41:13 info: acquire local HA resources (standby).
ResourceManager[7138]: 2009/08/24_09:41:15 info: Acquiring resource group: apache1 192.168.200.103 apache2
IPaddr[7165]: 2009/08/24_09:41:17 INFO: Resource is stopped
ResourceManager[7138]: 2009/08/24_09:41:18 info: Running /etc/ha.d/resource.d/IPaddr 192.168.200.103 start
IPaddr[7241]: 2009/08/24_09:41:21 INFO: Using calculated nic for 192.168.200.103: eth0
IPaddr[7241]: 2009/08/24_09:41:21 INFO: Using calculated netmask for 192.168.200.103: 255.255.255.0
IPaddr[7241]: 2009/08/24_09:41:22 INFO: eval ifconfig eth0:0 192.168.200.103 netmask 255.255.255.0 broadcast 192.168.200.255
IPaddr[7224]: 2009/08/24_09:41:23 INFO: Success
ResourceManager[7138]: 2009/08/24_09:41:24 info: Running /etc/init.d/apache2 start
ResourceManager[7138]: 2009/08/24_09:41:26 ERROR: Return code 1 from /etc/init.d/apache2
ResourceManager[7138]: 2009/08/24_09:41:26 CRIT: Giving up resources due to failure of apache2
ResourceManager[7138]: 2009/08/24_09:41:26 info: Releasing resource group: apache1 192.168.200.103 apache2
ResourceManager[7138]: 2009/08/24_09:41:27 info: Running /etc/init.d/apache2 stop
ResourceManager[7138]: 2009/08/24_09:41:29 info: Running /etc/ha.d/resource.d/IPaddr 192.168.200.103 stop
IPaddr[7470]: 2009/08/24_09:41:31 INFO: ifconfig eth0:0 down
IPaddr[7453]: 2009/08/24_09:41:31 INFO: Success
heartbeat[7125]: 2009/08/24_09:41:32 info: local HA resource acquisition completed (standby).
heartbeat[7100]: 2009/08/24_09:41:32 info: Standby resource acquisition done [foreign].
heartbeat[7100]: 2009/08/24_09:41:32 info: Initial resource acquisition complete (auto_failback)
heartbeat[7100]: 2009/08/24_09:41:32 info: remote resource transition completed.
This time we can see that the virtual address 192.168.200.103 does get configured on eth0:0, but then `/etc/init.d/apache2 start` exits with return code 1 and heartbeat releases the whole resource group. Why is that? There is absolutely nothing in Apache's error log. How can I debug what's going on with Apache?
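In the log above, heartbeat only records the exit status of the init script (`Return code 1`), not anything the script printed. One way to see the message is to run the same command heartbeat ran and capture both its output and its exit code. This is a minimal Python sketch: `run_resource_script` is a hypothetical helper, and the assumption is that it runs as root on node1 with heartbeat stopped on both nodes, after bringing the virtual address up with the same `IPaddr 192.168.200.103 start` command shown in the log.

```python
import subprocess


def run_resource_script(command, timeout=60):
    """Run a resource/init script command and capture its result.

    Returns (exit_code, output); stderr is merged into stdout so any
    error message the script prints ends up in `output`.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout
```

For example, `run_resource_script(["/etc/init.d/apache2", "start"])` returns the exit code together with whatever the script printed. If the init script is a plain sh script, passing `["sh", "-x", "/etc/init.d/apache2", "start"]` instead also traces each command the script executes, which can show where it bails out.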