LinuxQuestions.org


ufmale 05-22-2008 01:23 PM

heartbeat cluster help
 
I am experimenting with a cluster of two Linux machines (nlb0 and nlb2) using heartbeat for the httpd service. Each machine has two Ethernet cards: eth0 connects to the public network, and eth1 connects to a local router (10.0.0.1 for nlb0 and 10.0.0.2 for nlb2).

I ran into a problem that I do not understand.
When I start the heartbeat service on nlb0,
the machine creates an alias IP, that is
eth0:0 192.168.0.200, and starts the httpd service.
I can access the web service from another computer with that IP (http://192.168.0.200).

On nlb2, I start the heartbeat service; once it starts,
nlb2 takes over the IP eth0:0 192.168.0.200.
Now I can still reach the web service on nlb0, but only with http://nlb0/index.html, because http://192.168.0.200 is now on nlb2.

I am confused because nlb0 has not gone down. I thought nlb2 would take the IP only when nlb0 is down.

I experimented further by restarting the heartbeat service on nlb0.
Once the heartbeat service started, nlb0 took back the IP 192.168.0.200.
Of course, nlb2 had not gone down.

Can anyone help me understand the concept? The settings are below.

haresources (same for both nlb0 and nlb2)
Code:

nlb0  192.168.0.200 httpd

ha.cf (nlb2 has the same file, except with "ucast eth1 10.0.0.1")
Code:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
deadtime 20
warntime 10
initdead 80
udpport 694
ucast eth1 10.0.0.2
auto_failback on
node    nlb0
node    nlb2
#
ping 192.168.0.54
respawn hacluster /usr/lib/heartbeat/ipfail


p_s_shah 05-23-2008 10:45 AM

To my understanding, HA should work in the following way:

I think you have configured nlb0 as the primary node.

So, as long as nlb0 is working, it should be primarily responsible for HTTP requests.
Please note that nlb2 is also running Heartbeat at the same time, but in passive mode.

As soon as nlb0 goes down, nlb2 should take over the 192.168.0.200 IP and serve the HTTP requests.

Now, since you have "auto_failback on" in your config file, whenever nlb0 comes back up it will take the IP over from nlb2 automatically.
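
The behaviour described above can be sketched as a small model. This is only an illustration in Python, not heartbeat's actual code. It assumes both nodes can see each other over the heartbeat link. The names `HaResource`, `parse_haresources_line` and `next_owner` are made up for this sketch, and picking the alphabetically first surviving node is an arbitrary tie-break.

```python
from dataclasses import dataclass


@dataclass
class HaResource:
    preferred_node: str   # first field of the haresources line
    resources: list       # remaining fields (IP address, service name, ...)


def parse_haresources_line(line: str) -> HaResource:
    """Split one haresources line into its preferred node and resource list."""
    fields = line.split()
    return HaResource(preferred_node=fields[0], resources=fields[1:])


def next_owner(entry, current_owner, alive, auto_failback):
    """Return the node that should hold the resources after a membership change.

    entry:         parsed haresources line
    current_owner: node holding the resources now, or None
    alive:         set of node names that are currently up
    auto_failback: True for "auto_failback on", False for "off"
    """
    preferred = entry.preferred_node
    # With auto_failback on, the preferred node reclaims its resources whenever it is up.
    if auto_failback and preferred in alive:
        return preferred
    # Otherwise the current owner keeps them for as long as it stays up.
    if current_owner in alive:
        return current_owner
    # The owner is gone: use the preferred node if possible, else any survivor
    # (assumption: alphabetical order as a tie-break).
    if preferred in alive:
        return preferred
    return min(alive) if alive else None


if __name__ == "__main__":
    entry = parse_haresources_line("nlb0  192.168.0.200 httpd")
    owner = next_owner(entry, None, {"nlb0", "nlb2"}, auto_failback=True)
    print("both up:", owner)
    owner = next_owner(entry, owner, {"nlb2"}, auto_failback=True)
    print("nlb0 down:", owner)
    print("nlb0 back, failback on:",
          next_owner(entry, owner, {"nlb0", "nlb2"}, auto_failback=True))
    print("nlb0 back, failback off:",
          next_owner(entry, owner, {"nlb0", "nlb2"}, auto_failback=False))
```

In this model nlb2 only receives 192.168.0.200 when nlb0 is missing from the set of live nodes, and with auto_failback off it keeps the address after nlb0 returns.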

I think the way you are testing HA is wrong. You are stopping and starting the Heartbeat service, in which case Heartbeat cannot communicate with the other node.

If you want to test the setup, do it the following way:
1. Make sure the Heartbeat service is running on both servers the whole time.
2. Start monitoring the logs on both servers: tail -f /var/log/messages
3. Heartbeat is checking the IPs in the 10.0.0.? range for connectivity, so just bring down 10.0.0.1 (nlb0) and check the logs on nlb2. They should show that the IP takeover has happened.
4. Now bring 10.0.0.1 (nlb0) back up and check the logs on both servers again.
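
While working through the steps above, it can also help to check directly which node currently holds 192.168.0.200, in addition to reading the logs. The following is a minimal Python sketch, assuming the `ip` command from iproute2 is installed and the public interface is eth0. The helper names `addresses_in` and `holds_address` are made up for this example.

```python
import subprocess


def addresses_in(ip_output: str) -> set:
    """Collect IPv4 addresses from `ip -o -4 addr show` output (prefix length dropped)."""
    found = set()
    for line in ip_output.splitlines():
        tokens = line.split()
        # In one-line mode each record looks like "... inet ADDRESS/PREFIX ...".
        if "inet" in tokens:
            idx = tokens.index("inet")
            if idx + 1 < len(tokens):
                found.add(tokens[idx + 1].split("/")[0])
    return found


def holds_address(address: str, device: str = "eth0") -> bool:
    """Run `ip` locally and report whether `address` is configured on `device`."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show", "dev", device],
        capture_output=True, text=True, check=True,
    )
    return address in addresses_in(result.stdout)


if __name__ == "__main__":
    # Run this on nlb0 and on nlb2 before and after bringing the link down.
    print("holds 192.168.0.200:", holds_address("192.168.0.200"))
```

If only one node reports True at any given moment, the takeover worked as expected. If both report True, the two nodes probably cannot see each other.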

I hope this helps.
If you have any doubts, please reply.

ufmale 05-23-2008 04:56 PM

I think taking down 10.0.0.? is kind of unrealistic. What I did instead was take down the network service on nlb0, i.e. service network restart.

Once I did that, nlb2 picked up the httpd service. However, even though nlb0 came back alive (after the network service restarted), nlb0 never picked the httpd service back up from nlb2.

By the way, I changed auto_failback to off in nlb2's ha.cf.

p_s_shah 05-23-2008 07:16 PM

First, in my experience, the files should be identical on both servers.

Secondly, if you set auto_failback off, then nlb0 won't pick up HTTP requests again automatically; you have to move the IP address back manually.
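
To check the first point, the two ha.cf files can be compared directive by directive. This is a rough Python sketch, not a heartbeat tool: `directives` and `config_diff` are made-up helper names, and the two strings are abbreviated from the settings posted in this thread (nlb2 with its own ucast line and with auto_failback changed to off).

```python
def directives(text: str) -> set:
    """Return the non-empty, non-comment lines with whitespace normalised."""
    result = set()
    for line in text.splitlines():
        stripped = line.split("#", 1)[0].strip()   # drop comments
        if stripped:
            result.add(" ".join(stripped.split()))
    return result


def config_diff(text_a: str, text_b: str):
    """Return (only_in_a, only_in_b) as sorted lists of directives."""
    a, b = directives(text_a), directives(text_b)
    return sorted(a - b), sorted(b - a)


# Abbreviated from the thread; only a few of the posted lines are repeated here.
NLB0_HA_CF = """keepalive 2
deadtime 20
ucast eth1 10.0.0.2
auto_failback on
node    nlb0
node    nlb2
"""

NLB2_HA_CF = """keepalive 2
deadtime 20
ucast eth1 10.0.0.1
auto_failback off
node    nlb0
node    nlb2
"""

if __name__ == "__main__":
    only_nlb0, only_nlb2 = config_diff(NLB0_HA_CF, NLB2_HA_CF)
    print("only on nlb0:", only_nlb0)
    print("only on nlb2:", only_nlb2)
```

In this setup the ucast line differs on purpose, since each node points at the other, so the auto_failback line is the difference to look at.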


All times are GMT -5.