LinuxQuestions.org

- Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)

- - Failover (https://www.linuxquestions.org/questions/linux-server-73/failover-830261/)

Hi all,

Thanks to everyone who replied to my posts, especially those who helped me solve my problems.

My latest problem: I am trying to set up a failover server for a qmail server. One server is the primary and the other is the secondary. When the primary goes down, I want the secondary to take over automatically, without the users noticing the failure.

For this, my choice was heartbeat. I have tried everything I could, but it is not working.

Does anyone have an idea what I can use for failover, or how I can use heartbeat with the qmail server?

Also, does anyone have a script that can replicate some directories, as well as MySQL, to a secondary qmail server?

Thank you all for your assistance.

Regards
Emmanuel

Hello,

Can you post your ha.cf and IP configuration? Also, are you using ldirectord or Pacemaker in combination with HeartBeat to offer a virtual service? Please post the config of whichever you're using, too.

Have you checked your logs (syslog) for errors regarding heartbeat? If so, what do they say?

I use Unison File Synchronizer on my servers to synchronize any changes I make.

What version of MySQL are you using? Most likely you can set it up to replicate between both servers, in either a master-slave or a master-master setup.

Kind regards,

Eric

ha.cf file:

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
ping 209.191.122.70
udpport 694
baud 38400
warntime 10
initdead 120
bcast eth0
node mail.domain.com
node mail2.domain.com
auto_failback on

haresources:
mail10.teledataict.com 0.0.0.0 mirror httpd qmail mysqld webmin

I am not using anything else apart from heartbeat.

I would appreciate it if you could point me to where I can find a sample Unison synchronizer configuration.

Hi,

If you state you're not using anything apart from HeartBeat, what do you mean by that? You'll need to have at least one virtual server configured in order to offer it from the cluster. HeartBeat only monitors the status of a server, service, IP and so on. Basically if you only have HeartBeat configured you have the primary layer to build upon. If one server goes down the other will take over. But since you're not offering any virtual services, there is nothing else to see in my opinion.

Did you install HeartBeat with Pacemaker, with ldirectord or with another resource manager? I imagine you're using ldirectord since you refer to haresources. Is that one line all you have in the haresources file? Doesn't seem correct to me since you're referring to 0.0.0.0 as IP while that should be the virtual IP you configure to offer the cluster service. If you're using ldirectord please post your ldirectord.cf file.

Here you can find a great manual for Unison File Synchronizer. And here's a simple example of a configuration file:

Code:

# Roots of the synchronization
root = /opt/lampp-data
root = ssh://srvtradws2//opt/lampp-data

# Some regexps specifying names and paths to ignore
ignore = Path mysql-data

owner = true
group = true

batch = true
log = true
logfile = /var/log/tradinet.log

This tells Unison to synchronize the two root directories over SSH, to ignore the path named mysql-data, to synchronize owner and group, to ask no questions (batch mode), and to write a log to the given log file.
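If you would rather drive the same synchronization from cron or a script than from a profile, the settings above map onto Unison command-line preferences. A minimal sketch, assuming the unison binary is on the PATH and SSH keys are already in place; the helper names unison_command and run_sync are made up for illustration:

```python
import subprocess


def unison_command(local_root, remote_root, ignore_paths, logfile):
    """Build a unison command line mirroring the profile above."""
    cmd = ["unison", local_root, remote_root,
           "-batch",             # ask no questions
           "-owner", "-group",   # synchronize owner and group
           "-logfile", logfile]  # where to write the log
    for path in ignore_paths:
        # Same pattern syntax as 'ignore = Path ...' in the profile
        cmd += ["-ignore", "Path " + path]
    return cmd


def run_sync(cmd):
    """Run the prepared command and return unison's exit status."""
    return subprocess.run(cmd, check=False).returncode
```

Calling run_sync(unison_command("/opt/lampp-data", "ssh://srvtradws2//opt/lampp-data", ["mysql-data"], "/var/log/tradinet.log")) would then run the same job as the profile.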

There are a lot more possibilities; have a look at the manual pointed to above.

Kind regards,

Eric

I only used yum to install heartbeat and created those .cf files myself.

To achieve my aim of setting up two mail servers, one serving as primary and the other as secondary, what do I need?

Quote:

Originally Posted by wasamzy (Post 4091266)

Hello,

Heartbeat only takes care of the layer on which the other services are offered, so basically it just checks whether the other node(s) are alive, keeps the virtual IP available, and so on.

To have the setup you want, you'll also need a cluster resource manager in which you define your virtual services (http, mail, and so on). The previous load balancer/resource manager was ldirectord; now it's Pacemaker, which has lots of features.

The two sites you need to visit to find all the necessary information are:
Linux HA
Pacemaker

Kind regards,

Eric

Thank you so much. I will check and give you feedback on what happens next.

I am having trouble installing Pacemaker. It tells me that pacemaker conflicts with heartbeat. So for now I want to use ldirectord for the clustering. My question is: do I need three machines for this, with one serving as the virtual server, one as master, and one as slave?

Sorry for my ignorance.

Quote:

Originally Posted by wasamzy (Post 4099510)

Hello,

Never apologize for trying and being willing to learn! There's no need for it; we all have to start somewhere.

I imagine you got the HeartBeat package that comes with ldirectord included and that might be the reason why Pacemaker is complaining. Not sure about that though since I have yet to start with Pacemaker myself (migrate from ldirectord).

You can set up a perfectly good testing environment with two machines; it all depends on your choice and the configuration you set up. You can have, for example, two identical nodes with the same config and OS. Both of them will be configured with HeartBeat and ldirectord and will be listening on the virtual IP. However, the task of HeartBeat is to monitor that 'virtual layer', so that if the first server goes down, the second one automatically gets the virtual layer activated. Only one of the two machines can have the virtual IP assigned at a time, and ldirectord will run on that same server.
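The behaviour described above can be pictured with a toy model. This is only an illustration of the idea, not how HeartBeat is implemented; the function and node names are invented, and auto_failback mirrors the ha.cf option of the same name:

```python
def vip_holder(preferred, alive, current=None, auto_failback=True):
    """Return the node that should hold the virtual IP, or None.

    preferred: node names in order of preference (first = primary)
    alive: set of nodes that currently answer heartbeats
    current: node holding the virtual IP right now, if any
    """
    candidates = [n for n in preferred if n in alive]
    if not candidates:
        return None                 # nobody left to serve the VIP
    if current in alive and not auto_failback:
        return current              # stay put until the holder fails
    return candidates[0]            # most preferred live node wins


nodes = ["mail1", "mail2"]
print(vip_holder(nodes, {"mail1", "mail2"}))                   # mail1
print(vip_holder(nodes, {"mail2"}, current="mail1"))           # mail2
print(vip_holder(nodes, {"mail1", "mail2"}, current="mail2"))  # mail1
print(vip_holder(nodes, {"mail1", "mail2"}, current="mail2",
                 auto_failback=False))                         # mail2
```

The last two calls show the effect of auto_failback: with it on, the virtual IP moves back to the primary as soon as the primary returns; with it off, it stays where it is.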

Hope that makes it a bit clearer. If you need more info just ask and I'll try to explain in more detail (or with the configuration I have set up).

Kind regards,

Eric

Thank you very much for your reply.
Actually, I removed heartbeat and ldirectord from the system before installing pacemaker. What I am thinking is that maybe the version of heartbeat in the EPEL repo is different from the one in the other repo, and both pacemaker and heartbeat are supposed to come from the EPEL repo together. This is getting interesting though.

I am doing this project on two CentOS machines running the qmail email server. I hope my earlier posts already explain everything I want to do.

I would be happy if you could give me a sample of your setup from A to Z with some detailed explanation, so that I can follow it and achieve what I am after.

Happy SFD in advance!!!!

I have configured heartbeat and ldirectord. Web failover seems to be working, but the mail server is another matter: when I send mail from Yahoo, for example, the slave is supposed to receive it on behalf of the master when the master is down, but that is not happening. Instead, the mails keep hanging until the master is back up, and only then get delivered. I can open the webmail and log in all right; it is just that the mails are not dropping.

My ldirectord.cf on both machines:

checktimeout=30
checkinterval=2
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=41.211.31.60:80
        fallback=127.0.0.1:80
        real=192.168.2.3:80 gate
        real=192.168.2.2:80 gate
        service=http
        httpmethod=GET
        receive="webserverisworking"
        persistent=100
        scheduler=lblc
        protocol=tcp
        checktype=negotiate
virtual=41.211.31.60:25
        real=192.168.2.3:25 gate
        real=192.168.2.2:25 gate
        service=smtp
        httpmethod=GET
        receive="mailserverisworking"
        persistent=100
        scheduler=lblc
        protocol=tcp
        checktype=negotiate
virtual=41.211.31.60:110
        real=192.168.2.3:110 gate
        real=192.168.2.2:110 gate
        service=pop
        httpmethod=GET
        receive="mailserverisworking"
        persistent=100
        scheduler=lblc
        protocol=tcp
        checktype=negotiate
virtual=41.211.31.60:143
        real=192.168.2.3:143 gate
        real=192.168.2.2:143 gate
        service=imap
        httpmethod=GET
        receive="mailserverisworking"
        persistent=100
        scheduler=lblc
        protocol=tcp
        checktype=negotiate
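
One thing worth noting about the SMTP, POP and IMAP sections above: httpmethod and a receive string describe an HTTP page check, and a mail daemon never sends a string like "mailserverisworking". A mail port announces itself with a greeting banner instead, and for SMTP that greeting starts with 220. A minimal sketch of such a banner check, to run by hand against each real server; it illustrates the idea only and is not ldirectord's own code, and the function names are made up:

```python
import socket


def is_smtp_greeting(banner):
    """True if the bytes look like an SMTP 'service ready' greeting."""
    return banner.startswith(b"220")


def smtp_banner_ok(host, port=25, timeout=5.0):
    """Connect to host:port and report whether it greets with 220."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(512)
    except OSError:
        return False            # refused, unreachable or timed out
    return is_smtp_greeting(banner)
```

Running smtp_banner_ok("192.168.2.3") and smtp_banner_ok("192.168.2.2") from the director shows whether each real server answers on port 25 at all, independent of any load-balancer configuration.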

haresources:
mail10.teledataict.com ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::41.211.31.60/25/eth1/41.211.31.127 qmailctl

ha.cf:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
ping 209.191.122.70
udpport 694
baud 38400
warntime 10
initdead 120
bcast eth0
node mail10.teledataict.com
node mail2.dot.com.gh
auto_failback on
respawn hacluster /usr/lib/heartbeat/ipfail

ifconfig (Master):

eth0 Link encap:Ethernet HWaddr 00:1D:09:10:C0:5F
inet addr:192.168.2.3 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::21d:9ff:fe10:c05f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:68 errors:0 dropped:0 overruns:0 frame:0
TX packets:84 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:10549 (10.3 KiB) TX bytes:20849 (20.3 KiB)
Interrupt:16 Memory:dfbf0000-dfc00000

eth1 Link encap:Ethernet HWaddr 00:15:E9:43:C3:C1
inet addr:41.211.31.60 Bcast:41.211.31.127 Mask:255.255.255.128
inet6 addr: fe80::215:e9ff:fe43:c3c1/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:195421996 errors:0 dropped:0 overruns:0 frame:0
TX packets:21251583 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3866922044 (3.6 GiB) TX bytes:3410977679 (3.1 GiB)
Interrupt:22 Base address:0xcf00

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:212613 errors:0 dropped:0 overruns:0 frame:0
TX packets:212613 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:34993565 (33.3 MiB) TX bytes:34993565 (33.3 MiB)

ifconfig (Slave):

eth0 Link encap:Ethernet HWaddr 00:0D:88:F4:87:DD
inet6 addr: fe80::20d:88ff:fef4:87dd/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:2929547 errors:0 dropped:0 overruns:0 frame:0
TX packets:2931094 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:225039794 (214.6 MiB) TX bytes:227110561 (216.5 MiB)
Interrupt:22 Base address:0x8f00

eth1 Link encap:Ethernet HWaddr 00:1D:09:31:3F:76
inet addr:41.211.4.108 Bcast:41.211.4.111 Mask:255.255.255.248
inet6 addr: fe80::21d:9ff:fe31:3f76/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1972567 errors:0 dropped:0 overruns:0 frame:0
TX packets:98015 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:193390299 (184.4 MiB) TX bytes:16102523 (15.3 MiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:58148 errors:0 dropped:0 overruns:0 frame:0
TX packets:58148 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:16702337 (15.9 MiB) TX bytes:16702337 (15.9 MiB)

peth1 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF
inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
RX packets:2605794 errors:15929 dropped:0 overruns:0 frame:674
TX packets:170634 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:294586487 (280.9 MiB) TX bytes:29368602 (28.0 MiB)
Interrupt:16 Memory:dfbf0000-dfc00000

vif0.1 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF
inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
RX packets:170442 errors:0 dropped:0 overruns:0 frame:0
TX packets:2593744 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:28476730 (27.1 MiB) TX bytes:282699585 (269.6 MiB)

virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:139 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:29821 (29.1 KiB)

xenbr1 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF
UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
RX packets:2252932 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:159853838 (152.4 MiB) TX bytes:0 (0.0 b)

Any help, please?
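
One detail that can be checked from the output above is whether the virtual IP fits on both machines. The haresources line puts 41.211.31.60/25 on eth1, while the slave's eth1 is configured as 41.211.4.108 with a 255.255.255.248 mask. A small check with Python's standard ipaddress module, using only the addresses posted above (the helper name is invented):

```python
import ipaddress


def same_subnet(vip_cidr, iface_addr, iface_mask):
    """True if the interface address lies inside the VIP's network."""
    vip_net = ipaddress.ip_interface(vip_cidr).network
    iface = ipaddress.ip_interface(f"{iface_addr}/{iface_mask}")
    return iface.ip in vip_net


VIP = "41.211.31.60/25"
print(same_subnet(VIP, "41.211.31.60", "255.255.255.128"))  # master eth1: True
print(same_subnet(VIP, "41.211.4.108", "255.255.255.248"))  # slave eth1: False
```

The slave's eth1 sits in a different network from the virtual IP, so even if the slave brings the address up, whether outside traffic for it can reach that machine depends on how the two networks are routed. The slave's eth0 also shows no inet address and no RUNNING flag, which may matter because ha.cf sends its heartbeats over eth0.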

Hi,

I decided not to use heartbeat and ldirectord for the failover any more because of the difficulties I was facing. I now use keepalived and it is fantastic. The only problem I still need help with is that when I unplug the network cable from the master, the slave detects that the interface is down, but takeover does not take place.

But when I stop the keepalived service on the master, or shut the machine down, the slave takes over and acts as master until I start keepalived or the machine again.
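
A possible explanation, offered as a guess: the VRRP instance in the configuration below runs on eth0, while the virtual IP lives on eth1. VRRP adverts only stop when the interface carrying them fails or keepalived stops, so unplugging a different interface leaves the master advertising as usual. keepalived has a track_interface block for the vrrp_instance that puts the instance into FAULT state when a listed interface goes down. The toy model below only illustrates that logic; it is not keepalived code, and all names in it are invented:

```python
def elect_master(nodes):
    """Pick the VRRP master among nodes, or None if all are in FAULT.

    Each node is a dict with:
      name      - label for the machine
      priority  - VRRP priority, highest wins
      links     - interface name -> True if the link is up
      tracked   - interfaces whose failure forces FAULT state
    """
    healthy = [n for n in nodes
               if all(n["links"][i] for i in n["tracked"])]
    if not healthy:
        return None
    return max(healthy, key=lambda n: n["priority"])["name"]


def cluster(master_tracked):
    # The master has lost its eth1 link; eth0 (VRRP adverts) is still up.
    return [
        {"name": "master", "priority": 100, "tracked": master_tracked,
         "links": {"eth0": True, "eth1": False}},
        {"name": "slave", "priority": 50, "tracked": ["eth0"],
         "links": {"eth0": True, "eth1": True}},
    ]


print(elect_master(cluster(["eth0"])))          # master
print(elect_master(cluster(["eth0", "eth1"])))  # slave
```

With only eth0 tracked, losing eth1 changes nothing and the master keeps its role; once eth1 is tracked as well, the master drops out and the slave wins the election.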

My configuration is as follows:

Master: /etc/keepalived/keepalived.conf

# Configuration File for Keepalived

# Global Configuration
global_defs {
notification_email {
emmanuel.buamah@teledataict.com
}
notification_email_from keepalived@mail10.teledataict.com
smtp_server localhost
smtp_connect_timeout 30
router_id LVS_MASTER # string identifying the machine
}

# describe virtual service ip
vrrp_instance VI_1 {
# initial state
state MASTER
interface eth0
# arbitrary unique number 0..255
# used to differentiate multiple instances of vrrpd
virtual_router_id 1
# for electing MASTER, highest priority wins.
# to be MASTER, make 50 more than other machines.
priority 100
authentication {
auth_type PASS
auth_pass 1q2w3e
}
virtual_ipaddress {
41.211.31.60/25 dev eth1
}
# Invoked to master transition
notify_master "/etc/keepalived/bypass_ipvs.sh del 41.211.31.60"
# Invoked to slave transition
notify_backup "/etc/keepalived/bypass_ipvs.sh add 41.211.31.60"
# Invoked to fault transition
notify_fault "/etc/keepalived/bypass_ipvs.sh add 41.211.31.60"
}

# describe virtual mail server
virtual_server 41.211.31.60 25 {
delay_loop 15
lb_algo rr
lb_kind DR
persistence_timeout 50
protocol TCP

real_server 192.168.2.3 25 {
TCP_CHECK {
connect_timeout 3
}
}
}

# describe virtual web server
virtual_server 41.211.31.60 80 {
delay_loop 30
lb_algo rr
lb_kind DR
persistence_timeout 50
protocol TCP

real_server 192.168.2.3 80 {
HTTP_GET {
url {
path /var/www/html/index.html
digest 9c740d4a0e350371229915359a614d63
}
connect_timeout 3
nb_get_retry 3
delay_before_retry 2
}
}
real_server 192.168.2.2 80 {
HTTP_GET {
url {
path /var/www/html/index.html
digest 98049d2520fe2ae1cfefc647b9e7e95d
}
connect_timeout 3
nb_get_retry 3
delay_before_retry 2
}
}
}

And on the slave:

# Configuration File for Keepalived

# Global Configuration
global_defs {
notification_email {
emmanuel.buamah@teledataict.com
}
notification_email_from keepalived@mail2.dot.com.gh
smtp_server localhost
smtp_connect_timeout 30
router_id LVS_MASTER # string identifying the machine
}

# describe virtual service ip
vrrp_instance VI_1 {
# initial state
state BACKUP
interface eth0
# arbitrary unique number 0..255
# used to differentiate multiple instances of vrrpd
virtual_router_id 1
# for electing MASTER, highest priority wins.
# to be MASTER, make 50 more than other machines.
priority 50
authentication {
auth_type PASS
auth_pass 1q2w3e
}
virtual_ipaddress {
41.211.31.60/25 dev eth1
}
# Invoked to master transition
notify_master "/etc/keepalived/bypass_ipvs.sh del 41.211.31.60"
# Invoked to slave transition
notify_backup "/etc/keepalived/bypass_ipvs.sh add 41.211.31.60"
# Invoked to fault transition
notify_fault "/etc/keepalived/bypass_ipvs.sh add 41.211.31.60"
}

# describe virtual mail server
virtual_server 41.211.31.60 25 {
delay_loop 15
lb_algo rr
lb_kind DR
persistence_timeout 50
protocol TCP

real_server 192.168.2.3 25 {
TCP_CHECK {
connect_timeout 3
}
}
real_server 192.168.2.2 25 {
TCP_CHECK {
connect_timeout 3
}
}
}

# describe virtual web server
virtual_server 41.211.31.60 80 {
delay_loop 30
lb_algo rr
lb_kind DR
persistence_timeout 50
protocol TCP

real_server 192.168.2.3 80 {
HTTP_GET {
url {
path /var/www/html/index.html
digest 9c740d4a0e350371229915359a614d63
}
connect_timeout 3
nb_get_retry 3
delay_before_retry 2
}
}
real_server 192.168.2.2 80 {
HTTP_GET {
url {
path /var/www/html/index.html
digest 98049d2520fe2ae1cfefc647b9e7e95d
}
connect_timeout 3
nb_get_retry 3
delay_before_retry 2
}
}
}
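
A note on the HTTP_GET blocks: as far as I know, path is the URL path requested from the web server (for example /index.html), not a location on disk, and digest is the MD5 sum of the page that comes back, which keepalived's genhash tool can compute. A rough Python equivalent for cross-checking a digest by hand; this assumes the digest covers only the response body, and the function names are made up:

```python
import hashlib
import urllib.request


def body_digest(body):
    """MD5 hex digest of a response body given as bytes."""
    return hashlib.md5(body).hexdigest()


def url_digest(url, timeout=3.0):
    """Fetch url and return the MD5 hex digest of its body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return body_digest(resp.read())
```

For example, the result of url_digest("http://192.168.2.3/index.html") could be compared with the digest configured for that real server; if the two differ, the health check would mark the server as down.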

Can anyone help?