Clustering with piranha
I've got a problem with a piranha cluster.
I've got two CentOS 6.5 servers with piranha, lvs, pulse and nanny. They have a virtual server configured, and pulse is started on both servers. One of the servers is the LVS director, and in the piranha web panel the output is similar to this:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP VIP:PORT1 rr
-> IP2:PORT1 Route 1 0 0
-> IP1:PORT1 Local 1 0 0
TCP VIP:PORT2 rr
-> IP2:PORT2 Route 1 0 0
-> IP1:PORT2 Local 1 0 0
The chosen algorithm (round robin) should balance requests between both servers equally, and it does, but only for a couple of minutes. At some point something happens on the LVS director server, and from that moment requests are directed only to the second server. After restarting pulse on the first server, requests are balanced again for roughly the same amount of time, and after 5-15 minutes balancing breaks again.
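To narrow down the moment it breaks, I've been watching the IPVS counters and the pulse/nanny messages on the director (paths are the CentOS 6 defaults; adjust if yours differ):

```shell
# Refresh the IPVS table with per-real-server packet counters every 2s;
# when balancing breaks, one real server's counters stop increasing.
watch -n 2 'ipvsadm -L -n --stats'

# Follow pulse/nanny messages; nanny logs when it takes a real server
# out of service after a failed send/expect check.
tail -f /var/log/messages | grep -E 'pulse|nanny|lvsd'
```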
My config from /etc/sysconfig/ha/lvs.cf:
serial_no = 49
primary = IP1
primary_private = IP3
service = lvs
backup_active = 1
backup = IP2
backup_private = IP4
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = direct
debug_level = NONE
monitor_links = 1
syncdaemon = 1
syncd_iface = eth1
syncd_id = 0
tcp_timeout = 6
tcpfin_timeout = 15
virtual VSERVER1 {
active = 1
address = VIP eth1:1
vip_nmask = 255.255.255.224
port = PORT1
send = "GET /ping HTTP/1.0\r\n\r\n"
expect = "HTTP"
use_regex = 0
load_monitor = none
scheduler = rr
protocol = tcp
timeout = 6
reentry = 15
quiesce_server = 0
server RSERVER1 {
address = IP1
active = 1
port = PORT1
weight = 1
}
server RSERVER2 {
address = IP2
active = 1
port = PORT1
weight = 1
}
}
virtual VSERVER2 {
active = 1
address = VIP eth1:1
vip_nmask = 255.255.255.224
port = PORT2
send = "GET / HTTP/1.0\r\n\r\n"
expect = "HTTP"
use_regex = 0
load_monitor = none
scheduler = rr
protocol = tcp
timeout = 6
reentry = 15
quiesce_server = 0
server RSERVER1 {
address = IP1
active = 1
port = PORT2
weight = 1
}
server RSERVER2 {
address = IP2
active = 1
port = PORT2
weight = 1
}
}
On both servers there are eth1:0 and loopback lo:0 interfaces for the VIP address; net.ipv4.ip_forward is set to 1, net.ipv4.conf.lo.arp_ignore is 1, and net.ipv4.conf.lo.arp_announce is 2.
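For reference, this is a sketch of the ARP-related sysctl entries for direct routing (the `conf.all` variants are an addition many LVS-DR guides recommend, not something I've confirmed is required in my setup):

```shell
# /etc/sysctl.conf on the real servers (LVS direct routing)
net.ipv4.ip_forward = 1
net.ipv4.conf.lo.arp_ignore = 1
net.ipv4.conf.lo.arp_announce = 2
# Many LVS-DR guides also set these under 'all', so the policy
# applies regardless of which interface receives the ARP query:
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
```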
I think it could be related to ARP, or to something else that is checked periodically and breaks the balancing.
I read that LVS has a localnode feature and that piranha can use it, so this configuration should work, but restarting pulse every 15 minutes is certainly not a solution.
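To test the ARP theory, one way is to watch which MAC address answers for the VIP (VIP here is the placeholder from the config above; run this while balancing is broken):

```shell
# On the director (or any host in the VIP's subnet): watch ARP
# traffic for the VIP and note which MAC claims it.
tcpdump -n -e -i eth1 arp and host VIP

# From another host in the subnet, force an ARP query for the VIP;
# duplicate replies ("DUP!") mean two machines are answering for it.
arping -I eth1 -c 3 VIP
```

If a real server's MAC shows up in the replies, its loopback alias is leaking ARP for the VIP despite the arp_ignore/arp_announce settings.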