I installed UCARP on four nodes running Gluster, which is exported over NFS. The idea is to assign a VIP to one of the nodes so that if it fails, the VIP moves to one of the others. Since it's an active/passive setup, /var/lib/nfs is shared between the nodes so NFS can be started on the new master without any issues.

However, when I simulate a failure by unplugging the network cable, more than one node becomes the new master. Sometimes it's all three remaining nodes, sometimes just two. Multiple masters claiming the VIP at once will cause data corruption. Here is the network configuration for the first node:
Code:
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
    address 10.80.80.1
    netmask 255.255.0.0
    network 10.80.0.0
    broadcast 10.80.255.255
    gateway 10.80.10.1
    # dns-* options are implemented by the resolvconf package, if installed
    dns-nameservers 10.80.90.250 4.2.2.2 8.8.8.8
    dns-search lax.xen.com
    ucarp-vid 10
    ucarp-vip 10.80.80.100
    ucarp-password secret
    ucarp-advskew 0
    ucarp-advbase 1
    ucarp-facility local1
    ucarp-master yes
    ucarp-upscript /usr/share/ucarp/vip-up
    ucarp-downscript /usr/share/ucarp/vip-down
    ucarp-nomcast yes

iface eth0:ucarp inet static
    address 10.80.80.100
    netmask 255.255.0.0
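For what it's worth, the packaged vip-up and vip-down scripts seem to do nothing more than bring the eth0:ucarp alias up and down, which is why the extra iface eth0:ucarp stanza above is needed. If I'm reading them right, vip-up is roughly:
Code:
#!/bin/sh
# run when this node becomes master: $1 is the physical interface
# (eth0), so this activates the "iface eth0:ucarp" stanza above
exec 2> /dev/null
/sbin/ifup $1:ucarp
and vip-down is the mirror image with ifdown, so only the current master should ever hold the VIP.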
The only difference between the nodes' configuration files is the advskew value. I've tried all sorts of combinations, but the failover never results in a single master taking over. I did see some errors, such as this one:
Code:
Apr 11 18:46:44 gluster1 ucarp[13488]: [ERROR] exiting: pfds[0].revents = 8
but I don't know what they mean. Did I miss something in the installation?
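In case it's useful for diagnosing this: as far as I understand, the CARP advertisements are sent as IP protocol 112, and with ucarp-nomcast yes they go to the broadcast address instead of the 224.0.0.18 multicast group, so they should be visible on each node with something like:
Code:
# watch for CARP/VRRP advertisements (IP protocol 112) on eth0;
# each backup should keep seeing the elected master's packets
tcpdump -n -i eth0 ip proto 112
If the backups can't see each other's advertisements after the master's cable is pulled, I'd expect each of them to promote itself, which matches what I'm seeing.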