High availability Samba cluster DRBD + Heartbeat
Hello everyone,
This is my first experience with Linux, and I am trying to set up a high availability Samba cluster with DRBD and Heartbeat.
E N V I R O N M E N T _ D E T A I L S
Primary server
Server name: test02
IP address: 192.168.50.152
Subnet mask: 255.255.255.0
OS: CentOS 4.3 (Kernel version: 2.6.9-34.EL)
Applications installed: DRBD version 0.7.9, Heartbeat version 2.0.7, SAMBA version 3.0.10-1.4E.6
Secondary server
Server name: test01
IP address: 192.168.50.151
Subnet mask: 255.255.255.0
OS: CentOS 4.3 (Kernel version: 2.6.9-34.EL)
Applications installed: DRBD version 0.7.9, Heartbeat version 2.0.7, SAMBA version 3.0.10-1.4E.6
Client system
System name: test03
IP address: 192.168.50.153
OS: Windows XP Professional sp2
Samba is served on the service (virtual) IP address 192.168.50.195
Configuration files are as follows:
drbd.conf (test01/test02)
resource r0
{
    protocol A;
    incon-degr-cmd "halt -f";
    startup
    {
        degr-wfc-timeout 120; # 2 minutes
    }
    disk
    {
        on-io-error detach;
    }
    net
    {
    }
    syncer
    {
        rate 10M;
        group 1;
        al-extents 257;
    }
    on test01
    {
        device /dev/drbd0;
        disk /dev/hda5;
        address 192.168.50.151:7789;
        meta-disk internal;
    }
    on test02
    {
        device /dev/drbd0;
        disk /dev/hda5;
        address 192.168.50.152:7789;
        meta-disk internal;
    }
}
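To show how the state of resource r0 can be checked on either node, here is a small sketch that pulls the connection state, roles and local disk state out of /proc/drbd. It assumes the DRBD 0.7 status line looks like " 0: cs:Connected st:Primary/Secondary ld:Consistent"; the helper name drbd_field is made up for illustration, and other DRBD versions use different field names.

```shell
#!/bin/sh
# Sketch only: assumes a DRBD 0.7-style status line for device 0, e.g.
#  " 0: cs:Connected st:Primary/Secondary ld:Consistent"
# drbd_field is a hypothetical helper, not part of DRBD.

# Print the value of one field (cs, st or ld) from /proc/drbd-style text on stdin.
drbd_field() {
    sed -n "s/^ *0:.* $1:\([^ ]*\).*/\1/p"
}

if [ -r /proc/drbd ]; then
    echo "connection: $(drbd_field cs < /proc/drbd)"
    echo "roles:      $(drbd_field st < /proc/drbd)"
    echo "local disk: $(drbd_field ld < /proc/drbd)"
else
    echo "/proc/drbd not readable; is the drbd module loaded?"
fi
```

Running this on both nodes before mounting anything should make it obvious whether they are connected and which one is currently primary.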
ha.cf (test01/test02)
logfacility local0
logfile /var/log/ha-log
debug 1
bcast eth0
keepalive 2
deadtime 10
auto_failback off
node test01
node test02
ping test01
ping test02
#respawn hacluster /user/lib/heartbeat/ipfail
haresources (test01/test02)
test02 IPaddr::192.168.50.195
test02 drbddisk::r0 Filesystem::/dev/drbd0 smb
authkeys (test01/test02)
auth 3
3 md5 goose
smb.conf (test01)
[global]
workgroup = Workgroup
server string = SAMBA_TEST
admin users = root
share modes = yes
browseable = yes
username map = /etc/samba/smbusers
interfaces = 192.168.50.195
[goose01]
path = /mnt/goose01
writeable = yes
guest ok = yes
smb.conf (test02)
[global]
workgroup = Workgroup
server string = SAMBA_TEST
admin users = root
share modes = yes
browseable = yes
username map = /etc/samba/smbusers
interfaces = 192.168.50.195
[goose02]
path = /mnt/goose02
writeable = yes
guest ok = yes
smbusers (test01/test02)
# Unix_name = SMB_name1 SMB_name2 ...
# root = administrator admin
# nobody = guest pcguest smbguest
root = root
P R O B L E M
When client test03 accesses the Samba share on 192.168.50.195, the primary server (test02) reboots.
T R O U B L E S H O O T I N G
The steps taken (to the point of failure) are as follows:
1. Started drbd on test02 (primary)
2. Started drbd on test01 (secondary)
3. Ran the command drbdadm primary all on test02
4. Ran the command mount /dev/drbd0 /mnt/goose02 on test02
5. Started samba on test02 (primary)
6. Created test files hello and world in the /mnt/goose02 share. (SAMBA was already configured with the /mnt/goose02 folder.)
7. I then tried accessing the share from the Windows system using the service IP address 192.168.50.195. When the server does not crash immediately, I can browse the files on 192.168.50.195 for a moment. Then the primary server reboots without warning.
8. After the primary server crashed, I ran the command drbdadm primary all on the secondary server so that the DRBD block device could be mounted there.
9. Then I ran the command mount /dev/drbd0 /mnt/goose01 on test01. (SAMBA was already configured with the /mnt/goose01 folder.)
10. Started the samba service on test01.
11. The files are accessible from the Windows system on the service IP 192.168.50.195.
I reviewed the logs in /var/log but could not find any conclusive evidence
for the cause of the crash. Failover itself seems to work, but only when I
carry out the steps above by hand.
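For reference, the manual takeover in steps 8 to 10 amounts to roughly the following sketch, run on the surviving node. The function names run and takeover and the DRY_RUN and MOUNT_POINT variables are my own illustration. By default the script only prints the commands; it executes them only when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of manual failover steps 8-10 on the surviving node.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/goose01}"   # /mnt/goose02 on test02

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Promote DRBD, mount the device, start Samba; stop at the first failure.
takeover() {
    run drbdadm primary all &&
    run mount /dev/drbd0 "$MOUNT_POINT" &&
    run service smb start
}

takeover
```

This is exactly the sequence I would like Heartbeat to perform for me through haresources instead of typing it in by hand.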
O B S E R V A T I O N
I suspect that Heartbeat may be the problem, specifically the virtual IP address. When I start Heartbeat, both the primary and the secondary server hold the virtual IP address 192.168.50.195 for an initial period. After some time the virtual IP disappears from the secondary server (giving me the impression that Heartbeat takes a while to settle), but the Windows system is then unable to ping the virtual IP address. Only after I set the address manually on both servers is it possible to ping the service address from the Windows client. (The manual entry is made by running /etc/ha.d/resource.d/IPaddr 192.168.50.195 start on the primary server and /etc/ha.d/resource.d/IPaddr 192.168.50.195 stop on the secondary server.)
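To check which node actually holds the virtual IP at a given moment, something like the sketch below can be run on test01 and test02. It assumes the iproute `ip` command is installed and that the alias created by IPaddr shows up in its address list; has_vip is a made-up helper name.

```shell
#!/bin/sh
# Sketch: report whether this node currently has the service IP configured.
VIP="${VIP:-192.168.50.195}"

# Reads `ip -o addr show` style output on stdin; succeeds if the address
# given as $1 appears as a configured IPv4 address.
has_vip() {
    grep -qF "inet $1/"
}

if ip -o addr show 2>/dev/null | has_vip "$VIP"; then
    echo "$(uname -n): holds $VIP"
else
    echo "$(uname -n): does not hold $VIP"
fi
```

If both nodes report that they hold the address at the same time, that would match the duplicate virtual IP I see right after starting Heartbeat.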
I need help with the following issues:
1. Feedback on the cause of the server crash and how to avoid it.
2. Suggestions to automate these manual tasks.
3. Feedback on the cluster configuration and scope for improvement.
Regards,
Alex
Last edited by djalex; 08-18-2006 at 10:57 AM.