Hello everyone,
We have just built a KVM cluster on some ProLiant DL560s running CentOS 7.
We have created a GFS2 cluster using pacemaker/corosync and installed libvirt (4.5.0) on the nodes.
Besides the GFS2 cluster, the servers are connected to NetApp storage, and we map NetApp LUNs to all hypervisors. Those LUNs are to be used by the virtual machines created in libvirt/qemu.
To clusterize everything and minimize potential downtime, we wanted pacemaker/corosync to manage the virtual machines.
Now here comes the fun (annoying) part.
- I create a VPS server and, when defining its disk, point it at the mapped LUN (/dev/mapper/proba). The disk is defined without caching (cache='none').
- I shut it down and edit its XML file in /etc/libvirt/qemu to remove the "CPU" directive.
- I copy that XML to the shared GFS2 directory /shared/xml.
- I undefine the VPS (let's say proba) using "virsh undefine proba".
- Now I create a pcs resource using the following command:
Code:
resource create proba_vps VirtualDomain hypervisor="qemu:///system" config="/shared/xml/proba.xml" migration_transport=ssh op start timeout="300s" op stop timeout="300s" op monitor timeout="30" interval="10" meta allow-migrate="true" priority="100" target-role="Started" is-managed="true" op migrate_from interval="0" timeout="120s" op migrate_to interval="0" timeout="120" --group AQUILA
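For reference, the disk stanza in proba.xml ends up looking roughly like this (only the /dev/mapper path and cache='none' are from my setup; the raw driver type and the target name here are just illustrative):

```xml
<disk type='block' device='disk'>
  <!-- raw multipath block device, no host-side caching -->
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/mapper/proba'/>
  <target dev='vda' bus='virtio'/>
</disk>
```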
The resource is successfully defined and the VM starts automatically. The problem arises when I try to move it to another hypervisor using the command
Code:
pcs resource move proba kvm_aquila-02
It does move it, but it shuts the VM down first and then starts it on the other node, whereas it should perform a seamless LIVE migration instead.
If I create the same VPS using a QCOW image instead of the multipath LUN, live migration works fine.
I don't know whether the problem lies in pacemaker/corosync or in libvirt itself, so I would like to hear some thoughts on how to get this resolved.
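One way to narrow that down is to check what pacemaker actually scheduled on the source node during the move (log location is an assumption; on CentOS 7 cluster messages usually land in /var/log/messages and/or /var/log/cluster/corosync.log):

```shell
# Did pacemaker schedule a live migration (migrate_to/migrate_from)
# for the resource, or fall back to a plain stop/start?
grep -E 'proba_vps.*(migrate_to|migrate_from|stop|start)' /var/log/messages
```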
If I define the VPS in plain libvirt, without pacemaker/corosync, and try to migrate it using
Code:
virsh migrate proba qemu+ssh://kvm_aquila-02/system --live --persistent --undefinesource
It will not allow me to do so, but will error out with
Code:
error: Unsafe migration: Migration without shared storage is unsafe
If I add the "--unsafe" flag, it passes fine, without errors, issues etc...
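In other words, the full invocation that goes through is the same command as above plus the extra flag:

```shell
virsh migrate proba qemu+ssh://kvm_aquila-02/system --live --persistent --undefinesource --unsafe
```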
Maybe pacemaker/corosync is doing the same thing, attempting a live migration without --unsafe; or maybe the problem lies somewhere else. I don't know.
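If the missing --unsafe is indeed the culprit, one avenue that may be worth checking: newer builds of the VirtualDomain resource agent accept a migrate_options parameter for passing extra flags through to virsh migrate. Whether the agent shipped with CentOS 7 supports it is something I have not confirmed, so the parameter's availability below is an assumption:

```shell
# See whether this VirtualDomain agent advertises a migrate_options parameter
pcs resource describe VirtualDomain | grep -i migrate

# If it does, something like this might pass --unsafe through to virsh
# (hypothetical: depends on the installed resource-agents version)
pcs resource update proba_vps migrate_options="--unsafe"
```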
We have plenty of hypervisors running CentOS 6 with libvirt 0.10.2, and migration via virsh migrate with the same parameters posted above works there without issues or errors.
What to do?
Thanks in advance,
Marko Todoric