I have set up a small Linux KVM-based VM server farm. I have 4 machines that host the VMs. The VM hard drive image files are accessed by the VM host machines via NFS. The NFS server is a two-node high-availability cluster based on heartbeat and DRBD, with each node having 8GB of memory. All network connectivity is gigabit Ethernet. All machines are running Debian Linux 8. Also, all machines are plugged into a UPS.
Everything seems to work well enough. Live migration of VMs among the 4 VM host machines works great. Fail-over of the NFS server also works in testing, with only a brief pause seen in the VMs while it is happening.
I am a little disappointed by the disk I/O performance I am seeing in the VMs, though. Since I don't have a basis for comparison, the performance I am seeing could very well be normal for my kind of setup. I am using qcow2 files for the hard drive image files. They are also configured for VirtIO. The cache mode I am using is "none".
One of the first things I tried playing around with was the NFS mount options. The NFS mount options I am using are: rsize=32768, wsize=32768, and timeo=60. I found that using "32768" for "rsize/wsize" actually gave me better throughput than using the default of "1048576", which I thought was odd, though I only used a simple dd read/write test to find this out. Also, I am using a "timeo" value of "60" (6 seconds, since timeo is given in tenths of a second) so that the NFS client will realize sooner that a fail-over has occurred at the NFS server.
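To make the comparison repeatable, a dd-style sequential test can be sketched roughly as below. This is an illustration in Python, not the dd command I actually ran; the function names, file path, and sizes are all assumptions. The write test calls fsync before stopping the clock so that data still sitting in a cache is not counted as written.

```python
import os
import time

MIB = 1024 * 1024


def write_throughput(path: str, total_bytes: int, block_size: int = MIB) -> float:
    """Write total_bytes of zeros to path in block_size chunks; return MiB/s."""
    block = bytes(block_size)
    written = 0
    start = time.perf_counter()
    # buffering=0 means each write() goes straight to the OS, like dd's bs=...
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            written += f.write(chunk)
        # Include the flush in the timing, otherwise cached data looks "free".
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return written / MIB / max(elapsed, 1e-9)


def read_throughput(path: str, block_size: int = MIB) -> float:
    """Read path sequentially in block_size chunks; return MiB/s.

    Note: if the file was just written, this may be served from the
    page cache rather than from the NFS server.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(block_size)
            if not data:
                break
            total += len(data)
    elapsed = time.perf_counter() - start
    return total / MIB / max(elapsed, 1e-9)


# Example (hypothetical path on the NFS mount):
#   print(write_throughput("/mnt/vmstore/test.img", 512 * MIB, block_size=32768))
#   print(read_throughput("/mnt/vmstore/test.img", block_size=32768))
```

Running it once per rsize/wsize setting, with a file large enough that it does not fit in RAM, should give numbers that are at least comparable with each other.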
The next thing I tried fiddling with was the cache mode. The reason I used "none" in the first place was that KVM complained that it is not safe to do a live migration of the VM unless the cache mode is set to "none". I tried using the KVM default of "writethrough" and saw that write performance was way worse than with "none".
I then tried "writeback" and noticed that the performance was a lot better than both "writethrough" and "none". It seemed to me that "writeback" mode should be the way to go. However, after doing some Googling I read that "writeback" isn't safe and shouldn't be used for production servers.
If I'm understanding correctly, when a block is written to disk from the guest, the block is first sent to the host's disk cache. In "writeback" mode, the host will then "lie" to the guest and say that the block was successfully written to disk immediately, even if the block is still sitting in the host's disk cache. I guess this is why they say "writeback" mode is not safe: if the host crashes before the cache is flushed, then that data is lost.
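For reference, the cache modes can be laid out as a small table of three flags. The flag names below follow QEMU's cache.writeback, cache.direct, and cache.no-flush drive options as I understand them from the QEMU documentation; the CacheMode type and the two helper functions are hypothetical names made up for this sketch.

```python
from typing import NamedTuple


class CacheMode(NamedTuple):
    writeback: bool  # writes are acknowledged before they reach stable storage
    direct: bool     # host page cache is bypassed (O_DIRECT-style I/O)
    no_flush: bool   # flush requests from the guest are ignored


# Flag values per mode, as listed in the QEMU documentation (to my understanding).
CACHE_MODES = {
    "writeback":    CacheMode(writeback=True,  direct=False, no_flush=False),
    "none":         CacheMode(writeback=True,  direct=True,  no_flush=False),
    "writethrough": CacheMode(writeback=False, direct=False, no_flush=False),
    "directsync":   CacheMode(writeback=False, direct=True,  no_flush=False),
    "unsafe":       CacheMode(writeback=True,  direct=False, no_flush=True),
}


def uses_host_page_cache(mode: str) -> bool:
    """True if guest writes pass through the host's page cache."""
    return not CACHE_MODES[mode].direct


def honors_guest_flush(mode: str) -> bool:
    """True if a flush issued by the guest is passed on to the storage."""
    return not CACHE_MODES[mode].no_flush


for name in CACHE_MODES:
    print(f"{name:13s} host_cache={uses_host_page_cache(name)} "
          f"flush_honored={honors_guest_flush(name)}")
```

Laid out this way, "none" and "writeback" differ only in whether the host page cache is in the path; both acknowledge writes early and both pass guest flushes through.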
I'm wondering just how dangerous "writeback" mode is for production servers with my particular setup. Everything is on UPS, so the most likely cause of a host crash would be hardware failure, which is itself unlikely. Even if hardware failure is imminent, I would probably have ample opportunity to do a proper VM shutdown before it happens.
Also, I am wondering if there is much difference between "none" and "writeback" with my setup since I am using NFS. I mean, obviously there is a difference since "writeback" yielded better throughput numbers, but I am wondering more with respect to safety. According to this diagram:
http://www.ilsistemista.net/index.ph...2.html?start=2
in "writeback" mode, blocks get sent to the host's cache before being sent to the physical disk cache, while in "none" mode, blocks get sent directly to the physical disk cache. However, in my case there is no physical disk cache, only an NFS client cache. And since I am using the default NFS mount option of "async", the blocks will sit in the NFS client cache until it fills up or until it is told to flush, for example via an invocation of sync. Am I correct in concluding that there isn't much difference with respect to data safety between "none" and "writeback" with this setup?
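To illustrate what "told to flush" means from a program's point of view, here is a minimal sketch of a write that does not return until the OS reports the data as flushed. The function name and path are hypothetical. My assumption is that on an "async" NFS mount the fsync call makes the client push its cached pages to the server and wait for the reply; whether the server has written them to its own disk by then would depend on the export's sync/async setting, which I have not checked here.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only after the OS reports it flushed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to flush its cached pages for this file


# Example (hypothetical path on the NFS mount):
#   durable_write("/mnt/vmstore/marker.bin", b"x" * 4096)
```

Without the fsync call, the write can return while the data is still only in the client's cache, which is the window I am asking about.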
Sorry for the long-winded post. To summarize what I am asking:
1. Is "writeback" cache mode really that bad with my setup?
2. Am I right in that there isn't a difference with respect to data safety between "writeback" and "none" cache mode with my setup?
Thanks!