Hi James,
In short, no. I eventually dumped GlusterFS. It's a fantastic idea, but for me the performance was shockingly bad and the client-side caching was seriously inadequate.
In their defence, however, GlusterFS can be mounted on multiple hosts, and in that scenario extensive caching would be a very bad idea. It just so happened that in my scenario I actually wanted extremely heavy caching, and while Gluster had translators for that sort of thing, they were seriously limited.
If I had the time and skills I would have written or extended the translators to do what I needed, since I think the concept of the product is awesome.
I did manage to get the level of caching I needed by re-exporting GlusterFS with NFS and then setting actimeo=<high number> when the NFS share was mounted. This did what I needed, but 'Stale NFS file handle' errors, which are apparently related to the use of the FUSE library, made it completely useless (you would basically end up with random files being inaccessible for random periods).
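For reference, the re-export looked something like this (the paths, network range and timeout value here are just placeholders, not the exact ones I used):

    # /etc/exports on the box holding the GlusterFS FUSE mount
    # fsid= is needed when re-exporting a FUSE filesystem via kernel NFS
    /mnt/gluster    192.168.0.0/24(rw,sync,fsid=1,no_subtree_check)

    # on the consuming box, mount with a long attribute cache timeout
    mount -t nfs glusterbox:/mnt/gluster /mnt/cached -o actimeo=3600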
I tried to avoid the FUSE library by using the patched unfs+booster supplied by the GlusterFS team. This overcame the 'Stale NFS file handle' issues, but for some reason all caching stopped working on the NFS client when mounting a share exported with unfs. After lots of reading I found a suggestion that a userspace NFS daemon cannot pass real inode numbers through properly, and that unfs uses some form of compressed path string instead, which apparently breaks the client-side cache.
At this point I gave up and went back to the drawing board.
I have since implemented a solution that is performing pretty well, and one that (to me at least) is surprisingly simple.
My requirements were:
- Safe storage (i.e. two separate copies of the data on two physically separate servers; mirroring across two disks in the same server was not good enough)
- One filesystem spanning many disks, with the ability to add disks as required to expand the available space, and to retire them without losing data
- Sufficient caching to accommodate my heavy filesystem scanning
My solution was (I think) incredibly simple. Here is what I did.
I started with two servers, each with a large, matching-sized hard disk partition to be used for storing data.
I used AoE (ATA over Ethernet) to export the physical disk partition from each server.
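On each storage server the export is a one-liner using vblade (the userspace AoE exporter); the shelf/slot numbers, interface and partition names below are just examples:

    # storage server 1: export /dev/sdb1 as AoE shelf 0, slot 0, over eth0
    vblade 0 0 eth0 /dev/sdb1 &

    # storage server 2: export its partition as shelf 1, slot 0
    vblade 1 0 eth0 /dev/sdb1 &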
I then enabled AoE on a third box, which I call the master box (basically, the box where the filesystem is to be mounted). By doing this and letting AoE scan the network, the two disks exported from the other servers appeared as local devices.
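On the master box that just means loading the aoe module and letting it discover the exports (the device names under /dev/etherd/ follow the shelf.slot numbers chosen above):

    modprobe aoe
    aoe-discover           # from the aoetools package
    ls /dev/etherd/        # the exported disks show up as e0.0, e1.0, ...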
I then used the standard Linux software RAID tool (mdadm) to make a RAID 1 mirror from the two disks, which shows up as something like /dev/md0.
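A rough sketch of that step, using the example AoE device names from above:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/etherd/e0.0 /dev/etherd/e1.0
    cat /proc/mdstat       # watch the initial mirror resync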
Then I set up LVM and used /dev/md0 as my first PV.
After getting my LV set up, I created an ext3 filesystem on it and mounted the result under /data.
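Something along these lines, with made-up VG/LV names (datavg/datalv):

    pvcreate /dev/md0
    vgcreate datavg /dev/md0
    lvcreate -n datalv -l 100%FREE datavg
    mkfs.ext3 /dev/datavg/datalv
    mkdir -p /data
    mount /dev/datavg/datalv /data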
Now, Linux has its standard page cache for local disk access and uses any spare RAM to cache whatever it can (evicting roughly the least-recently-used data when memory is needed), so we just threw lots of RAM at the master server and got quite an extensive cache. (Even with 1GB of RAM it did what we wanted with the test data we had.)
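You can watch the page cache soak up the spare RAM while the scans run:

    free -m                        # the 'cached' column grows as files are read
    grep -i cached /proc/meminfo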
The rest is pretty simple. To extend the storage space we just add two more servers with disks, RAID-1 them with mdadm to make /dev/md1, mark that device as an LVM PV and use it to extend the VG and LV. Then resize2fs grows the ext3 filesystem to take up the extra space, all without even having to unmount the filesystem.
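The whole grow operation, with the assumed names from above and shelves 2 and 3 for the new pair of servers, looks roughly like this:

    # new servers export their disks with vblade, then on the master:
    aoe-discover
    mdadm --create /dev/md1 --level=1 --raid-devices=2 \
        /dev/etherd/e2.0 /dev/etherd/e3.0
    pvcreate /dev/md1
    vgextend datavg /dev/md1
    lvextend -l +100%FREE /dev/datavg/datalv
    resize2fs /dev/datavg/datalv   # grows ext3 online, no unmount needed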
It is also possible with LVM to move the contents of one device onto another empty device of equal or greater size. This lets us retire old servers as new ones are added, where required.
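Retiring a pair of servers is just a matter of emptying their PV and dropping it from the VG (assuming the names above, and that there is enough free space on the remaining PVs):

    pvmove /dev/md0                # migrate its extents onto the other PVs
    vgreduce datavg /dev/md0
    pvremove /dev/md0
    mdadm --stop /dev/md0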
I considered striping, but given the way our data is written and accessed, it ends up distributed nicely across the disks anyway, so we get a reasonably good striping effect for free.
There is no reason why you could not just export it with NFS and mount it elsewhere, on multiple systems if needed.
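For example (the export network and hostnames are placeholders):

    # /etc/exports on the master box
    /data    192.168.0.0/24(rw,sync,no_subtree_check)

    exportfs -ra

    # on any client
    mount -t nfs master:/data /data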
Obviously, one major drawback this method has compared to Gluster is a single point of failure and a single point of congestion: the master box. That is exactly what I wanted in this case, but it might not be what you want.
Anyway, that was my epic journey. I hope it is of some use to you, or some other weary person who consumed far too much coffee trying to find a solution for something like this.
I would just like to reiterate to anyone reading that GlusterFS is a fantastic product and, while it did not quite suit me, is very much worth a good look.
They have a version of GlusterFS in beta with an NFS server built in. It might be worth trying out when it's released; I tried the beta but it kept crashing within seconds for me.