I have already spent quite some time on investigations. It is hard to find details on most of these solutions, such as whether they meet (some of) my requirements or not. I have found endless feature lists, but most of them (except DRBD and XtreemFS) have no use cases. They don't say what they are good for and what they cannot do. The term "distributed filesystem" covers many flavors and attributes: performance/load balancing, fault tolerance, replication, multi-master/single-master, low-bandwidth support, POSIX compliance, kernel integration/FUSE, whether client software must be installed, and many more.
What I would like to have, if I can get it:
If I open a file at one of two or more subsidiaries, I want it to open immediately (from a local copy).
If I open it for writing, it should be locked at all other locations that are online (read access remains available).
When I close the file after editing, it should be replicated to the other servers in the background (possibly staying locked on all servers until replication has finished).
If one server goes offline and replication is therefore not possible, all files should still be available for read/write access on every server.
When the server comes back online, the changed files are synchronized automatically. If changes are detected on both sides, either the last change wins or a copy of each changed version is kept (similar to Dropbox).
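The reconnect behavior described above is essentially a three-way reconciliation: each site's files are compared against the last state both sites agreed on, so a one-sided change can be copied over safely and only files changed on both sides need a conflict policy. The sketch below illustrates that idea in memory only; `Version`, `reconcile`, the two policy names and the `.conflict` suffix are hypothetical and do not come from any of the tools listed here. It assumes that a modification beats a deletion, and the "last change wins" policy assumes the servers' clocks are in sync, since it compares modification times from different machines.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Version:
    content: bytes
    mtime: float  # modification time in seconds since the epoch


# Hypothetical model: a replica maps a file path to its current version.
Replica = Dict[str, Version]


def reconcile(base: Replica, a: Replica, b: Replica,
              policy: str = "last_wins") -> Replica:
    """Merge two replicas that diverged from a common snapshot `base`.

    policy="last_wins": for a file changed on both sides, keep the newer mtime.
    policy="keep_both": keep side a under the original path and side b under a
    hypothetical "<path>.conflict" name, similar to Dropbox conflict copies.
    """
    if policy not in ("last_wins", "keep_both"):
        raise ValueError(f"unknown policy: {policy}")
    merged: Replica = {}
    for path in sorted(set(base) | set(a) | set(b)):
        old: Optional[Version] = base.get(path)
        va, vb = a.get(path), b.get(path)  # None means absent or deleted
        if va == vb:            # same on both sides (or deleted on both)
            winner = va
        elif vb == old:         # only side a changed (possibly deleted it)
            winner = va
        elif va == old:         # only side b changed (possibly deleted it)
            winner = vb
        elif va is None or vb is None:
            # delete vs. modify: assume the modification should survive
            winner = va if va is not None else vb
        elif policy == "last_wins":
            winner = va if va.mtime >= vb.mtime else vb
        else:  # keep_both: side b's version is stored as a conflict copy
            merged[path + ".conflict"] = vb
            winner = va
        if winner is not None:
            merged[path] = winner
    return merged
```

Keeping the last common snapshot is what lets the merge tell "changed on one side" apart from "changed on both sides"; a plain two-way comparison of the current files cannot make that distinction.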
Right now I am thinking about using Unison with scheduled replication, or waiting for XtreemFS to support write replication.
My investigations (unsorted):
Intermezzo
development stopped
PVFS
load balancing, parallel I/O, used for HPC
XtreemFS
distributed WAN filesystem, still under development, write replication not yet implemented
Coda
distributed filesystem for road warriors, under development, needs client software
Lustre
load balancing, parallel I/O, used for HPC
rsync
available on every Unix system, asynchronous (not real-time), not bidirectional
DRBD
works like a software mirror (RAID 1) over the LAN, only one active node, max. 2 nodes, supports asynchronous mode over low bandwidth
OpenAFS
distributed filesystem with server (Linux, Unix) and clients (e.g. Windows, Linux), needs client software installed
Ceph
distributed filesystem for HPC, data distributed like a stripe set for load balancing
Hadoop
written in Java, single master, multiple slaves, no kernel integration
GlusterFS
HPC, distributed data for data centers, supports replication
MooseFS
distributed file system for data centers, single master server
Unison
bidirectional synchronization tool, asynchronous, uses rsync