Server-side file system replication over WAN
I need to keep the filesystems on two servers synchronised; the servers are connected via a VPN tunnel over the WAN. Both run Samba with replicated settings. I've been looking for quite a while for a solution that also replicates the files served by the servers. It should replicate in both directions, close to realtime, without needing a high-bandwidth connection.
I have found many solutions for clustered filesystems or replication: Coda, Lustre, InterMezzo, DRBD, OpenAFS, Ceph, Hadoop, GlusterFS, Unison and others. None of them seems to meet my needs: some are only HPC clustering solutions, others are focused on client caching and offline editing. There is a promising solution under development, XtreemFS, but it's still under development and the latest version only supports read replication so far. Does anyone have a suggestion or solution? |
I've been looking for something similar for a while, although I don't wish to use a distributed filesystem.
I have two ZFS servers at two sites and want data to be synchronised both ways (changes can be made at both sites). It's only a 1 Mbit connection, so accessing large files over the link is not practical; there need to be local copies, asynchronously replicated. I tried an rsync set-up, but that wasn't suitable. I was going to try a Unison-based system, as this seemed the only vaguely suitable tool available (even though it's not under active development). Did you get anywhere with your system? |
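In case it helps, this is a sketch of the kind of Unison profile I had in mind for a two-way pair over a slow link. The hostname `siteb` and the paths are placeholders for your setup; `prefer = newer` gives a simple last-change-wins policy on conflicts:

```
# ~/.unison/wan-sync.prf -- hypothetical two-way sync profile
root = /tank/share
root = ssh://siteb//tank/share

# run unattended (e.g. from cron): no prompts, last writer wins on conflict
batch = true
prefer = newer
times = true

# compress the SSH transport to go easy on a 1 Mbit link
sshargs = -C
```

You would then run `unison wan-sync` on a schedule from one of the two sites.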
my needs and a list of possible solutions
I have already spent quite some time on investigation. It is hard to find details on whether most of these solutions meet (some of) my requirements or not. I have found endless feature lists, but most of them (except DRBD and XtreemFS) give no use cases: they don't say what they are good for and what they cannot do. The term "distributed filesystem" has a lot of flavors and attributes: performance/load balancing, fault tolerance, replication, multi-master/single-master, low-bandwidth support, POSIX compliance, kernel integration vs. FUSE, whether client software must be installed, ... (many more).
What I would like to have, if I can get it: If I open a file at one of 2 or more subsidiaries, I want it to open immediately (local copy). If I open it for writing, it should be locked at all other locations that are online (read access still available). If I close the file after editing, it should be replicated to the other servers in the background (maybe locked on all servers until replication is finished). If one server goes offline and there is no replication, all files remain available for read/write access on any server. When the server comes back online, an automatic sync of the changed files occurs. If changes on both sides are detected, either the last change wins or a copy of each changed version is stored (similar to Dropbox). Right now I am thinking about using Unison with scheduled replication, or waiting for XtreemFS to support write replication.
My investigations (unsorted):
InterMezzo - development stopped
PVFS - load balancing, parallel I/O, used for HPC
XtreemFS - distributed WAN filesystem, still under development, write replication not yet implemented
Coda - distributed filesystem for road warriors, under development, needs client software
Lustre - load balancing, parallel I/O, used for HPC
rsync - available on every Unix system, asynchronous, not realtime, not bidirectional
DRBD - works like a software mirror filesystem over LAN, only one active node, max. 2 nodes, supports asynchronous mode over low bandwidth
OpenAFS - distributed filesystem with servers (Linux, Unix) and clients (e.g. Windows, Linux), needs client software installed
Ceph - distributed filesystem for HPC, data distributed like a stripe set for load balancing
Hadoop - written in Java, single master, multiple slaves, no kernel integration
GlusterFS - HPC, distributed data for data centers, supports replication
MooseFS - distributed filesystem for data centers, single master server
Unison - bidirectional synchronization tool, asynchronous, uses the rsync algorithm |
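For what it's worth, the "last change wins" idea can be sketched in plain shell for a single file pair. The function name and paths here are made up for illustration; Unison's own conflict handling does this properly across whole trees:

```shell
#!/bin/sh
# newest_wins FILE_A FILE_B
# Sketch of a last-change-wins merge for one file present in two replicas:
# whichever copy has the newer mtime overwrites the other.
newest_wins() {
    a=$1; b=$2
    if [ "$a" -nt "$b" ]; then
        cp -p "$a" "$b"        # A is newer: push it to B
    elif [ "$b" -nt "$a" ]; then
        cp -p "$b" "$a"        # B is newer: push it to A
    fi                         # equal mtimes: nothing to do
}
```

A real tool additionally has to detect deletions and keep per-file state from the last sync, which is exactly what Unison's archive files are for.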
Thanks for that list, it's more than I hoped for.
Now let's hope it helps & encourages other LQ-ers to help you find your answers. |
Another way to get data replicated is to use a service like Ubuntu One. It uses a userspace daemon to synchronize local changes to the Ubuntu One service, which is backed by Amazon S3 storage. In the other direction, changes are pushed from the service to the client via events. The whole replication is near realtime; no delay or polling is needed. The Ubuntu One service is only free up to a certain amount of data (2 GB); beyond that it is $30 per 20 GB per year.
The transmission is encrypted, but the storage is not, so you should consider encrypting the data yourself. See https://wiki.ubuntu.com/UbuntuOne/Security |
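As a sketch of that (using openssl for symmetric encryption; the function name, passphrase file, and folder layout are assumptions for illustration): encrypt files before they enter the synchronized folder, so only ciphertext ever reaches the storage.

```shell
#!/bin/sh
# encrypt_for_sync FILE DEST_DIR PASSFILE
# Encrypts FILE with AES-256 (key derived via PBKDF2 from the passphrase
# in PASSFILE) and writes FILE.enc into DEST_DIR, e.g. the synced folder.
encrypt_for_sync() {
    openssl enc -aes-256-cbc -salt -pbkdf2 \
        -pass "file:$3" \
        -in "$1" -out "$2/$(basename "$1").enc"
}
```

Decryption is the same `openssl enc` command with `-d` added; anything you drop into the folder unencrypted remains readable by whoever can access the storage.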