LinuxQuestions.org
Old 10-23-2010, 07:05 PM   #1
kuntergunt
LQ Newbie
 
Registered: Mar 2008
Location: Vienna
Distribution: Ubuntu, CentOS, Debian, Knoppix
Posts: 6

Rep: Reputation: 5
server side file system replication over WAN


I need to keep filesystems synchronised on two servers connected via a VPN tunnel over a WAN. Both servers run Samba with replicated settings. For quite a while I have been looking for a way to also replicate the files they serve. Replication should work in both directions, close to realtime, without needing a high-bandwidth connection.

I have found many solutions for clustered filesystems or replication: Coda, Lustre, Intermezzo, DRBD, OpenAFS, Ceph, Hadoop, GlusterFS, Unison and others. None of them seems to meet my needs: some are purely HPC clustering solutions, others focus on client caching and offline editing.

There is a promising solution under development: XtreemFS. But it is still under development, and the latest version supports only read replication so far.

Does anyone have a suggestion or solution?
 
Old 12-04-2010, 03:02 PM   #2
edkirk
LQ Newbie
 
Registered: Dec 2010
Posts: 1

Rep: Reputation: 0
I've been looking for something similar for a while, although I don't wish to use a distributed filesystem.

I have two ZFS servers at two sites and want data to be synchronised both ways (changes are made at both sites). It's only a 1mb connection, so accessing large files over the link is not practical; there need to be local copies that are replicated asynchronously. I tried an rsync set-up, but that wasn't suitable. I was going to try a Unison-based system, as this seemed the only vaguely suitable tool available (even though it's not under active development).
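To put a rough number on why remote access to large files is impractical on such a link, here is a back-of-the-envelope sketch in Python. It assumes "1mb" means 1 Mbit/s (the post does not say) and ignores VPN and protocol overhead, so real transfers would be slower. The function name and the 100 MB example file size are made up for illustration.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Ideal time to push size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


# Assumed figures: a 100 MB file (decimal megabytes) over a 1 Mbit/s link.
seconds = transfer_seconds(100 * 10**6, 1 * 10**6)
print(f"{seconds / 60:.1f} minutes")
```

Even in this best case a single 100 MB file occupies the link for more than thirteen minutes, which is why local copies with background replication are the sensible design here.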

Did you get anywhere with your system?
 
Old 12-06-2010, 05:59 AM   #3
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 234
Quote:
Originally Posted by kuntergunt
I have found many solutions for clustered filesystems or replication: Coda, Lustre, Intermezzo, DRBD, OpenAFS, Ceph, Hadoop, GlusterFS, Unison and others. None of them seems to meet my needs: some are purely HPC clustering solutions, others focus on client caching and offline editing.
That's a lot of time w/ Google or Wikipedia for each of your readers to look up each of them. How about providing links so we can better understand what you want & why these are not appropriate?
 
Old 12-12-2010, 01:23 PM   #4
kuntergunt
LQ Newbie
 
Registered: Mar 2008
Location: Vienna
Distribution: Ubuntu, CentOS, Debian, Knoppix
Posts: 6

Original Poster
Rep: Reputation: 5
my needs and a list of possible solutions

I have already spent quite some time on investigations. For most of these solutions it is hard to find out in detail whether they meet (some of) my requirements or not. I have found endless feature lists, but most of them (except DRBD and XtreemFS) have no use cases. They don't say what they are good for and what they cannot do. The term "distributed filesystem" has a lot of flavors and attributes: performance/load balancing, fault tolerance, replication, multi-master/single-master, low-bandwidth support, POSIX compliance, kernel integration/FUSE, installation of client software needed, ... (many more).

What I would like to have, if I can get it:
If I open a file at one of 2 or more subsidiaries, I want it to open immediately (local copy).
If I open it for writing, it should be locked at all other locations that are online (read access still available).
If I close the file after editing, it should be replicated to the other servers in the background (maybe locked on all servers until replication is finished).
If one server goes offline and there is no replication, all files are still available for read/write access on any server.
If the server comes online again, an automatic sync of the changed files occurs. If changes on both sides are detected, either the last change wins or a copy of each changed version is stored (similar to Dropbox).
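The reconnect rule in the last point can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how any of the tools mentioned in this thread work internally. `FileState`, `resolve` and the returned action names are hypothetical, and "last change wins" assumes the servers' clocks are reasonably in sync.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileState:
    """Hypothetical snapshot of one file on one server."""
    mtime: float   # last modification time, seconds since the epoch
    digest: str    # content hash, e.g. a SHA-256 hex string


def resolve(base: Optional[FileState], a: FileState, b: FileState,
            keep_both: bool = False) -> str:
    """Decide what to do with one file once servers A and B reconnect.

    base is the state recorded at the last successful sync (None if unknown).
    Returns "in_sync", "copy_a_to_b", "copy_b_to_a" or "keep_both".
    """
    if a.digest == b.digest:
        return "in_sync"
    if base is not None:
        if a.digest == base.digest:
            return "copy_b_to_a"   # only B changed while offline
        if b.digest == base.digest:
            return "copy_a_to_b"   # only A changed while offline
    # Both sides changed (or there is no sync history): a real conflict.
    if keep_both:
        return "keep_both"         # Dropbox-style: store a copy of each
    return "copy_a_to_b" if a.mtime >= b.mtime else "copy_b_to_a"
```

The important detail is the remembered `base` state: without it, a one-sided change cannot be told apart from a conflict, and every difference would have to fall back to timestamps.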

Right now I am thinking about using Unison with scheduled replication or waiting for XtreemFS to support write replication.
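As a rough idea of what "Unison with scheduled replication" could look like, the Python sketch below only builds the command line and a crontab entry; it does not run anything. The host name `server2`, the share path and the 10-minute interval are invented for the example. The options used (`-batch`, `-times`, `-prefer newer`) are Unison preferences as far as I know, but check them against the manual for your Unison version before relying on this.

```python
import shlex
from typing import List


def unison_command(local_root: str, remote_root: str,
                   prefer_newer: bool = True) -> List[str]:
    """Build a non-interactive Unison invocation for two roots."""
    # -batch: ask no questions; -times: synchronise modification times
    cmd = ["unison", local_root, remote_root, "-batch", "-times"]
    if prefer_newer:
        # Resolve conflicts in favour of the newer file ("last change wins").
        cmd += ["-prefer", "newer"]
    return cmd


def crontab_line(cmd: List[str], every_minutes: int = 10) -> str:
    """Format the command as a crontab entry that runs every N minutes."""
    return f"*/{every_minutes} * * * * {shlex.join(cmd)}"


# Hypothetical roots: a local Samba share and the same path on the peer.
cmd = unison_command("/srv/samba/share", "ssh://server2//srv/samba/share")
print(crontab_line(cmd))
```

This gives periodic rather than near-realtime replication, and it does nothing about locking files that are open for writing on the other side, so it covers only part of the wish list above.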

my investigations (unsorted):

Intermezzo
development stopped

PVFS
load balancing, parallel I/O, used for HPC

XtreemFS
distributed WAN filesystem, still under development, write replication not yet implemented

Coda
distributed filesystem for road warriors, under development, needs client software

Lustre
load balancing, parallel I/O, used for HPC

rsync
available on every Unix system, asynchronous (not realtime), not bidirectional

DRBD
works like a software mirror filesystem over LAN, only one active node, max. 2 nodes, supports asynchronous mode over low bandwidth

OpenAFS
distributed filesystem with server (Linux, Unix) and clients (e.g. Windows, Linux), needs client software installed

Ceph
distributed filesystem for HPC, data distributed like a stripeset for load balancing

Hadoop
written in Java, single master, multiple slaves, no kernel integration

GlusterFS
HPC, distributed data for data centers, supports replication

MooseFS
distributed file system for data centers, single master server

Unison
bidirectional synchronization tool, asynchronous, uses an rsync-like transfer algorithm
 
Old 12-12-2010, 10:37 PM   #5
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 234
Thanks for that list, it's more than I hoped for.

Now let's hope it helps & encourages other LQ-ers to help you find your answers.
 
Old 05-09-2011, 03:04 PM   #6
kuntergunt
LQ Newbie
 
Registered: Mar 2008
Location: Vienna
Distribution: Ubuntu, CentOS, Debian, Knoppix
Posts: 6

Original Poster
Rep: Reputation: 5
Another way to get data replicated is to use a service like Ubuntu One. It uses a userspace daemon to synchronize local changes to the Ubuntu One service, which stores the data on Amazon S3. In the other direction, changes are pushed from the service to the client via events. The whole replication happens in near realtime; no polling is needed. The Ubuntu One service is free only up to a certain amount of data (2 GB); beyond that it costs $30 per 20 GB per year.
The transmission is encrypted, the storage is not, so you should consider encrypting the data yourself. See https://wiki.ubuntu.com/UbuntuOne/Security
 
  

