LinuxQuestions.org


Ulysses_ 01-03-2015 02:59 PM

Simple distributed file system that supports file striping and redundancy
 
What is your recommended distributed file system with the following essential features?

1. Redundancy, so a file is duplicated across several servers across the internet.

2. File striping, so slices of a file are stored in different servers across the world, hopefully speeding up access somewhat.

3. Crucially, when you change a byte in a 1 GB file, it does NOT need to transfer the entire 1 GB to your PC and back to the servers.

4. It does NOT need to keep a duplicate of the file on your PC like Dropbox or rsync (I don't want that because I will be working on a diskless PC).

5. You do not need a PhD to configure it.

Global File System (GFS2) does all the above except 5. Seriously, some people find it too hard to get started with it, let alone configure it correctly.

The plan is to install something like that on several VPSes across the world, and store a big TrueCrypt container in it.

Any simpler recommendations?

smallpond 01-04-2015 08:51 AM

You could put a standard filesystem on top of LVM. Device mapper (DM) can do redundancy and striping.

I would respectfully disagree with your claim in point 2 that striping a file across multiple distributed servers will speed up access. It is likely to slow it down to the speed of the slowest server.

Ulysses_ 01-04-2015 10:09 AM

Could you give a link for DM, please? Also, any link that says LVM can work with remote drives to make them look like one local device?

Why would they stripe files if not for performance?

veerain 01-04-2015 11:15 AM

dmsetup and LVM are not a good fit; ZFS or btrfs are easier to use. I think there is also the Ceph distributed filesystem.

smallpond 01-05-2015 08:59 AM

Filesystems are striped on local disks for better performance because the disks are the performance bottleneck. For storage on remote servers, the network is the bottleneck, not the disks.
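To put rough numbers on that, here is a back-of-the-envelope sketch, not a benchmark. It assumes the file is striped evenly, all stripes are fetched in parallel, and the only limits are each server's rate and the client's own link. The function name and every figure are made up for illustration.

```python
def striped_read_seconds(file_mb, server_rates_mb_s, client_link_mb_s):
    """Estimate the time to read a file striped evenly across servers.

    Assumes all stripes are fetched in parallel and ignores latency,
    protocol overhead and other clients (simplifying assumptions).
    """
    stripe_mb = file_mb / len(server_rates_mb_s)
    # Every stripe is needed, so the slowest server sets the server-side time.
    slowest_stripe = max(stripe_mb / rate for rate in server_rates_mb_s)
    # The client's own link cannot deliver the whole file faster than this.
    link_time = file_mb / client_link_mb_s
    return max(slowest_stripe, link_time)


# Hypothetical numbers: a 1000 MB file and a 10 MB/s client link.
one_server = striped_read_seconds(1000, [10], 10)      # link-limited
fast_pair = striped_read_seconds(1000, [10, 10], 10)   # still link-limited
mixed_pair = striped_read_seconds(1000, [10, 1], 10)   # waits for the 1 MB/s server

print(one_server, fast_pair, mixed_pair)
```

With these assumed figures, adding a second equally fast server changes nothing because the client link is already full, while adding a slow server makes the whole read wait for it.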

LVM can be created on any block device. Remote block devices may be served by iSCSI, for example. The reason this may meet your requirement better is that shared, distributed filesystems require complex locking setup, which violates 5. The block device approach is non-shared, so only needs normal filesystem setup.

The man page for lvcreate describes the capabilities.

Ulysses_ 01-05-2015 10:49 AM

Can the virtual drives in a remote VPS be made to look like iSCSI drives so LVM can use them?

Ulysses_ 01-06-2015 07:34 AM

Quote:

Originally Posted by smallpond (Post 5295196)
striping a file across multiple distributed servers ... is likely to slow it down to the speed of the slowest server.

Why doesn't the same happen with software RAID-0 striping (slowing down to the speed of the slowest local drive)?

smallpond 01-07-2015 08:28 AM

Quote:

Originally Posted by Ulysses_ (Post 5296233)
Why doesn't the same happen with software RAID-0 striping (slowing down to the speed of the slowest local drive)?

A single disk can sustain about 100 MB/s of sequential writes, but a single SAS2 channel runs at 6 Gb/s, so I can run multiple disks in parallel and improve performance. What network bandwidth do you have?
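The arithmetic behind that can be sketched as follows. The 100 MB/s and 6 Gb/s figures are the ones quoted above; the 8b/10b line encoding (8 data bits per 10 bits on the wire) is how 6 Gb/s SAS is normally described, and the function name is just for illustration.

```python
def disks_to_fill_channel(channel_gbit_s, disk_mb_s, data_bits=8, line_bits=10):
    """Return (usable MB/s on the channel, number of disks needed to fill it)."""
    # Line rate in megabits per second, scaled by the encoding overhead.
    usable_mbit_s = channel_gbit_s * 1000 * data_bits / line_bits
    usable_mb_s = usable_mbit_s / 8  # bits to bytes
    return usable_mb_s, usable_mb_s / disk_mb_s


usable, disks = disks_to_fill_channel(6, 100)
print(f"usable: about {usable:.0f} MB/s, disks to saturate: about {disks:.0f}")
```

So roughly six such disks fit in one channel before the channel itself becomes the limit, which is why striping pays off locally.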

Ulysses_ 01-07-2015 10:17 AM

Didn't RAID-0 exist in the age of 5 MB/s drives and 33 MB/s IDE?

EDIT: Apologies, now I see what you mean: the network is slower than each server's drive, so you get at most the bandwidth of the network no matter how many servers a file is striped across.

Except that if you download a large file like a Linux distro from two different servers at the same time, the sum of the two rates (MB/s) exceeds the rate you get when downloading from only one server. So there is something to be gained from striping across two servers.
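A rough way to express this, assuming each server caps its own upload rate and the client's downlink is the only other limit (the numbers and the function name are hypothetical):

```python
def aggregate_download_rate(server_rates_mb_s, client_link_mb_s):
    """Combined rate when pulling from several servers at once.

    Assumes the per-server rates simply add up until the client's
    own downlink is full; latency and congestion are ignored.
    """
    return min(sum(server_rates_mb_s), client_link_mb_s)


# Hypothetical: each server gives 2 MB/s, the client link is 10 MB/s.
one = aggregate_download_rate([2], 10)
two = aggregate_download_rate([2, 2], 10)
ten = aggregate_download_rate([2] * 10, 10)
print(one, two, ten)
```

Under these assumptions the gain from a second server is real as long as each server is slower than the client's link, and it disappears once the link is saturated.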

smallpond 01-08-2015 08:22 AM

Yeah. I'm ignoring the effects of other systems accessing the same servers, which can have a large effect on performance, plus or minus.


All times are GMT -5.