LinuxQuestions.org
Old 03-25-2015, 10:35 AM   #1
usao
Member
 
Registered: Dec 2011
Location: Chandler, AZ
Posts: 286

Rep: Reputation: Disabled
Distributed filesystem


What is the preferred distributed filesystem for CentOS?
I'm looking for an NFS-like filesystem with storage distributed across the nodes that mount it, and with redundancy so that the FS doesn't go offline if one of the nodes is MIA.
 
Old 03-25-2015, 05:54 PM   #2
ttk
Senior Member
 
Registered: May 2012
Location: Sebastopol, CA
Distribution: Slackware64
Posts: 1,038
Blog Entries: 27

Rep: Reputation: 1484Reputation: 1484Reputation: 1484Reputation: 1484Reputation: 1484Reputation: 1484Reputation: 1484Reputation: 1484Reputation: 1484Reputation: 1484
Quote:
Originally Posted by usao View Post
What is the preferred distributed filesystem for CentOS?
I'm looking for an NFS-like filesystem with storage distributed across the nodes that mount it, and with redundancy so that the FS doesn't go offline if one of the nodes is MIA.
GlusterFS is the preferred solution for RHEL and its derivatives, like CentOS. It is the basis of Red Hat's "Red Hat Storage Server" turnkey storage solution. Rolling your own GlusterFS volume is pretty easy.

http://www.gluster.org/
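
For a rough idea of what "rolling your own" involves, here is a minimal sketch that only builds the usual `gluster` command lines for a two-node replicated volume and prints them; it does not run anything. The host names (`node1`, `node2`), the brick path and the volume name are assumptions made up for illustration, so check the commands against the GlusterFS documentation for your version before using them.

```python
# Sketch only: builds (but does not execute) the gluster CLI commands for a
# replicated volume. Host names, brick path and volume name are hypothetical.

def replica_volume_commands(volume, hosts, brick_path):
    """Return the commands, as argument lists, to run on the first host."""
    first, *others = hosts
    # Add every other node to the trusted pool from the first node.
    commands = [["gluster", "peer", "probe", host] for host in others]
    # One brick per host; "replica N" keeps a full copy on each of the N bricks.
    bricks = [f"{host}:{brick_path}" for host in hosts]
    commands.append(
        ["gluster", "volume", "create", volume, "replica", str(len(hosts)), *bricks]
    )
    commands.append(["gluster", "volume", "start", volume])
    return commands


def mount_command(volume, server, mountpoint):
    """Return the client-side mount command for the native GlusterFS client."""
    return ["mount", "-t", "glusterfs", f"{server}:/{volume}", mountpoint]


if __name__ == "__main__":
    for cmd in replica_volume_commands("dbvol", ["node1", "node2"], "/data/brick1"):
        print(" ".join(cmd))
    print(" ".join(mount_command("dbvol", "node1", "/mnt/dbvol")))
```

With a replica count equal to the number of bricks, each file is stored on every node, which is the mirroring behaviour the original question is after.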
 
Old 03-25-2015, 05:59 PM   #3
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

Rep: Reputation: 142Reputation: 142
Hi
CentOS comes from Red Hat, so the preferred one is probably GlusterFS (which was bought by Red Hat).
 
Old 03-26-2015, 10:03 AM   #4
usao
Member
 
Registered: Dec 2011
Location: Chandler, AZ
Posts: 286

Original Poster
Rep: Reputation: Disabled
Just took a quick look at their wiki page, and I have a couple of questions.
1) If I use 2 nodes, will it mirror the contents of the FS on one node to the other node to provide fault tolerance?
2) Is there any lag-time that I need to be concerned about in the replication?
I'm planning on using a distributed database engine, which runs on multiple nodes and needs to see the data content in the files in real time. If there is a lag, the engines running on different physical machines will be slightly out of sync with each other.
 
Old 03-26-2015, 02:49 PM   #5
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

Rep: Reputation: 142Reputation: 142
Mmmhh, maybe you're mixing up two things a little bit (but I admit that I might be wrong myself)?

On one side you would use a distributed storage engine to get more storage reliability (and perhaps in some cases better read speed, but the opposite might just as well happen, depending on what you're using and how you configure things).

On the other side, the distributed storage wouldn't be accessed directly by an application but most probably through some client (whatever it is) that communicates with the server to read/write whatever the application wants.

Therefore the latency of the replication between nodes shouldn't be relevant for you, because it's the storage server that is supposed to coordinate things and always deliver the correct/latest state of the data to your programs.
That's at least what I think, but I might be wrong, especially depending on the storage engine that is being used.

Maybe somebody else in the forum can add/correct?
 
Old 03-26-2015, 02:59 PM   #6
usao
Member
 
Registered: Dec 2011
Location: Chandler, AZ
Posts: 286

Original Poster
Rep: Reputation: Disabled
Perhaps I should clarify what I'm after.
We have a distributed database engine, which runs on multiple hosts and (historically) has accessed its data through shared SAN storage, where a set of SAN LUNs is presented to multiple hosts and all the individual database engines work together to keep the content of the SAN LUNs consistent.
I'm looking for a way to transition this to a filesystem (NFS or other), so that I can put the database files on a filesystem rather than on raw SAN LUNs.
The database vendor has indicated they support NFS, but the issue I have is that NFS is a single point of failure. The SAN system has redundant fiber paths, redundant LUNs, RAID and redundant controllers, so I was looking for a way to obtain a redundant/fault-tolerant NFS configuration, such that any of the database nodes going offline wouldn't affect the ability of the other database nodes to continue working with the available storage.
Does this description help in any way?
 
Old 03-26-2015, 03:13 PM   #7
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

Rep: Reputation: 142Reputation: 142
Thx
GlusterFS doesn't sound bad for that, but now it's my turn to ask something (I know nothing about SAN):
assuming that you have SAN storage mounted on a PC at "/mnt/mystorage" and that the mountpoint connects to the SAN (server? something?) at 10.0.0.1, what happens if 10.0.0.1 goes down? Will the mountpoint (or whichever driver handles "/mnt/mystorage" on the client PC) automatically try to connect to some other storage server?
 
Old 03-26-2015, 03:17 PM   #8
usao
Member
 
Registered: Dec 2011
Location: Chandler, AZ
Posts: 286

Original Poster
Rep: Reputation: Disabled
All our SAN storage is mapped to Linux hosts, so I can't speak to PC.
However, each LUN on the SAN uses multipathing, meaning that it shows up on the host side under multiple IDs or names. The multipathing software looks at the device signatures on all the devices and recognizes that a set of LUNs is really one device, so it creates a super-device which allows I/O to go through any path which is active. So, if we lose an HBA port, a fiber cable, a fiber switch or a SAN target host, there are other paths which can (and will) be used to ensure the I/O is completed.
Multipathing is something I'm fairly familiar with on the SAN fabric side; I don't know if there is a comparable feature for NAS or IP storage.
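
To make the "super-device" idea concrete, here is a toy model of the failover behaviour described above. It is only an illustration, not how the kernel multipath layer is implemented; the class names and path names (`hba0`, `hba1`) are made up for the example.

```python
# Toy model of multipath failover: one logical device, several paths to the
# same LUN. All names here are hypothetical and purely illustrative.

class PathDown(Exception):
    """Raised when I/O is attempted on a failed path."""


class Path:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up

    def submit(self, request):
        if not self.up:
            raise PathDown(self.name)
        return f"{request} via {self.name}"


class MultipathDevice:
    """Presents several paths to one LUN as a single device."""

    def __init__(self, paths):
        self.paths = list(paths)

    def submit(self, request):
        # Try each path in order and skip the ones that are down.
        for path in self.paths:
            try:
                return path.submit(request)
            except PathDown:
                continue
        raise IOError("all paths to the LUN are down")


hba0, hba1 = Path("hba0"), Path("hba1")
lun = MultipathDevice([hba0, hba1])
print(lun.submit("write block 42"))   # goes out through hba0
hba0.up = False                       # simulate losing an HBA port
print(lun.submit("write block 42"))   # same request completes through hba1
```

The caller only ever talks to the one logical device; the I/O fails only once every path is gone.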
 
Old 03-26-2015, 03:36 PM   #9
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

Rep: Reputation: 142Reputation: 142
Thx
Ok, I am not good enough to give you reliable information.

Maybe these posts help?
http://www.adamdrew.net/?p=34
http://srkykzm.com/2014/12/iscsi-tar...ith-glusterfs/

They both talk about iSCSI, so that may be the main software-only "multipath" solution for Linux.

On my side all I can say is:
 
Old 03-26-2015, 04:11 PM   #10
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

Rep: Reputation: 142Reputation: 142
(...sorry - wrong button - or maybe just a bit of suspense)
1)
I use GlusterFS only as a pure replacement for NFS.
Reason: NFS3/4 was too slow (50 to 80 MB/s), but with GlusterFS I can read/write at 100 MB/s through my Gbit switch.
I think I tried out all the alternatives, but in my case they were all slow. iSCSI was very slow (~20 to 40 MB/s), so it may be that I did not configure my network stack correctly.
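
To put those numbers in context, here is a quick back-of-the-envelope calculation against the nominal Gigabit Ethernet line rate (1000 Mbit/s, i.e. 125 MB/s before any protocol overhead). The throughput figures are the ones quoted above; nothing else is measured.

```python
# Back-of-the-envelope: how much of a Gigabit link do the quoted rates use?
LINK_MBIT_PER_S = 1000               # nominal line rate, ignoring overhead
LINK_MB_PER_S = LINK_MBIT_PER_S / 8  # 8 bits per byte -> 125 MB/s


def link_utilisation(mb_per_s):
    """Fraction of the nominal link capacity used at the given throughput."""
    return mb_per_s / LINK_MB_PER_S


# (low, high) throughput in MB/s as reported in this post.
observed = {"GlusterFS": (100, 100), "NFS3/4": (50, 80), "iSCSI": (20, 40)}

for name, (low, high) in observed.items():
    print(f"{name}: {link_utilisation(low):.0%} to {link_utilisation(high):.0%}")
```

So 100 MB/s is already about 80% of what the link can carry in theory, while the iSCSI result left most of the link idle, which fits the suspicion of a configuration problem.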

2)
My setup is:
- A SW-raid5
- Normal Linux partition.
- Partition formatted as ext4 (GlusterFS recommends xfs, but in my opinion ext4 is much faster when dealing with small files, so that's what I ended up using).
- GlusterFS server SW that serves all the files stored in ext4 exactly as they are. I don't let GlusterFS do anything else, because when I set things up GlusterFS was still quite young (v3.0) and I didn't know if it was going to screw things up, in which case I still wanted to be able to access those files directly through ext4 without going through GlusterFS.
- 3 Linux client PCs mounting the server's shared GlusterFS.
- The 3 Linux clients handle the mounted storage as if it were local.

3)
I chose GlusterFS because:
- all the other SW I tested was too slow, had other limitations (e.g. max file size), or was just too complicated to set up or manage (I didn't want to set up a metadata server, then a locking server, etc...).
- GlusterFS was the only alternative that gave me the option to set on the client what to do when the server storage went offline. I never looked at multipath because I don't have distributed storage, but GlusterFS was the only one that didn't care whether the storage server was started before or after the client. If the client comes online before the server, it keeps pinging until it's able to mount; if it loses the connection to the server, it keeps pinging until it is able to re-establish the connection; and if a timeout (which is set on the client) is reached, it just shows an empty mountpoint.
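
The client behaviour described in the last point boils down to a retry loop with a deadline. The sketch below only illustrates that pattern; it is not GlusterFS code, and `mount_with_retry`, `try_mount` and the parameters are hypothetical names chosen for the example.

```python
# Illustration of "keep pinging until mounted, or give up after a timeout".
# This is not GlusterFS's implementation; all names are hypothetical.
import time


def mount_with_retry(try_mount, timeout_s, interval_s=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call try_mount() until it returns True or timeout_s has elapsed.

    Returns True once the server answered, False if the timeout was reached
    (the case where the client ends up with an empty mountpoint).
    """
    deadline = clock() + timeout_s
    while True:
        if try_mount():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    attempts = []

    def server_up_on_third_try():
        # Stand-in for a real connection attempt: fails twice, then succeeds.
        attempts.append(1)
        return len(attempts) >= 3

    ok = mount_with_retry(server_up_on_third_try, timeout_s=30, interval_s=0)
    print("mounted" if ok else "timed out", "after", len(attempts), "attempts")
```

The same loop covers both cases from the post: the server starting after the client, and the server coming back after a lost connection.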

4)
- GlusterFS is in my opinion the best "free" distributed storage solution for Linux, and as it's now owned by Red Hat it's especially good for CentOS or Fedora/Red Hat distributions => it's for sure worth a try.
- The documentation isn't bad, but I always had the feeling that it was messy => hope it's better now.
- My setup is very simple, but in any case I never ever had any kind of problem with GlusterFS itself.
- Performance and multipath are probably the most important requirements in your case. Probably towards the end of 2013 I did some tests using GlusterFS's own organization of the data (in this case, instead of having the files stored 1:1 on ext4, GlusterFS created many medium-sized files on ext4 into which it wrote the data), and I remember that I wasn't excited about its performance, but I didn't play with it a lot, so there might be room for improvement.

What do you think? Will you give it a try?
What kind of HW are you planning to use?
 