LinuxQuestions.org
Forum: Linux - Software
02-29-2020, 06:14 AM #1
CPT-GrayWolf
LQ Newbie
 
Registered: Sep 2019
Location: Colorado USA
Distribution: Fedora
Posts: 14

Rep: Reputation: Disabled
Deciding on a Distributed Filesystem for HPC


I'm currently working on an HPC cluster for arbitrary reasons, but I'm having trouble deciding on what distributed filesystem to use.

There are so many options that it's a bit overwhelming (Ceph, GlusterFS, Lustre, BeeGFS, etc.), and it's hard to find conclusive comparisons between them, especially for HPC environments, as most comparisons seem to focus on high availability (HA).

In my setup, I plan to use the filesystem in parallel, with each node acting as both a host (contributing to the storage pool) and a client (connecting to the pool). Whatever I choose needs to run on Ubuntu's current LTS release and should preferably support client-side caching (RDMA support would also be welcome).
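As a rough illustration of the layout described above, where every node both stores part of the pool and reads from it, here is a minimal Python sketch of one generic way to decide which node(s) hold a given file: rendezvous (highest-random-weight) hashing. It is only an assumption-laden toy, not how Ceph, GlusterFS, Lustre, or BeeGFS actually place data (each uses its own scheme, e.g. CRUSH in Ceph); the node names and the `place()` helper are hypothetical.

```python
import hashlib
from typing import List, Sequence

# Hypothetical node names for a four-node cluster where every node is both
# a storage host and a client.
NODES = ["node1", "node2", "node3", "node4"]


def score(node: str, path: str) -> int:
    """Deterministic pseudo-random weight for a (node, path) pair."""
    digest = hashlib.sha256(f"{node}:{path}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(path: str, nodes: Sequence[str], replicas: int = 1) -> List[str]:
    """Return the nodes that should hold `path`, highest score first.

    Rendezvous hashing: every client computes the same answer from the path
    alone, so no central lookup is needed, and removing a node only moves
    the files that node was holding.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    ranked = sorted(nodes, key=lambda n: score(n, path), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    for f in ["datasets/images/batch-001.tar", "models/checkpoint-17.pt"]:
        print(f, "->", place(f, NODES, replicas=2))
```

With `replicas=2`, losing any single node still leaves one copy of every file, at the cost of half the raw capacity.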

Thank you in advance for your input.
 
03-06-2020, 08:24 AM #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,140

Rep: Reputation: 1263
Size of data? Performance requirements? Number of nodes? Reliability requirements?
 
03-06-2020, 09:47 AM #3
CPT-GrayWolf
LQ Newbie
 
Registered: Sep 2019
Location: Colorado USA
Distribution: Fedora
Posts: 14

Original Poster
Rep: Reputation: Disabled
The size and performance requirements are going to be somewhat arbitrary.
I don't have a specific use in mind, so the needs may vary. Currently I'm mostly thinking of AI, so it will likely see a lot of use storing large datasets for sorting or for training neural networks, but that's just one use case.
Obviously I plan to store as much of my data as possible locally on each machine, and I know that, practically speaking, a network filesystem, distributed or otherwise, is going to be slow (which is part of why decent client-side caching would be nice).
Honestly, I'm not going to get too picky about performance in this case. If it can even partially scale up when I fully move over to InfiniBand, I'll be satisfied for now. I can switch to dedicated I/O servers in a future revision. So, I guess, whatever gives the best general performance across a wide variety of scenarios.

As for the number of nodes, I currently have four in total, including the head and workers (there were going to be five, but I had some bad hardware and had to make do). I do plan to expand in the future, perhaps to a dozen or more workers, so being able to scale to that will likely be important (though at that point, I'll likely already be using dedicated I/O servers).

And for reliability...
Well, that's not too big of a concern, especially with only four working nodes. Having the data still be accessible if one of them fails (or simply decides not to turn on when it should) would be nice, though, and I imagine that will become more useful as I scale up. So again, having the option would be good, but I'd likely give up some reliability and availability in exchange for a significant performance boost.
I have plenty of places to store important data. This is really just to act as a simple way to distribute and hold large data, anything critical can go, or be copied, elsewhere.
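To put rough numbers on the reliability-versus-capacity trade-off described above, here is a small Python sketch comparing three generic layouts on four nodes: no redundancy, two-way replication, and 3+1 erasure coding. The 2 TB per node figure is a hypothetical placeholder, and the arithmetic is idealized (equal-sized nodes, each piece of a file on a different node, no metadata or filesystem overhead), so real usable space in any particular filesystem will be lower.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Layout:
    """Idealized redundancy layout: each file is split into data pieces plus
    redundant pieces (extra copies or parity), each on a different node."""
    name: str
    data_pieces: int
    redundant_pieces: int

    def width(self) -> int:
        return self.data_pieces + self.redundant_pieces


def summarize(layout: Layout, nodes: int, tb_per_node: float) -> Tuple[float, int]:
    """Return (usable TB, node failures tolerated) for an idealized layout."""
    if layout.width() > nodes:
        raise ValueError(f"{layout.name} needs at least {layout.width()} nodes")
    raw = nodes * tb_per_node
    usable = raw * layout.data_pieces / layout.width()
    # Any `redundant_pieces` nodes can fail and every file is still readable.
    return usable, layout.redundant_pieces


LAYOUTS = [
    Layout("no redundancy", 1, 0),
    Layout("2-way replication", 1, 1),  # r copies -> 1 data + (r - 1) redundant
    Layout("3+1 erasure coding", 3, 1),
]

if __name__ == "__main__":
    for layout in LAYOUTS:
        # 2 TB per node is a hypothetical figure, not from the thread.
        usable, tolerated = summarize(layout, nodes=4, tb_per_node=2.0)
        print(f"{layout.name:20} {usable:4.1f} TB usable, "
              f"survives {tolerated} node failure(s)")
```

Capacity is only half the picture: two-way replication writes every byte twice, while 3+1 erasure coding writes about a third more data than it stores but has to compute parity, which typically costs extra CPU and latency, especially on small writes.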
 
All times are GMT -5.
