I'm currently working on an HPC cluster for arbitrary reasons, but I'm having trouble deciding on what distributed filesystem to use.
There are so many options that it's a bit overwhelming (Ceph, GlusterFS, Lustre, BeeGFS, etc.), and it's hard to find conclusive comparisons between them, especially for HPC environments, as most comparisons seem to focus on high availability (HA).
In my setup, I plan to use the filesystem in parallel, with each node acting as both a host (contributing to the storage pool) and a client (connecting to the pool). Whatever I choose needs to run on Ubuntu's current LTS release, and should preferably support client-side caching (RDMA support is also a welcome inclusion).
The size and performance requirements are going to be somewhat arbitrary.
I don't have a specific use in mind, so the needs may vary. Currently I'm mostly thinking of AI, so it'll likely see a lot of use storing large sets of various data for sorting or for training neural networks, but that's just one use case.
Obviously I plan to store as much of my datasets as possible locally on each machine, and I know that, practically speaking, a network filesystem, distributed or otherwise, is going to be slow (which is part of why having decent client-side caching would be nice).
Honestly, I'm not going to get too picky about performance in this case. If it can even partially scale up when I fully move over to InfiniBand, then I'll be satisfied for now. I can switch to dedicated I/O servers in a future revision. So, I guess, whatever gives the best general performance across a wide variety of scenarios.
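Since conclusive comparisons are hard to come by, one option is to measure each candidate directly on its mount point. Below is a minimal sketch of a sequential write/read timing in Python. It isn't tied to any particular filesystem; the function name `measure_throughput` and its parameters are made up for illustration. Note that the read pass will usually be served from the client's page cache, so the read figure is optimistic unless the file is much larger than RAM or the cache is dropped first.

```python
import os
import tempfile
import time


def measure_throughput(directory: str, size_mb: int = 256, block_kb: int = 1024) -> tuple[float, float]:
    """Return (write_MBps, read_MBps) for one sequential pass in `directory`.

    Hypothetical helper: point `directory` at the mounted filesystem under test.
    """
    block = os.urandom(block_kb * 1024)       # incompressible data
    blocks = (size_mb * 1024) // block_kb     # number of blocks to write
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Sequential write, forced to storage with fsync before the clock stops.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        write_s = time.perf_counter() - start

        # Sequential read of the same file (likely cache-assisted, see above).
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_kb * 1024):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)

    written_mb = blocks * block_kb / 1024
    return written_mb / write_s, written_mb / read_s


if __name__ == "__main__":
    # Small demo run against the local temp directory; swap in the real mount point.
    w, r = measure_throughput(tempfile.gettempdir(), size_mb=32)
    print(f"write: {w:.1f} MB/s, read: {r:.1f} MB/s")
```

Running the same script from several nodes at once against the same mount would give a rough idea of how aggregate throughput holds up; a dedicated benchmarking tool would be the better choice for anything beyond a quick sanity check.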
As for the number of nodes, I currently have four in total, including the head and workers (there were going to be five, but I had some bad hardware and had to make do). I do plan to expand in the future, perhaps to as many as a dozen or more workers, so being able to scale for that will likely be important (though at that point, I'll likely already be using dedicated I/O servers).
And for reliability...
Well, that's not too big of a concern, especially with only four working nodes. Having the data still be accessible if one of them fails (or simply decides not to turn on when it should) would be nice though. I also imagine that, as I scale up, that will likely become more useful. So again, having the option would be good, but I'd likely give up some reliability and availability in exchange for a significant performance boost.
I have plenty of places to store important data. This is really just meant as a simple way to distribute and hold large data; anything critical can go, or be copied, elsewhere.
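To put rough numbers on the reliability trade-off, here's a back-of-the-envelope sketch in Python. It assumes plain replication, where every piece of data is kept on `replicas` distinct nodes; that is a simplification, not how any specific filesystem necessarily lays data out. The 2 TB per node is a made-up figure for illustration, and `estimate_pool` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class PoolEstimate:
    usable_tb: float              # capacity left after replication overhead
    tolerated_node_failures: int  # nodes that can fail with data still readable


def estimate_pool(nodes: int, tb_per_node: float, replicas: int) -> PoolEstimate:
    """Estimate usable capacity when each piece of data lives on `replicas` distinct nodes."""
    if nodes < 1 or replicas < 1 or replicas > nodes:
        raise ValueError("need 1 <= replicas <= nodes")
    raw_tb = nodes * tb_per_node
    return PoolEstimate(
        usable_tb=raw_tb / replicas,
        tolerated_node_failures=replicas - 1,
    )


if __name__ == "__main__":
    # Four nodes, as in my current setup; 2 TB each is an assumed value.
    for replicas in (1, 2, 3):
        est = estimate_pool(nodes=4, tb_per_node=2.0, replicas=replicas)
        print(f"replicas={replicas}: {est.usable_tb:.1f} TB usable, "
              f"survives {est.tolerated_node_failures} node failure(s)")
```

With one replica the whole pool is usable, but a single dead node takes data with it; with two replicas, half the raw capacity buys tolerance of one failure, which is roughly the "nice to have" level described above.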
Last edited by BennyBurke; 05-06-2020 at 01:42 AM.