LinuxQuestions.org


anilCSE 11-07-2013 01:35 PM

Cluster implementation on Ubuntu environment
 
Hello!
I have 70 laptops available and want to build a cluster from them. All the laptops are running Ubuntu 10.04. The hardware configuration is: Intel Pentium Core i3, 160 GB HD, 2 GB RAM. What is the most suitable package for implementing a cluster on these systems, and where can I find good instruction guides?

TenTenths 11-08-2013 07:25 AM

It's not clear what you mean by "cluster":

High-Availability Cluster?
Database Cluster?
Compute Cluster?
Storage Cluster?

All of these have different meanings, different uses and different software requirements.

Let us know what you're trying to achieve and maybe we'll be able to point you in the right direction.

anilCSE 11-29-2013 11:58 AM

I want to implement a compute cluster.
The task is simple: share the data with all the systems, process it there, and get the results back.
I want to know all the information about the supporting software and the implementation details.

Thank you!

btmiller 11-29-2013 01:05 PM

All the information? There's quite a lot of it out there, much more than can be discussed in a single forum thread. People have written entire PhD dissertations about this sort of thing. In fact, there's a free on-line eBook written by a physicist at Duke describing how to build a Beowulf compute cluster. I'd suggest giving it a perusal.

A couple of questions you need to answer:

- Do all computers processing data work independently, or do they need to communicate frequently? If the former, do you have software to parcel out chunks of work to each machine? If the latter, is your code written to run in parallel (e.g. using MPI) so it can run on a distributed-memory cluster?
- Depending on the size of your data set and communications pattern, it's likely the networking between the machines will be the bottleneck. Have you taken this into account?
- Do you have the data analysis software readily at hand, or do you need to write it?
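To make the first two questions concrete, here is a rough Python sketch for the "independent work" case. It assumes the input is already a list of separate files, and the file names, the 10 GB data set and the 100 Mbit/s link are all made-up numbers for illustration. It only shows how work could be parcelled out evenly and how to get a back-of-the-envelope figure for pushing the data over the network; it is not a replacement for a real scheduler.

```python
def split_work(items, n_nodes):
    """Split items into n_nodes contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_nodes)
    chunks, start = [], 0
    for i in range(n_nodes):
        # The first `extra` nodes take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def transfer_seconds(data_bytes, link_bits_per_second):
    """Idealized time to send data_bytes over one link (no protocol overhead)."""
    return data_bytes * 8 / link_bits_per_second


# Hypothetical workload: 700 input files spread over the 70 laptops.
files = [f"input_{i:04d}.dat" for i in range(700)]
chunks = split_work(files, 70)

# Assumed numbers: a 10 GB data set leaving one master over a 100 Mbit/s link.
seconds = transfer_seconds(10 * 10**9, 100 * 10**6)
```

With these assumed numbers each laptop gets 10 files, and just moving the data once off a single master takes on the order of 13 minutes even in the ideal case, which is why the networking question matters.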

I've built several Beowulf clusters over the past ten-odd years, and here are another few things to keep in mind:

- Laptops (depending on the model) are not really designed and engineered to run flat-out 24/7/365. Your rate of hardware failure (especially fans and hard drives) might wind up being pretty high. Are you prepared for this?
- At the bare minimum, you'll need a master node that exports its hard drive via NFS for common data sharing amongst cluster nodes. It's also useful to have common user accounts; I suggest deploying LDAP (a pain to set up, but way more secure than NIS if you do it right and use SSL). Kerberos can also integrate well with LDAP, but setting it up is quite complex. You'll definitely want to set up keys so users can ssh between machines without a password.
- If you have high I/O requirements, you may even need to set up multiple NFS servers or deploy a parallel file system like Lustre, Gluster, or Ceph.
- You might need a batch scheduler. TORQUE, Open Grid Engine, Slurm, and LSF/OpenLava are popular choices. Depending on how complex scheduling will be, you may need an additional scheduler, e.g. Maui.
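As a small illustration of the passwordless ssh point, here is a hedged Python sketch that checks which nodes accept a key-based login. The host names node01 to node70 are an assumption (use whatever your machines are actually called), and it presumes the keys have already been distributed. `BatchMode=yes` makes ssh fail instead of prompting for a password, so a node without working keys shows up as unreachable rather than hanging the check.

```python
import subprocess


def ssh_command(host, remote_cmd, timeout_s=5):
    """Build an ssh argument list that never prompts for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # fail rather than ask for a password
        "-o", f"ConnectTimeout={timeout_s}",   # give up quickly on dead nodes
        host,
        remote_cmd,
    ]


def check_node(host):
    """Return True if host accepts a key-based login and runs `true`."""
    try:
        result = subprocess.run(
            ssh_command(host, "true"), capture_output=True, timeout=15
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


# Hypothetical host names for the 70 laptops.
nodes = [f"node{i:02d}" for i in range(1, 71)]

# Example use, run from the master node:
#   unreachable = [h for h in nodes if not check_node(h)]
```

Running a check like this from the master before submitting jobs is a cheap way to spot laptops that have dropped off the network or lost their keys.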

Without more detail about how your software works and various other requirements, it's really tough to give good answers to your questions. However, there are some basic design considerations that you need to keep in mind.


All times are GMT -5.