Is setting up a Linux cluster difficult?

Roy.Geer · 02-23-2015, 11:51 AM

hello

I want to set up two desktops, each with 4 cores, and combine them to get 8 cores with Linux clustering. Is this how clustering works, and how difficult is it?

Also, any good how-to sites on the subject would be appreciated. Thanks

suicidaleggroll · 02-23-2015, 12:00 PM

It depends on what you're trying to do. If you just want one heavy job to run as 8 cooperating processes spread across both machines, you should look into MPI. If you want to be able to run many simultaneous processes and have them automatically farmed out to the two machines based on load, memory usage, etc., then look into the TORQUE resource manager.

If you want your general-purpose computer usage (opening spreadsheets, watching YouTube, etc.) to use the processing power of both machines as if they were one bigger machine, it's not going to happen, at least not in a way that would actually speed things up. That's mostly because the CPU is rarely the bottleneck for those kinds of applications, and the latency when communicating between the two systems will slow things way down.

Roy.Geer · 02-23-2015, 12:10 PM

I would like to do video encoding. I guess this would fall under MPI clustering.

suicidaleggroll · 02-23-2015, 12:19 PM

Only if you build your own video encoder and can program in all of the MPI hooks, or you use one with distributed encoding already built in, e.g. x264farm, RipBot264, MediaEncodingCluster, etc. Note that I have no experience with any of these; it's just what I found with a quick Google search.
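To give a rough idea of what "distributed encoding" means in practice: the job gets cut into independent pieces that separate machines can encode at the same time. Below is a minimal Python sketch of just the splitting step. The function name split_frames and the frame and core counts are made up for illustration, and this is an assumption about the general approach, not a description of how any of the tools named above actually work.

```python
def split_frames(total_frames, num_chunks):
    """Split frames 0..total_frames-1 into contiguous (start, end) ranges.

    'end' is exclusive. Chunk sizes differ by at most one frame.
    This is a hypothetical helper, not part of any real encoder.
    """
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(total_frames, num_chunks)
    chunks = []
    start = 0
    for i in range(num_chunks):
        # The first 'extra' chunks take one additional frame each.
        length = base + (1 if i < extra else 0)
        chunks.append((start, start + length))
        start += length
    return chunks


if __name__ == "__main__":
    # Assumed example: a 1000-frame clip shared between 8 cores.
    for core, (start, end) in enumerate(split_frames(1000, 8)):
        print(f"core {core}: frames {start} to {end - 1}")
```

Each range would then be handed to one encoder process, and the encoded pieces joined afterwards. A real tool would probably also need to cut at sensible points in the stream, which this sketch ignores.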

Roy.Geer · 02-23-2015, 12:31 PM

Quote:

Originally Posted by suicidaleggroll

Only if you build your own video encoder and can program in all of the MPI hooks, or you use one with distributed encoding already built in, e.g. x264farm, RipBot264, MediaEncodingCluster, etc. Note that I have no experience with any of these; it's just what I found with a quick Google search.

Same here. It's almost not worth building a cluster, then. I had assumed that any application would use all of a cluster's resources.

Thank you, suicidaleggroll, for the useful information about the types of clustering and other related info.

JeremyBoden · 02-23-2015, 12:37 PM

I read somewhere that video encoding software isn't usually written to use more than 4 cores.
From experience, it will definitely use at least 4 cores.

Roy.Geer · 02-23-2015, 12:46 PM

Quote:

Originally Posted by JeremyBoden

I read somewhere that video encoding software isn't usually written to use more than 4 cores.
From experience, it will definitely use at least 4 cores.

Perhaps, BUT...

Movies are rendered on clusters with thousands of cores, but as suicidaleggroll said earlier, the studios have probably programmed their own custom software to take advantage of all those cores.

btmiller · 02-24-2015, 07:49 AM

Movie rendering is a type of problem that is called "embarrassingly parallel" because each frame is independent of any other. Therefore, if you have 5,000 quad-core computers, you can use them to render 5,000 frames simultaneously. Usually these work in a master-worker paradigm. One computer is the master, which distributes work to the other computers. When a computer finishes its frame, it goes back and asks the master for more work. I think DrQueue is popular as software to run on the master for workload distribution.
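Here is a minimal sketch of that master-worker loop in Python. Threads on one machine stand in for the worker computers, a shared queue stands in for the master's work list, and render_frame is a hypothetical placeholder for the real rendering step. It only illustrates the pattern and is not how DrQueue or any particular render farm is implemented.

```python
import queue
import threading


def render_frame(frame_number):
    """Hypothetical stand-in for rendering one frame; real work would go here."""
    return f"frame-{frame_number:04d}.png"


def worker(name, work_queue, results, lock):
    """Keep asking the master's queue for a frame until no work is left."""
    while True:
        try:
            frame_number = work_queue.get_nowait()
        except queue.Empty:
            return  # no more frames, so this worker is done
        output = render_frame(frame_number)
        with lock:
            results[frame_number] = (name, output)


def run_master(num_frames, num_workers):
    """Master: queue every frame, start the workers, collect the results."""
    work_queue = queue.Queue()
    for frame_number in range(num_frames):
        work_queue.put(frame_number)

    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker,
                         args=(f"worker-{i}", work_queue, results, lock))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    rendered = run_master(num_frames=20, num_workers=4)
    print(len(rendered), "frames rendered")
```

Because no frame depends on any other, the workers never need to talk to each other, only to the master. That is what makes this kind of job easy to spread over many machines.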

Most large-scale high-performance computing clusters, by contrast, are designed to run a tightly coupled parallel application written using MPI or a similar paradigm like PGAS. In other words, the calculations being performed on one computer are tightly coupled to those performed on other computers, unlike the movie rendering case. At my work, we have a large-ish cluster that is used to run molecular simulations, for example. If you look at top500.org - the list of the top 500 reported most powerful computer systems - you'll see that they're all clusters. Usually they have a specialized parallel interconnect (e.g. InfiniBand) that allows high-bandwidth, low-latency message passing. For workload management, these tend to use TORQUE, SLURM, or one of the Grid Engine derivatives.
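To make "tightly coupled" a bit more concrete, here is a toy Python sketch. One row of cells is split into two halves, as if each half lived on a different computer, and every time step each half needs the other half's edge value before it can continue. Everything here is an assumption for illustration: plain lists stand in for the two machines, no MPI is involved, and the averaging rule is made up rather than taken from any real simulation.

```python
def step(cells, left_ghost, right_ghost):
    """One smoothing step: each cell becomes the average of itself and its two neighbours."""
    padded = [left_ghost] + cells + [right_ghost]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]


def run_coupled(left, right, steps):
    """Advance two halves of one domain; every step needs a boundary exchange."""
    exchanges = 0
    for _ in range(steps):
        # "Message passing": each half needs the other's current edge cell.
        ghost_for_left = right[0]
        ghost_for_right = left[-1]
        exchanges += 2
        # The outer ends reuse their own edge value as the boundary (an assumption).
        left, right = (step(left, left[0], ghost_for_left),
                       step(right, ghost_for_right, right[-1]))
    return left, right, exchanges


if __name__ == "__main__":
    left, right, exchanges = run_coupled([0.0] * 4, [9.0] * 4, steps=5)
    print("left half: ", [round(v, 3) for v in left])
    print("right half:", [round(v, 3) for v in right])
    print("boundary values exchanged:", exchanges)
```

Neither half can get ahead of the other, because each step waits on a value from the other side. On a real cluster that exchange crosses the network on every step, which is why the low-latency interconnect matters so much for this kind of workload and hardly at all for frame rendering.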