Linux - Software
This forum is for Software issues. Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
I am really new, so my questions must be really stupid.
1. A cluster is a group of computers running as if they were one, right? So if I cluster together 3 computers, would my Linux run almost 3 times faster on the master node?
2. Can each user use the cluster with just a keyboard/mouse/monitor? If not, how about a diskless computer?
3. Can I use the cluster as something like a Windows terminal server?
4. I understand that if no node has a hard drive, the only storage-space and memory limits would be on the master node, right?
First of all, there are different types of clusters: HA (high-availability) clusters and HPC (high-performance computing) clusters.
An HPC cluster usually tries to distribute threads among all available CPUs in the cluster and then combines the results. The kernel or the clustered application can share information with the other nodes, for example on a per-thread basis.
An HA cluster tries to balance the load of connected clients among the cluster nodes. But not every application runs active/active on the cluster; in an active/passive configuration, an application fails over only when a cluster node fails.
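To make the active/passive idea concrete, here is a toy Python sketch (not from any real cluster manager; the node names and the health map are made up) of the decision a failover manager keeps making: stick with the current active node while it is healthy, otherwise promote the first healthy standby.

```python
# Toy active/passive failover decision. A real HA stack (Pacemaker,
# keepalived, etc.) does much more: fencing, quorum, resource scripts.

def pick_active(nodes, healthy, current_active):
    """Keep the current active node while it is healthy;
    otherwise fail over to the first healthy standby."""
    if healthy.get(current_active):
        return current_active
    for node in nodes:
        if healthy.get(node):
            return node  # failover target
    return None  # total outage: no healthy node left

nodes = ["node-a", "node-b"]

# Normal operation: node-a stays active.
print(pick_active(nodes, {"node-a": True, "node-b": True}, "node-a"))

# node-a fails: the service fails over to node-b.
print(pick_active(nodes, {"node-a": False, "node-b": True}, "node-a"))
```

Note there is no load balancing here at all: in active/passive, the standby does nothing until the active node dies, which is exactly why not every application benefits from this kind of cluster.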
Not all HA clusters behave truly as one unit. Some appear as one machine only to clients; others appear as one machine even for administration. The latter is the case with single-system-image clusters like TruCluster or Open-Sharedroot.
It is possible to distribute users among a cluster, and since the cluster is usually located in a data center, you would provide SSH access, a virtual KVM, or a graphical login shell.
Diskless clusters scale very well. If you use a SAN, you can even add multiple I/O paths to increase overall performance.
In an HPC cluster, would all my programs, such as VMware, run faster, or just the ones written specifically for clustering?
I had a similar question and the answer seems to be...
1. There was a project called OpenMOSIX that was intended to allow multi-machine processing without requiring the software to be cluster-aware; it is now dead.
2. There is still a proprietary project (can't remember the name) that does the above... but it seems it can't work with multi-threaded applications, don't ask me why!
3. The programs must be cluster-aware, using special calls and message passing to allow distributed (clustered) processing.
From what I've read, it's easier to have multiple processors on one die (quad-cores etc.) or on a single mainboard because of memory-sharing and locking constraints.
In an HPC cluster there seems to be no easy way to share and lock memory across multiple individual "computers" at the same time, so the software has to be cut into chunks that each do their work independently (without needing shared data or locks) and then pass the results back to a main program that amalgamates the data and gives a result.
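The chunk-and-amalgamate pattern described above can be sketched in a few lines of Python. This is only a single-machine toy using `multiprocessing` to stand in for cluster nodes (a real HPC job would use something like MPI across machines), but the shape is the same: each worker gets an independent slice with no shared data or locks, and the main program combines the partial results.

```python
# Toy scatter-gather: processes stand in for cluster nodes.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each "node" works on its own chunk with no shared state.
    return sum(x * x for x in chunk)

def clustered_sum_of_squares(data, workers=3):
    # Cut the work into one independent chunk per worker.
    chunks = [list(data)[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(partial_sum, chunks)
    # Amalgamate the partial results into the final answer.
    return sum(partials)

if __name__ == "__main__":
    print(clustered_sum_of_squares(range(1000)))  # same as sum(x * x for x in range(1000))
```

Notice that `partial_sum` never touches another worker's data: that independence is exactly the property that lets the work spread across separate machines, and it is also why programs that assume shared memory (like most desktop software) don't just get faster when dropped onto a cluster.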
There are a few ways to run a cluster. In all cases, it isn't something you'd use in a home or SOHO setting.
What happens in general is that a group of computers each works on a small part of a project. In that sense they are almost 3 times faster, but the overhead of breaking the project up, putting it back together, and the network all eat into that speed. I'd guess it's more like 4 systems acting as 2.
This again only works for some special tasks. We do it by taking tasks and sending them to each computer in the cluster. We normally have two masters for failover and up to 50 or more remote nodes, and another computer is used to pass the tasks between the nodes. It all ends up being that a task has 100 ms to return its result to the server. We might run 300,000 to a few million transactions in a normal run. Different systems of ours run a bit differently: remotes send a transaction to a group, the group allocates nodes on a first-connected basis, then sets three hosts per node to run transactions. When a host logs off, a spot is freed on some node. Again, the return of the data has to be very quick: in less than 100 ms it has to be back at the host and in the host's application so it can then direct a machine to do what is needed.
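The dispatch-with-deadline idea above can be sketched roughly like this. This is not the poster's actual system, just a toy Python sketch with made-up task costs: tasks are fanned out to a pool of workers, each worker times its own work, and the dispatcher sorts results into on-time and late buckets against the 100 ms budget mentioned in the post.

```python
# Toy deadline-bounded dispatch: threads stand in for cluster nodes,
# and sleep() stands in for the real work a node would do.
from concurrent.futures import ThreadPoolExecutor
import time

DEADLINE = 0.100  # 100 ms budget, as in the post

def handle(task):
    start = time.perf_counter()
    time.sleep(task["cost"])                  # simulate the node's work
    elapsed = time.perf_counter() - start
    return task["id"], elapsed

def dispatch(tasks, workers=4):
    on_time, late = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for task_id, elapsed in pool.map(handle, tasks):
            (on_time if elapsed <= DEADLINE else late).append(task_id)
    return on_time, late

if __name__ == "__main__":
    tasks = [{"id": 1, "cost": 0.01}, {"id": 2, "cost": 0.25}]
    print(dispatch(tasks))  # task 1 makes the deadline, task 2 is late
```

In a real deployment the late result would be discarded or retried on another node rather than just counted, but the point stands: when results must be back within a hard budget like 100 ms, the network round trip and queueing overhead matter as much as raw CPU speed.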
See also Sun's (now Oracle's) way to share work over any number of worldwide workstations or servers.
I'd like to point out that this thread had been dead for 4 years before someone decided to unearth it for a thread-jacking. The thread-jack has since been split out.
If you're trying to help the OP: I'm afraid he's long gone. If you want to help the thread-jacker, move along to the link above.
Thank you for your co-operation; nothing much to see here.