clustering?

antoniemail · 06-29-2010, 05:34 PM

Hello,

I have 2 servers right now that use rsync to replicate files from one to the other. I am looking into clustering them now, for HA and redundancy. But does clustering mean there will be replication as well? Or will I still need to keep the rsync going?

I am running Fedora servers right now. Any help would be appreciated. Thanks.

Blue_Ice · 06-30-2010, 02:33 AM

That will depend on your cluster setup. If HA is really all you are looking for, then Heartbeat and DRBD, for example, would be enough. If you want an HPC cluster, then you should also think about a director and a cluster filesystem (such as OCFS2, GFS, or GlusterFS).

But other configurations might be required, depending on the applications you run. Reading your post, you are probably talking about web servers, and in that case the above solution will do the trick. For something like MySQL, you probably want to use MySQL's own replication functionality in combination with Heartbeat.

czezz · 06-30-2010, 04:04 AM

Hi, I will attach my question to this topic:
Let's say I have 2 identical machines. Can I build a cluster that mainly increases CPU performance?
(I would use a RHEL 5.x distro, if that matters.)

Blue_Ice · 06-30-2010, 04:18 AM

What do you mean by increasing CPU performance in this context? If you are talking about HPC (load balancing), you will probably get what you want.
HPC divides the load over multiple nodes in the cluster. Each node's resources are used less, which improves performance. Normally you build an HA cluster for the directors, and behind those you have the cluster of nodes that do the actual work. But it is also possible to put the director on the nodes themselves.

czezz · 06-30-2010, 04:40 AM

Well, it sounds like HPC (load balancing) is what I am thinking about.
The current machine sometimes cannot keep up with its tasks, and this is related to CPU usage. I am thinking of convincing my boss to buy another identical machine and build a cluster (HPC).

I am now thinking about a test on small regular PCs:
- generate a task that loads the CPU to 100% on a single PC.
- build an HPC cluster from 2 regular PCs and run the same task as on the single machine, to compare the execution time.

Does that make sense?
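
For the single-PC half of that test, a minimal sketch of a CPU-bound task with a timer might look like the following. The function names `burn_cpu` and `time_task` and the loop size are made up for illustration, and the workload is only a placeholder for whatever task actually overloads the machine.

```python
import time

def burn_cpu(n):
    """CPU-bound placeholder task: sum of squares below n in pure Python."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def time_task(func, *args):
    """Return (result, elapsed seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# The loop size is arbitrary; raise it until one core stays busy long enough to measure.
result, elapsed = time_task(burn_cpu, 200_000)
print(f"result={result} elapsed={elapsed:.3f}s")
```

Running the same script unchanged on a 2-node cluster would show whether a single task like this gets any faster there.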

Blue_Ice · 06-30-2010, 05:25 AM

Well, not really... Load balancing is only useful when, e.g., 100 clients connect to one server and max out its resources. With 2 servers in a load-balancing configuration, the 100 clients will be divided over the 2 servers, so neither server experiences such a heavy load anymore. What you are looking for is dividing processes over several servers (grid computing). Unfortunately, I have no experience with grids. I think your software has to be designed for it as well. An example of grid computing is the SETI project to find extraterrestrial life...
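
The 100-clients-over-2-servers case can be sketched as a simple round-robin assignment. This is only an illustration of the idea, not how any particular director is implemented; the client and server names are hypothetical.

```python
from collections import Counter
from itertools import cycle

def assign_round_robin(clients, servers):
    """Map each client to a server, rotating through the servers in order."""
    rotation = cycle(servers)
    return {client: next(rotation) for client in clients}

clients = [f"client-{i}" for i in range(100)]
assignment = assign_round_robin(clients, ["server-a", "server-b"])

# Count how many clients ended up on each server.
load = Counter(assignment.values())
print(load["server-a"], load["server-b"])  # 50 50
```

Each client is still handled entirely by one server, which is why this helps with many connections but not with one heavy task.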

Blue_Ice · 06-30-2010, 05:27 AM

By the way, if one process takes all the CPU time, you definitely have hardware that is not suitable for the task. A single process cannot be split up and handled by several computers.
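
To illustrate what "designed for it" could mean, here is a rough sketch of a job written as independent chunks whose partial results are combined at the end. Everything here is hypothetical: the `mapper` argument stands in for whatever would hand chunks to other machines, and in this sketch it just runs them locally with the built-in `map`.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks

def sum_squares(start, stop):
    """Work for one chunk; it needs no data from any other chunk."""
    return sum(i * i for i in range(start, stop))

def run_split(n, parts, mapper=map):
    """Compute the total by combining independent partial results."""
    chunks = split_range(n, parts)
    partials = mapper(lambda chunk: sum_squares(*chunk), chunks)
    return sum(partials)

# The split version gives the same answer as doing it all in one go.
print(run_split(1000, 2) == sum_squares(0, 1000))  # True
```

A program that cannot be broken up like this gains nothing from a second machine, no matter how the cluster is set up.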

antoniemail · 06-30-2010, 07:43 AM

Thanks Blue Ice. I do have MySQL replication right now on my web servers. So with that plus rsync, Heartbeat, and DRBD I should be all set with a pretty good cluster? Will I need a fence or something to monitor the cluster, or will the machines monitor each other?

Blue_Ice · 06-30-2010, 08:08 AM

Heartbeat checks whether the service on one server is running and, if not, starts the service on the other server.
DRBD is used for real-time replication of the hard disks. DRBD in combination with a cluster file system lets you build a master/master (active/active) cluster. This means that a change made on one system is automatically replicated to the other server. This is a great solution for file servers and for website files. Again, this is not wise for applications like MySQL and OpenLDAP, as they have their own systems to replicate data.
For load balancing you need a director (e.g. ldirectord), which divides the load between the configured servers.
When set up correctly, you can easily add more servers for better availability and performance.
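
The failover decision described above can be sketched roughly as follows. This is a simplified illustration of the idea and not Heartbeat's actual logic: the node names, timestamps, and the `pick_active` function are hypothetical, and the sketch ignores split-brain and fencing entirely.

```python
def pick_active(nodes, last_beat, now, deadtime):
    """Return the first node in priority order whose last heartbeat is
    recent enough, or None if every node looks dead."""
    for node in nodes:
        # A node that never sent a heartbeat is treated as infinitely stale.
        if now - last_beat.get(node, float("-inf")) <= deadtime:
            return node
    return None

nodes = ["node1", "node2"]                  # priority order, hypothetical names
beats = {"node1": 100.0, "node2": 128.0}    # time of last heartbeat, in seconds

print(pick_active(nodes, beats, now=105.0, deadtime=30.0))  # node1
print(pick_active(nodes, beats, now=140.0, deadtime=30.0))  # node2
```

In the second call node1 has been silent for 40 seconds, longer than the allowed 30, so the service would move to node2.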

It is always good to monitor your servers. After all, you want to know when one of your nodes goes down. This is not very complicated, as all nodes have their own IP addresses, so a monitoring system like Nagios will do the trick.
Take good care to set up your nodes in the same way. Things like different firewall settings on the nodes can leave you wondering why your cluster is not working properly.
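
As a very small stand-in for what a monitoring system does per node, a plain TCP reachability check might look like this. It is a sketch only, not how Nagios itself is implemented; `port_is_open` is a made-up helper, and the demo checks a throwaway listener on the local machine so that it runs without any real cluster nodes.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a temporary local listener instead of a real node.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

print(port_is_open("127.0.0.1", demo_port))
listener.close()
```

Pointing the same check at each node's own IP address and service port would also catch the case where a firewall rule differs between nodes.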

uten · 06-30-2010, 10:57 AM

At the risk of pointing out the obvious:
To read up on the topic I would recommend the Wikipedia article cluster_(computing), and searching Wikipedia for mosix, openmosix and linuxpmi. I would also recommend experimenting a bit with some of the live CDs specifically designed for clustering tasks, to get an idea of what kind of design would benefit you most. And for the rest of us with just a general interest, I think distcc, ccache and some monitoring tool are a nice introduction to the benefits and limitations of clustering.

Blue_Ice · 07-01-2010, 02:55 AM

I wouldn't recommend OpenMosix, as the project has been closed since March 1, 2008. For mosix you have to pay, so that might be out of the question in some cases. I have never heard of linuxpmi, so I am going to check that out.