
Feynman 10-01-2011 12:35 PM

Setting up a cluster with old computers (for computation)
I know there are a few other threads on setting up clusters, but they all seem to end with:
"why do you want a cluster"
"something about a webserver/storage"
"go here [link]"

I do a good deal of quantum chemistry simulations (both for research and for fun) and running these programs with mpi is pretty standard. I have 3-4 old computers sitting around, so naturally, I would like to make myself a cluster. So here are some specifications:

*Preferably only 1 hard drive connected to the "master" (a new 64bit computer)
*Be able to use mpi to distribute the task to the other less powerful computers.
*Keep the setup as versatile as possible so as to maximize the chance that I can add other computers as I replace old ones.

**My old computers are 5-8 years old, so if there is truly no configuration that will work with my new 64bit computer, then so be it. However, I would much rather practice setting up an inefficient cluster with some old junk before buying some good hardware and setting up something more powerful.

I am certainly quite new to networking, but I assure you I have done my best to learn from other websites. Here are some things I have seen and the reason(s) I needed more information:

*"Beowulf cluster": The phrase everyone throws out that seems to miraculously answer all questions. What I found was a website that explains the philosophy and setup of the "Beowulf cluster" in layman's terms, and an archive of threads that I cannot figure out how to search (and hence do not want to post on for fear of asking a question that has been asked before). What I did not find was a step-by-step guide to setting the thing up.

*"Set up your own Linux supercomputer!": What I found were several relatively good articles that guide you through setting up a Linux cluster. The problem was that these guides were often inconsistent with each other, so I could not crosscheck my understanding.

Some of the questions I have been left with are:

Do I need hard drives on all the computers?
If so, should I load Linux on them?
Should I use this etherboot/pxe thing?
If so, should I use a .iso or a .rom?
Do I need a router?
Do I need some special cards?

Starting a thread on a forum was really a last resort, but I really need to know what the essential hardware and general setup process should be given my goals, and, if at all possible, some details on how to go through with this process.

weibullguy 10-01-2011 01:47 PM

I did the same thing once just for shits and giggles using CLFS. I wrote all my notes into a "hint." It turned out more like a book. I'll have to find it and get it to you. I wouldn't classify myself as an expert, but here it goes...


Originally Posted by Feynman (Post 4487361)
Do I need hard drives on all the computers?

No. You could net boot everything and only have a hard drive in the "master" machine. The cluster I had set up had hard drives in all the machines because they were "old" and "broken" Windoze machines.

Originally Posted by Feynman (Post 4487361)
If so, should I load Linux on them?

Of course, I used Linux on all the nodes when I did it. I suppose you could use a heterogeneous mix of OSes, but that would certainly lead to additional headaches. My recommendation is to use the same OS, same distro, same version, same (as much as possible) everything on all the nodes. The exception would be the "master" node which is probably a machine you're using for day to day stuff too.

Originally Posted by Feynman (Post 4487361)
Should I use this etherboot/pxe thing?

You could.

Originally Posted by Feynman (Post 4487361)
If so, should I use a .iso or a .rom?

I don't have an answer.

Originally Posted by Feynman (Post 4487361)
Do I need a router?

You need some way for all the machines to communicate. I used a four port router and an eight port ethernet switch. I also had eight "slave" nodes at the largest. The only reason I used the router is so I could connect to the internet. My "master" node was my day to day desktop. You could, for example, use two network cards in your "master" node. In this case eth0 could be connected to the router and the outside world and eth1 could be connected to the cluster. All the other "slave" nodes would only need one network card connected to the cluster network.
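A sketch of what the two-card setup could look like in a Debian/Ubuntu-style /etc/network/interfaces (interface names and addresses here are just placeholders, adjust to your hardware):

```
# /etc/network/interfaces -- sketch only; names and addresses are assumptions.

# eth0: faces the router and the outside world, gets its address via DHCP
auto eth0
iface eth0 inet dhcp

# eth1: faces the cluster switch, static private address
auto eth1
iface eth1 inet static
    address 192.168.1.1
    netmask 255.255.255.0
```

The slaves would then sit on the 192.168.1.x network and never see the router directly.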

Originally Posted by Feynman (Post 4487361)
Do I need some special cards?

No. You need a motherboard and a network card. You don't even need a graphics card. Use SSH for everything.

The other thing you could fiddle with is grid computing. Think BOINC. You could set something like that up, but keep it localized to your little group of machines.

Feynman 10-01-2011 01:56 PM

Thanks, that helps get me on the right track. What you did definitely seems to be what I want to do right down to using the master node as my desktop (I am replying to this thread on it now.)
So for hardware:
One hard drive for the master
An extra network card for the master
Many-port ethernet switch to connect the master to all the slaves.
Many ethernet cables

I suppose the next step would be figuring out how to boot one of my slave computers via an ethernet cable going to my master.

I will keep searching the web, but further guidance would be appreciated!

Feynman 10-01-2011 02:04 PM

At this point, I am looking into FAI:

This seems like what I would want to use to configure my master and slave computers, but I might not be understanding correctly.

jefro 10-01-2011 02:09 PM

Might look at easy setups like Knoppix cluster or Rocks cluster; that's not a very complete list.

The program you use is more important to know about.

Consider a switch over a hub. Switches tend to allow about 70% of rated traffic, where a hub may only allow around 30%.

rustek 10-01-2011 02:12 PM

You probably don't really need the second NIC.
You can add another IP to your master on 192.168.x.x, put the slaves on 192.168.x.x, and plug the router into the switch.


Feynman 10-01-2011 02:39 PM

I had looked into Knoppix and Rocks (mostly Knoppix) but I still could not understand what was going on between everything. I am going on memory; it has been a while since I gave up on trying to understand how this worked.

Anyway, I got my slave computer requesting a DHCP lease from my master computer when it boots up (and something about PXE) and FAI installed on the master. I had tried "installing" gPXE on my slave by burning the .iso from the website onto a CD and loading the CD in the slave upon startup. I am not sure whether that made a difference in anything.

Also, I booted them both up at the same time. Should I have booted one of them first?

ButterflyMelissa 10-02-2011 02:59 AM


it has been a while since I gave up on trying to understand how this worked.
Don't; asking questions in order to understand is what makes us Humans... ;)

Okay, I've got a tutorial here to read. From a 10,000-mile view it boils down to this: get a DHCP server up and running; your NICs will find it and PXE-boot into action. That's what the NIC is programmed to do. It might even be a bonus (power-consumption wise, for one) to unhook the drives (HD, CD-ROM, floppy) altogether, just to quiet things down a bit...

You'll need to set up a PXE server, Arch for one has some pretty cool docs on this...though it's not just Arch, but others as well that boast PXE servers...

Of course, then you'll have a set of thin clients (i.e. diskless stations) active... what's next is the real adventure...

And, by the way, you've possibly looked at something on youtube about this... :D


PS - it's been a while since I ventured on "that side of the island" so my links will be somewhat dated...they come from my own library...

jefro 10-02-2011 01:57 PM

The simplest PXE setup I know of is almost any Knoppix 3.3 to 6.x. Live boot the CD/DVD and then start the Knoppix terminal server. That script sets up DHCP and TFTP, installs a default file for booting the remotes, and builds an image to send over.
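If you want to see roughly what that script is doing, a hand-rolled equivalent with dnsmasq looks something like this (the interface, subnet, and paths are guesses for illustration, not what Knoppix actually writes):

```
# /etc/dnsmasq.conf -- minimal DHCP + TFTP for PXE, sketch only.
interface=eth1                        # cluster-facing NIC (assumed name)
dhcp-range=192.168.1.100,192.168.1.200,12h
enable-tftp
tftp-root=/srv/tftp                   # must contain pxelinux.0 and its configs
dhcp-boot=pxelinux.0                  # boot file handed to PXE clients
```

The terminal-server script additionally builds the root image the clients mount after they grab that boot file.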

You only need gPXE/iPXE if your current NIC/BIOS doesn't support PXE boot, or you want to boot over iSCSI, HTTP, FTP, or other storage.

To see gPXE in action there are demo sites out there, and I think Fedora has a setup like that too. Dunno why more don't have such a system in place.

Feynman 10-09-2011 07:44 PM

Thanks for the replies; I have been busy attending to other things this past week, but I do not want to let this thread go dead. Is Knoppix a patch for your main OS or an OS of its own? Can I still run the Ubuntu programs I wanted to run on my cluster, or do I have to reinstall everything for Knoppix?

In the meantime, I was looking into that tutorial on howtoforge, and it referred to this as a prior setup. The PXE booting tutorial was quite comprehensible, but I am wondering whether it is necessary to do everything in the server setup tutorial--which looks much more like the nightmarish set of file modifications and downloads I have become accustomed to seeing in "easy clustering" tutorials.

Reuti 10-10-2011 11:45 AM

What types of programs are you referring to? Gaussian, GAMESS, Molpro, ... some that need huge local scratch space? Then you need a local disk anyway. Or more like VASP, which needs little to none? Using scratch files in the home directory is possible, but it would slow down the headnode and the computations a lot.

I would even say that you can install 3-4 machines by hand instead of over the network (if you want to get to know PXE boot and DHCP, it's fine of course). Having set up the nodes (I assume 32 bit ones), you might want a queuing system like GridEngine for submitting the jobs.

What you need is one machine (the head node) for NFS (/home and maybe /opt for the applications), NIS (for the accounts), and SGE. Then you can just mount /home on the nodes and get access to the user accounts, start the so-called sgeexecd, and you are done. Well, installing the applications on the headnode in /opt remains.
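A sketch of the NFS part (the head node address, subnet, and export options here are just examples):

```
# On the head node, /etc/exports:
/home  192.168.1.0/24(rw,sync,no_subtree_check)
/opt   192.168.1.0/24(ro,sync,no_subtree_check)

# On each slave node, /etc/fstab:
192.168.1.1:/home  /home  nfs  defaults     0 0
192.168.1.1:/opt   /opt   nfs  ro,defaults  0 0
```

After editing /etc/exports you run exportfs -ra on the head node to activate it.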

Feynman 10-10-2011 02:12 PM

I typed up a very nice long response, but I was timed out as I posted it. So for better or worse, here is a shortened version of it:

I use GAMESS-type software but would prefer not to be limited to it.
I will do local installation if I must, but I prefer the PXE booting because:
--It is scalable and therefore good to know how to do
--I would have to buy more hard drives

I run a 64 bit head node and the rest are 32 bit.

I have looked into queuing systems, Condor in particular, but I still do not fully understand how they work--particularly in relation to openmpi--which I am more familiar with.
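For reference, the kind of Open MPI hostfile I am used to assembling by hand looks like this (host names and slot counts are made up):

```
# hostfile -- one line per node; slots = cores MPI may use there
node01 slots=2
node02 slots=2
node03 slots=1

# launched with something like:
#   mpirun --hostfile hostfile -np 5 ./my_job
```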

At this point, I am still working on PXE booting. My client is ready; my master will not boot it. In short, I have followed at least 5 tutorials over the course of 2 weeks or so, and all I get is a network configuration so screwed up that I have to reinstall the OS (I have indeed reinstalled it at least 5 times.)

Does anyone have any example DHCP and interface configurations? What exactly do I have to set up after I install the OS? I am running Ubuntu 11.04. I have no init.d directory. I tried installing xinetd. That did not work.
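Even a minimal sketch would help--something along these lines for isc-dhcp-server, where I am guessing at the subnet and file names:

```
# /etc/dhcp/dhcpd.conf -- a guess at the kind of minimal config I need.
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.200;
    option routers 192.168.1.1;
    next-server 192.168.1.1;      # TFTP server handed to PXE clients
    filename "pxelinux.0";        # PXE boot file
}
```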

Any help would be appreciated.

Feynman 10-10-2011 02:32 PM

I should probably mention:
On my head node, I have one ethernet slot for internet and one for the cluster. Every time I mess with the interface and DHCP settings my internet starts timing out every 5-10 minutes and I have to reconnect. I just copy and paste the stuff in the example DHCP and interface settings into my settings. Those files are usually either blank or nonexistent on my computer. I get the feeling I should already have default files--especially considering all the tutorials say to back them up first.

Reuti 10-11-2011 06:34 AM

Condor is more for cycle-stealing on workstations. It might work for you in this small environment too, but there is no real integration with an actual MPI implementation. There was one for a former version of MPICH, AFAIK.

If it's only you, you can also set up one local user on each machine (preferably with the same UID, and leave out NIS). So, after setting up NFS on the headnode and mounting /home on the nodes, you can also run jobs by assembling a hostfile for MPI on your own. This way, of course, you have to take care not to oversubscribe nodes, which would be the duty of the queuing system. This would also allow you to serialize the workflow.
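Just to illustrate that duty of the queuing system: not oversubscribing boils down to something like this toy sketch (pure illustration with made-up host names; real SGE/Condor do far more):

```python
# Toy sketch of what a queuing system does: start a job only on a host
# with enough free slots, so no node is ever oversubscribed.
from collections import deque

def schedule(jobs, slots):
    """Assign each (name, ncores) job to a host with enough free slots.

    Returns a list of (job_name, host) in start order. A job that does
    not fit stops the pass -- a real queue would hold it and retry when
    a running job finishes and frees its slots.
    """
    free = dict(slots)                  # host -> free cores
    started, waiting = [], deque(jobs)
    while waiting:
        name, ncores = waiting.popleft()
        host = next((h for h, n in free.items() if n >= ncores), None)
        if host is None:
            break                       # nothing fits; queue would block here
        free[host] -= ncores
        started.append((name, host))
    return started
```

With two 2-core nodes, two 2-core jobs land on separate hosts and a third job waits instead of overloading either one.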

Do you mean the nodes you set up mess up the addresses, or the headnode, which gets its address from another DHCP server?
