Running a Genetic Algorithm on an Amazon EC2 Cluster

Posted 05-26-2012 at 04:22 PM by robertvi
Updated 05-27-2012 at 03:20 AM by robertvi

I've started working on a simple tool to run genetic algorithms on a Linux cluster. I'm using MIT's StarCluster utility to create an on-demand Linux cluster on Amazon's EC2 cloud computing service. Using this as a starting point it should be pretty easy to deploy as much computing power as I want (budget permitting).

Step one was signing up for Amazon Web Services (http://aws.amazon.com/). I already had a normal Amazon account, so I used the same email and password during the signup. This requires a credit card, email account verification and a confirmation code sent to you by phone. It's a pay-as-you-go service, with some basics available free for the first year, so I felt comfortable signing up - no big commitment is required. (Naturally, if you deploy a massive cluster for days at a time, your credit card will be charged accordingly.)

Next I installed StarCluster (http://web.mit.edu/star/cluster/) on my Linux laptop, using the recommended Python Package Index method (since Python was already installed, this was literally just "sudo easy_install StarCluster" from the terminal). I found the quick start guide to be pretty good, and the how-to video was also very useful.

StarCluster needs to have your Amazon login credentials written into its config file in order to log into EC2 and run commands for you. After you've signed up to AWS, Amazon creates most of the login credentials for you, which you can find under My Account -> Security Credentials. You'll need the Access Key ID and Secret Access Key (in the Access Credentials section) and your Canonical User ID (under the Account Identifiers section).

You'll also need to create an Amazon EC2 Key Pair (not to be confused with an Amazon CloudFront Key Pair, in the same section), which can be done in the Access Credentials section. I ran a StarCluster command which tried to create a key pair for me and upload the public key to AWS, but after running the command AWS did not have the public key in its system. I therefore created a key pair on the AWS web interface, and downloaded the private key to my computer. For security purposes it's probably a good idea to store the private key and the StarCluster config file containing your login details with the file permissions set so that only your user can read them (e.g. "chmod go-rwx private-key-filename config-filename"). And do keep all of the login info safe, otherwise someone could log in and use a lot of expensive cloud services on your account.
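The same precaution can be scripted. Here is a minimal Python sketch, assuming a Linux machine; the function names restrict_to_owner and is_private are my own inventions for illustration, not part of StarCluster or AWS. It sets a file to owner-only read/write (the equivalent of "chmod 600") and checks that group and others have no access.

```python
import os
import stat


def restrict_to_owner(path):
    """Make `path` readable and writable by its owner only (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_private(path):
    """Return True if group and others have no permissions at all on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

You would call restrict_to_owner once on the private key file and once on the config file, then use is_private as a quick sanity check before launching anything.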

Having worked through the StarCluster quick start guide I was able to create a small, two node cluster. Next I'll explain how I'm going to use a genetic algorithm on the cluster.

A genetic algorithm (GA) is an optimisation algorithm inspired by Darwinian evolution. My aim is to produce neural networks able to perform certain tasks. To do this using a GA we imagine a kind of artificial DNA, called a genotype, which is simply a list of numbers defining all the properties of a neural network. The details are not important for understanding how the cluster will function, so I will skip over most of it here. The GA begins with a population of multiple genotypes containing random numbers. For each genotype a neural network is created and simulated to see how well it solves the task, and a score, called the fitness, is assigned to the genotype. Once all genotypes have been given a fitness the GA produces new genotypes, by copying the genotypes with the highest fitness and in addition applying some small random changes to the numbers. To make room in the population for the new genotypes the worst genotypes are deleted. Therefore, over time, by trial and error, the GA discovers better and better solutions to the task.
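To make the loop concrete, here is a minimal Python sketch of the kind of GA described above. It is only an illustration under my own assumptions: the parameter names and default values are invented, the mutation is Gaussian noise, and the fitness function is whatever you pass in (in the real project it would build and simulate a neural network, which is not shown here).

```python
import random


def evolve(fitness, genotype_length=8, population_size=20,
           generations=50, n_replace=5, mutation_sd=0.1, rng=None):
    """Evolve lists of numbers towards higher `fitness`; return the best one."""
    if rng is None:
        rng = random.Random()
    # Start with a population of genotypes containing random numbers.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genotype_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Score every genotype and rank them, best first.
        ranked = sorted(population, key=fitness, reverse=True)
        # Delete the worst genotypes to make room for new ones.
        survivors = ranked[:population_size - n_replace]
        # Copy the fittest genotypes, applying small random changes.
        children = [[gene + rng.gauss(0.0, mutation_sd) for gene in parent]
                    for parent in ranked[:n_replace]]
        population = survivors + children
    return max(population, key=fitness)


def toy_fitness(genotype):
    """Stand-in task: the closer every number is to zero, the better."""
    return -sum(gene * gene for gene in genotype)
```

Calling evolve(toy_fitness) returns the best genotype found. Because the best genotypes always survive into the next generation, the top fitness in the population never gets worse from one generation to the next.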

A key feature of GAs is that each genotype can have its fitness evaluated independently, and therefore one way to parallelise a GA is to send fitness evaluation jobs out to each node in a cluster, and have the GA running on the master node managing the population as a whole. However, if it only takes a few seconds to evaluate each genotype and the time taken to assign each job to a node is also on the order of a few seconds, the cluster will waste a lot of time assigning and migrating jobs.

There is an alternative approach. Instead of having one population using the whole cluster, we can have a separate population running on each node, and occasionally migrate genotypes between nodes somehow, creating a meta-population. Then we can have all the nodes running evaluations virtually all of the time, with a small migration overhead.
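A rough back-of-the-envelope comparison of the two approaches, written as Python. The numbers are assumptions for illustration only: three seconds per evaluation, three seconds of scheduling overhead per job, and a population that runs for ten minutes per job. The model is deliberately crude (each job simply pays its overhead once).

```python
def useful_fraction(work_seconds, overhead_seconds):
    """Fraction of a node's time spent evaluating rather than being scheduled."""
    return work_seconds / (work_seconds + overhead_seconds)


# One job per genotype: 3 s of work for every 3 s of scheduling overhead.
per_genotype = useful_fraction(work_seconds=3, overhead_seconds=3)

# One job per population: 600 s of work for the same 3 s of overhead.
per_population = useful_fraction(work_seconds=600, overhead_seconds=3)
```

Under these assumed figures the per-genotype scheme keeps each node busy only half of the time, while the per-population scheme keeps it busy more than 99% of the time.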

To test the program on a small scale I'll start with one population running on my laptop. To deploy on the cluster I'll write a second, master program to launch populations on each of the nodes. To implement migration I'll simply have the populations quit after, say, ten minutes, and save the best few genotypes to a file. Whenever the master program detects that the queue of pending jobs is low, it launches a new population and seeds it with several genotypes from the most recently created files. As long as more than one file is given to each new population, genotypes will be able to spread through the meta-population.
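The file-based migration could look something like the following Python sketch. The JSON format, the "best_*.json" naming pattern and the function names are all assumptions made for this example; nothing here is fixed by StarCluster or by the design above.

```python
import glob
import json
import os


def save_best(population, fitness, path, n_best=3):
    """Write the `n_best` fittest genotypes (lists of numbers) to `path`."""
    best = sorted(population, key=fitness, reverse=True)[:n_best]
    with open(path, "w") as handle:
        json.dump(best, handle)


def load_seeds(directory, n_files=2, pattern="best_*.json"):
    """Collect genotypes from the `n_files` most recently modified files."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)),
                   key=os.path.getmtime, reverse=True)[:n_files]
    seeds = []
    for path in paths:
        with open(path) as handle:
            seeds.extend(json.load(handle))
    return seeds
```

A population would call save_best just before quitting, and a new population would start from load_seeds plus enough random genotypes to fill the remaining slots. Reading from at least two files is what lets genotypes from different populations mix.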

The aim is to have the basic evolver program not care whether it is running by itself or on the cluster. It just runs until I hit Ctrl-C or until the time limit is reached, and then it saves the best genotypes to a file. StarCluster takes care of making this output available on the master node. The master program only has to know about passing the correct file names to the new jobs it creates - it doesn't know anything about GAs or neural networks etc, nor does it care what language the basic evolver program is written in. Hopefully it will be useful for a wide range of GA applications, and reasonably scalable. The StarCluster cluster comes preconfigured with Sun Grid Engine, so it is simple to hand out jobs to the cluster nodes.
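Here is a hedged Python sketch of both halves of that split. run_until is the evolver's outer loop: keep stepping until the time limit or Ctrl-C, then save. qsub_command only builds (and does not run) a Sun Grid Engine submission for one population. The qsub options "-b y" (treat the command as an executable rather than a job script) and "-cwd" (run in the current directory) are standard; the evolver's own "--time-limit" and "--output" flags are hypothetical, made up for this example.

```python
import time


def run_until(step, save, time_limit_seconds, clock=time.monotonic):
    """Call step() repeatedly until the time limit or Ctrl-C, then call save()."""
    deadline = clock() + time_limit_seconds
    try:
        while clock() < deadline:
            step()
    except KeyboardInterrupt:
        pass  # Ctrl-C: stop evolving but still save the results below
    save()


def qsub_command(evolver, output_file, seed_files, time_limit_seconds):
    """Build the argument list to submit one population to Sun Grid Engine."""
    return (["qsub", "-b", "y", "-cwd", evolver,
             "--time-limit", str(time_limit_seconds),   # hypothetical flag
             "--output", output_file]                   # hypothetical flag
            + list(seed_files))
```

The master program could pass the returned list to subprocess.call whenever the job queue runs low. Since it only deals in file names and a command line, it stays ignorant of GAs and of the language the evolver is written in.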

I will aim to post about my initial testing soon... Feel free to comment below if you have any interest or advice about this project.
