LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Old 07-09-2010, 08:53 PM   #1
darkwolf
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Rep: Reputation: 1
Beowulf Cluster on Ubuntu 10.04?


I have searched and read a lot of documentation, but I couldn't find exactly what I need. Has anyone built a Beowulf cluster, especially on Ubuntu 10.04, and how can I do it?
 
Old 07-09-2010, 09:01 PM   #2
David2010
Member
 
Registered: May 2009
Posts: 255

Rep: Reputation: 23
Quote:
Originally Posted by darkwolf
I have searched and read a lot of documentation, but I couldn't find exactly what I need. Has anyone built a Beowulf cluster, especially on Ubuntu 10.04, and how can I do it?
I have to say I have never heard of a beowulf cluster.

Where did you get this info?

Did it come from fsck?

Or was it an error during boot?
 
Old 07-09-2010, 09:13 PM   #3
darkwolf
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Original Poster
Rep: Reputation: 1
The official Beowulf site: http://www.beowulf.org/

I am an intern at İTÜ CSCRS. We are working on a cluster project and looking for a suitable type of cluster. Our old computers mostly have P4 processors and 512 MB to 1 GB of RAM. I tried to build a Beowulf cluster with virtual machines (Ubuntu 10.04), but I'm a newbie and I couldn't. The problem is that I found very little documentation, and what I found is not for Ubuntu 10.04, so I'm here.
 
Old 07-09-2010, 09:50 PM   #4
David2010
Member
 
Registered: May 2009
Posts: 255

Rep: Reputation: 23
Quote:
Originally Posted by darkwolf
The official Beowulf site: http://www.beowulf.org/

I am an intern at İTÜ CSCRS. We are working on a cluster project and looking for a suitable type of cluster. Our old computers mostly have P4 processors and 512 MB to 1 GB of RAM. I tried to build a Beowulf cluster with virtual machines (Ubuntu 10.04), but I'm a newbie and I couldn't. The problem is that I found very little documentation, and what I found is not for Ubuntu 10.04, so I'm here.
So let me get this straight: you want to use two or more computers to do a single task?

The definition on that site was very... vague.

I don't think it's even possible to split a "single"-threaded application over multiple computers using TCP/IP. Even if it were, the speed of the network would slow the whole process down to the point where it wouldn't really be beneficial.

Technically, though, if the program used "multiple" threads, you could have the raw data of one thread sent to the CPU of another computer and the processed result sent back to the main PC.

But like I said before, the speed of the network and the added overhead would greatly slow down the process.

You would have to write a program to redirect one thread's data to another PC, a program on the other PCs to receive and process the incoming data and then send it back, and then another program to use the computed data.

It's feasible but... not really worth the effort.

Think of it this way: having two CPU cores work on the same thread wouldn't be beneficial (or even technically possible), because a third core would be required to split the binary and to handle joining the computed results.

You also have to take into account that if the CPU architecture differs between the PCs, this concept won't work.

I can write an example in C++ (or Java), but if the program is already compiled (not open source) then it would be impossible, as the threads are protected.

But I am always up for a good challenge if you still want to go through with the concept.

In fact, I am almost positive that running a multi-threaded process over two computers via a network would be a LOT slower than just running the multi-threaded process on one computer.

If you have ever done network booting via PXE and seen how slow that is, you will understand how slow this concept will be.

Unless, of course, I got the definition wrong.

Last edited by David2010; 07-09-2010 at 10:06 PM. Reason: Spell Checking, New insight
 
Old 07-10-2010, 05:59 AM   #5
darkwolf
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Original Poster
Rep: Reputation: 1
First of all, thanks for the answer, David.

Of course
Quote:
Originally Posted by David2010
... running a multi-threaded process
over two computers via a network would be a LOT slower than just running
the multi-threaded process on one computer.
But the kind of system you mention is very expensive, and my aim is to put the old PCs to some use. Many companies prefer clusters connected via Ethernet cards. It may be slower, but gigabit (1000 Mb/s) Ethernet provides enough speed to share processes.

So I am set on building a cluster.
 
Old 07-10-2010, 02:27 PM   #6
David2010
Member
 
Registered: May 2009
Posts: 255

Rep: Reputation: 23
Quote:
Originally Posted by darkwolf
First of all, thanks for the answer, David.

Of course. But the kind of system you mention is very expensive, and my aim is to put the old PCs to some use. Many companies prefer clusters connected via Ethernet cards. It may be slower, but gigabit (1000 Mb/s) Ethernet provides enough speed to share processes.

So I am set on building a cluster.
Well, with a connection that fast, and assuming the distance between the computers is small enough, I suppose there could be some benefit to it.

But all OSes protect the threads of a program from other programs, so hijacking a program's thread via another program is out of the question.

That being so, an already compiled program cannot be executed on this Beowulf cluster.

Luckily, though, almost every Linux program is open source.

But it would take an expert programmer to create a network connection, send the entire thread over the network, create another program to interpret and run the networked code, and then have the result sent back to the main computer to be run.

But quite honestly, although it is an interesting concept, it would be nearly impossible.

For one, the C or C++ code sent over the network would have to be interpreted (by another program) and then executed.

Creating that interpreter would take years to cover the entire C or C++ library, and even more years to cover any additional third-party libraries.

It would have to interpret the program because the data sent over the network would be one thread, and the result would be uncompilable code.

So unless you have a whole group of very professional programmers, the task is not going to get done. I am just one professional programmer.

I am certified in computer programming in C++ and Java, and I am certified in computer hardware engineering.

Last edited by David2010; 07-10-2010 at 02:45 PM.
 
Old 07-10-2010, 03:27 PM   #7
btmiller
Senior Member
 
Registered: May 2004
Location: In the DC 'burbs
Distribution: Arch, Scientific Linux, Debian, Ubuntu
Posts: 4,158

Rep: Reputation: 328
David2010, I'd highly suggest doing a little bit of research before claiming that a Beowulf cluster is "nearly impossible" -- Beowulf clusters are heavily used for scientific and engineering applications (I should know, one of my jobs at the place I work is administering a Beowulf cluster). We have several different clusters consisting of several hundred nodes which are mostly devoted to running computational chemistry applications. It is true that you need specially designed code to work on a Beowulf (or alternatively a program that you run many, many separate copies of). Most programs use the MPI specification to send and receive messages between multiple processes (yes, multiple processes, not threads, which communicate over standard mechanisms, i.e. shared memory or network sockets). You're also correct that the speed of the networking plays a large role in how well processes can pass messages. That's why it's important to have well-designed code that does things like overlap communication and computation (so the processes can be doing useful work whilst waiting to receive messages). Many Beowulfs use networking that's much faster and lower latency than gigabit Ethernet (InfiniBand has been widely popular, but 10 GbE with iWARP is gaining popularity).

Despite these challenges, many research groups can and do use Beowulf clusters to get useful work done. However, they're not good for running general purpose office or gaming type applications; they need specially designed and written programs.

To the OP, you might want to take a look at the MicroWulf project for some information on building a small cluster. You might also find ClusterMonkey useful. It's possible to implement a Beowulf on pretty much any Linux distribution. The steps are pretty much to install your nodes and then install some mechanism of communication (usually some sort of MPI library). You can use something like Rocks, OSCAR, or Perceus to facilitate imaging nodes, or you can do it yourself for highest flexibility. Slap a batch scheduler on the front end if needed/desired and ta-da, you have a Beowulf. Whether you can get it to do anything useful is another story. What would you use your Beowulf cluster for?
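
As a rough illustration of the "run many, many separate copies" approach mentioned above, the sketch below deals a list of independent input files out to the nodes round-robin and prints the ssh command that would start each copy. The node names (node01 to node03), the program name ./simulate, and the input file names are hypothetical placeholders; nothing is executed on any node, the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch: spread independent jobs across cluster nodes round-robin.
# NODES, PROGRAM and the input file names are hypothetical placeholders.

NODES="node01 node02 node03"
PROGRAM="./simulate"

# assign_jobs INPUT...
# Prints one "ssh <node> <program> <input>" line per input,
# cycling through $NODES in order.
assign_jobs() {
    # Count the nodes so we know when to wrap around.
    node_count=0
    for n in $NODES; do
        node_count=$((node_count + 1))
    done

    i=0
    for input in "$@"; do
        # 1-based position of the node that gets this input.
        want=$((i % node_count + 1))
        k=0
        for n in $NODES; do
            k=$((k + 1))
            if [ "$k" -eq "$want" ]; then
                echo "ssh $n $PROGRAM $input"
            fi
        done
        i=$((i + 1))
    done
}

assign_jobs run1.dat run2.dat run3.dat run4.dat
```

With three nodes and four inputs, the fourth input wraps around to the first node again. Actually launching the copies would mean running the printed commands instead of echoing them, which is roughly the job a batch scheduler takes over on a real cluster.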
 
Old 07-10-2010, 05:54 PM   #8
David2010
Member
 
Registered: May 2009
Posts: 255

Rep: Reputation: 23
Quote:
Originally Posted by btmiller
David2010, I'd highly suggest doing a little bit of research before claiming that a Beowulf cluster is "nearly impossible" -- Beowulf clusters are heavily used for scientific and engineering applications (I should know, one of my jobs at the place I work is administering a Beowulf cluster). We have several different clusters consisting of several hundred nodes which are mostly devoted to running computational chemistry applications. It is true that you need specially designed code to work on a Beowulf (or alternatively a program that you run many, many separate copies of). Most programs use the MPI specification to send and receive messages between multiple processes (yes, multiple processes, not threads, which communicate over standard mechanisms, i.e. shared memory or network sockets). You're also correct that the speed of the networking plays a large role in how well processes can pass messages. That's why it's important to have well-designed code that does things like overlap communication and computation (so the processes can be doing useful work whilst waiting to receive messages). Many Beowulfs use networking that's much faster and lower latency than gigabit Ethernet (InfiniBand has been widely popular, but 10 GbE with iWARP is gaining popularity).

Despite these challenges, many research groups can and do use Beowulf clusters to get useful work done. However, they're not good for running general purpose office or gaming type applications; they need specially designed and written programs.

To the OP, you might want to take a look at the MicroWulf project for some information on building a small cluster. You might also find ClusterMonkey useful. It's possible to implement a Beowulf on pretty much any Linux distribution. The steps are pretty much to install your nodes and then install some mechanism of communication (usually some sort of MPI library). You can use something like Rocks, OSCAR, or Perceus to facilitate imaging nodes, or you can do it yourself for highest flexibility. Slap a batch scheduler on the front end if needed/desired and ta-da, you have a Beowulf. Whether you can get it to do anything useful is another story. What would you use your Beowulf cluster for?
Very interesting!

What I really meant was that it wasn't practical for general use. Sorry for any misunderstanding.

There was very little information on the topic, which is why I knew very little about it.

Hm... Multiple processes would definitely be easier than multiple threads, but like we both stated, the original poster would have to write the code.

Only in very specific situations could this really be useful.

Very interesting nevertheless.
 
Old 07-10-2010, 06:33 PM   #9
btmiller
Senior Member
 
Registered: May 2004
Location: In the DC 'burbs
Distribution: Arch, Scientific Linux, Debian, Ubuntu
Posts: 4,158

Rep: Reputation: 328
David2010, I completely agree with you that Beowulf clusters aren't really practical for general computing use. There are other clustering technologies (e.g. high availability, using software that allows for load balancing and automatic failover), but these aren't really Beowulfs. There are also kernel patches like MOSIX and Kerrighed that are designed to migrate processes between computers, but these are largely meant to distribute single-threaded processes between different machines; they don't really help if you have parallelized code.

There are some freely available programs designed to run over MPI on Beowulfs, GROMACS for molecular simulation and OpenFOAM for computational fluid dynamics being two I know of. I think there are also some freely available weather/climate modeling codes, but I am not so familiar with these. However, these are highly technical scientific applications. Building a Beowulf is fun, but unless you have some specific application and have code that can run on a cluster (or are willing to write, debug, and tune such code yourself) all you'll get for your troubles is a cool learning experience and a substantially higher power bill :-).

Interestingly, many scientific codes are moving towards running on GPUs as well as CPUs to take advantage of additional processing resources. The planned Japanese TSUBAME computer is an example of a cluster that will contain GPUs and CPUs. This isn't totally a new idea, though, since the Roadrunner supercomputer at Los Alamos National Laboratory (the first machine to break the petaflop barrier) was a hybrid cluster containing both Opteron and Cell processors.
 
Old 07-10-2010, 07:53 PM   #10
wagaboy
Member
 
Registered: Jun 2010
Distribution: Ubuntu 10.04, Cent OS 5.5, CLE3
Posts: 51

Rep: Reputation: 21
I built a Beowulf cluster on two Ubuntu systems running 9.10 and 9.04, just out of curiosity. So it might be possible to build one on 10.04 as well.

Forgive me for my poor grammar in the rest of the message--these are the notes that I took in a hurry after building my first cluster. They contain a lot of links, so treat this as a cookbook style guide

Notes:
Building my first cluster

Rough notes:
------------
1. MPD Configuration http://wiki.lazarus.freepascal.org/MPICH#Configuration
2. MPI Tutorial http://www.linux-mag.com/id/5759


Installation required:
----------------------
1. MPICH2

Starting the first node
mpd --daemon --ncpus=2 --ifhn=192.168.0.86

Steps to build a Linux cluster:
------------------------------
01. Install MPICH2
Available from Ubuntu repositories and .deb files from ANL
EASY

02. MPI in 30 mins, Linux Mag tutorials
http://www.linux-mag.com/id/5759
MEDIUM

03. Creating MPD config files
MPD_SECRETWORD=yours3cr3tw0rd
MPD_USE_ROOT_MPD=yes <------- No need for this. It puts root in charge, which is not needed.
SRC: http://debianclusters.cs.uni.edu/ind..._Functionality
http://wiki.lazarus.freepascal.org/MPICH#Configuration
MEDIUM - HARD

04. Creating the MPD Ring (?)
First Node: mpd --daemon --ncpus=2 --ifhn=192.168.0.86
use "mpdtrace -l" to find out the port
Subsequent Nodes: mpd --daemon --host=<hostaddr of first node> --port=<portnum of first node> --ncpus=2 --ifhn=<IP address of this node>
EASY

05. Passwordless ssh
Required so that MPI can ssh into remote systems without a password
http://www.cs.wustl.edu/~mdeters/how-to/ssh/


06. rsync to synchronize files across systems
Synchronize files into the same directory on all nodes
SRC: http://www.shodor.org/cserd/Resource...ls/RunningMPI/
rsync -arv /home/<username>/code/mpi/ <username>@192.168.0.18:/home/<username>/code/mpi/
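
Steps 03 and 06 above could be scripted roughly as follows. This is only a sketch under assumptions: the MPD_SECRETWORD key name and the owner-only (600) permission requirement reflect my understanding of MPICH2's mpd, the user name alice and the second address 192.168.0.19 are made up, and the rsync commands are printed rather than run. The config file is written into a temporary directory so that a real ~/.mpd.conf is not touched.

```shell
#!/bin/sh
# Sketch of steps 03 and 06 from the notes above.
# User name, directory and the second IP address are placeholders.

# write_mpd_conf DIR SECRET
# Creates DIR/.mpd.conf holding the secret word and restricts it to the
# owner (assumption: mpd rejects a config file that others can read).
write_mpd_conf() {
    conf="$1/.mpd.conf"
    ( umask 077; printf 'MPD_SECRETWORD=%s\n' "$2" > "$conf" )
    chmod 600 "$conf"
}

# print_rsync_cmds USER DIR NODE...
# Prints (does not run) one rsync command per node, copying DIR to the
# same path on every node so all nodes see identical files.
print_rsync_cmds() {
    user="$1"
    dir="$2"
    shift 2
    for node in "$@"; do
        echo "rsync -av $dir/ $user@$node:$dir/"
    done
}

# Demo in a throwaway directory.
demo_dir="$(mktemp -d)"
write_mpd_conf "$demo_dir" "yours3cr3tw0rd"
cat "$demo_dir/.mpd.conf"
print_rsync_cmds alice /home/alice/code/mpi 192.168.0.18 192.168.0.19
rm -rf "$demo_dir"
```

The trailing slash on both the source and the destination matters: it makes rsync copy the contents of the directory into a directory of the same name on the remote side, rather than one level above or below it.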
 
Old 07-11-2010, 07:21 AM   #11
darkwolf
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Original Poster
Rep: Reputation: 1
Thanks, btmiller, for the information. Your message is very helpful. You mentioned using both GPUs and CPUs in one system. Actually, that is the final aim of my group. Some of my friends are researching and working on GPU programming, and the others are working on the cluster. In the next step we want to combine these two projects. If you know of any documents about this, could you share them?

Quote:
Originally Posted by wagaboy
I built a Beowulf cluster on two Ubuntu systems running 9.10 and 9.04, just out of curiosity. So it might be possible to build one on 10.04 as well.

.......


06. rsync to synchronize files across systems
Synchronize files into the same directory on all nodes
SRC: http://www.shodor.org/cserd/Resource...ls/RunningMPI/
rsync -arv /home/<username>/code/mpi/ <username>@192.168.0.18:/home/<username>/code/mpi/
Wagaboy, thanks a lot to you too.

I hope I can succeed. But I am worried about changing the file paths and names; I am having trouble with that. Thanks a lot again.
 
Old 07-11-2010, 02:56 PM   #12
btmiller
Senior Member
 
Registered: May 2004
Location: In the DC 'burbs
Distribution: Arch, Scientific Linux, Debian, Ubuntu
Posts: 4,158

Rep: Reputation: 328
I'm afraid I'm not too much of an expert in GPU programming (I've only dabbled in it lightly) but you might want to poke around on NVIDIA's CUDA web site. I remember that they had a number of how-tos. If you're writing totally new code you might also want to look at OpenCL which is a cross-platform development environment for multicore CPUs, GPUs, and other accelerator devices.
 
Old 08-06-2010, 05:30 AM   #13
rfle500
LQ Newbie
 
Registered: Aug 2010
Posts: 1

Rep: Reputation: 0
I am currently building an Ubuntu 10.04 Beowulf, consisting of 24 quad-core AMD nodes, for our research group.

To start, I would look at Debian Clusters:

debianclusters.org

Almost everything there is applicable to Ubuntu as well, though I don't bother with DNS - I just set up the hostnames of the nodes in /etc/hosts.

Also, Open MPI is much easier to use and configure - just install the packages openmpi-dev and openmpi-bin and away you go (once a shared home and passwordless login are set up):

mpirun -np 4 --host node14 hostname

Finally, Sun Grid Engine is an excellent queueing system to use and is also available in the repository. I am currently battling with FAI to get the nodes installed, but that's another story...
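
For anyone following along, the /etc/hosts approach and the mpirun call above might look something like the sketch below. It only prints text: candidate /etc/hosts lines, an Open MPI hostfile, and the matching mpirun command. The 192.168.0.x addresses, the nodeNN naming scheme and the file name hostfile are assumptions for illustration; slots=4 is chosen to match quad-core nodes.

```shell
#!/bin/sh
# Sketch: generate /etc/hosts lines and an Open MPI hostfile for a small
# cluster. Subnet, node names and file names are assumed for illustration.

# gen_hosts COUNT
# Prints "IP<TAB>name" lines for node01..nodeCOUNT.
gen_hosts() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf '192.168.0.%d\tnode%02d\n' "$((100 + i))" "$i"
        i=$((i + 1))
    done
}

# gen_hostfile COUNT SLOTS
# Prints "name slots=N" lines, one per node, in Open MPI hostfile format.
gen_hostfile() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'node%02d slots=%d\n' "$i" "$2"
        i=$((i + 1))
    done
}

gen_hosts 3
gen_hostfile 3 4
# Total number of processes = nodes * slots per node.
echo "mpirun -np $((3 * 4)) --hostfile hostfile hostname"
```

The first block of output would be appended to /etc/hosts on every node, the second saved as the hostfile on the front end. Running hostname through mpirun is a cheap first check that every node actually starts processes.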
 
Old 10-01-2010, 03:59 AM   #14
Mrfai
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Rep: Reputation: 0
FAI for Beowulf cluster

I installed several Beowulf clusters using FAI (Fully Automatic Installation).

Its home page is:
http://fai-project.org

The FAI guide also contains a chapter about building a Beowulf cluster. Unfortunately, this chapter is a bit outdated.

http://fai-project.org/fai-guide/ar01s08.html

Since FAI also works with Ubuntu, you should give it a try. If you have any problems with FAI, join the IRC channel and you will get help from us.
 
Old 10-02-2010, 07:25 AM   #15
darkwolf
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Original Poster
Rep: Reputation: 1
We built two systems; one of them is homogeneous and the other is heterogeneous. We didn't use any automatic installation tools because we wanted to learn how the system works.

Only MPI was installed for parallel computing. But the speed of the Ethernet connection is not enough.

Do you have any ideas for an alternative interconnect, for example a PCI connection? (We don't know how that works.)
 
All times are GMT -5.