Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
I have searched and read lots of documents, but I couldn't find exactly what I need. I wonder, has anyone set up a Beowulf cluster, especially on Ubuntu 10.04, and how can I do it?
I have to say I have never heard of a Beowulf cluster.
I am an intern at İTÜ CSCRS, and we are working on a cluster project, looking for a suitable type of cluster. Our old computers usually have P4 processors and 512 MB-1 GB of RAM. I tried to set up a Beowulf cluster with virtual machines (Ubuntu 10.04), but I'm a newbie and I couldn't manage it. The problem is that what little documentation I found is not for Ubuntu 10.04, so I'm here.
So let me get this straight, you want to use two or more computers to do a single task?
The definition on that site was very... vague.
I don't think it's even possible to split a "single"-threaded application over multiple computers using TCP/IP. Even if it were, the speed of the network would slow the whole process down to the point where it wouldn't really be beneficial.
If the program used multiple threads, though, you could technically have the raw data of one thread sent to the CPU of another computer and the processed result sent back to the main PC.
But like I said before, the speed of the network and the added overhead would greatly slow down the process.
You would have to compile a program to redirect one thread of data to another PC, compile a program on the other PCs to receive and compute the incoming data and then send it back, and then another program to execute the computed data.
It's feasible, but not really worth the effort.
Think of it this way: having two CPU cores work on the same thread wouldn't be beneficial (or even technically possible), because a third core would be required to split the binary and to handle joining the computed results.
But you also have to take into account that if the CPU architecture is different on the two or more PCs, then this concept won't work.
I can write an example in C++ (or Java), but if the program is already compiled (not open source) then it would be impossible, as the threads are protected.
But I am always up for a good challenge if you still want to go through with the concept.
In fact, I am almost positive that running a multi-threaded process over two computers via a network would be a LOT slower than just running the multi-threaded process on one computer.
If you have ever done network booting via PXE and seen how slow that is, you will understand how slow this concept will be.
Unless of course I got the definition wrong.
Last edited by David2010; 07-09-2010 at 09:06 PM.
Reason: Spell Checking, New insight
Quote:
... running a multi-threaded process over two computers via a network would be a LOT slower than just running the multi-threaded process on one computer.
But the system you described is very expensive, and my aim is to use the old PCs in some way. Many companies prefer clusters connected via Ethernet cards. It may be slower, but gigabit Ethernet provides enough speed to share the work.
So I am determined to use this type of cluster.
Well, with a connection that fast, and assuming the distance between the computers is small enough, I suppose there could be some benefit to it.
But all OSes protect a program's threads from other programs, so hijacking a program's thread from another program is out of the question.
That being so, an already-compiled program cannot be executed on this Beowulf cluster.
Luckily, though, almost every Linux program is open source.
But it would take an expert programmer to create the network connection, send an entire thread over the network, create another program to interpret and run the networked code, and then have the computed result redirected back to the main computer to be run.
But quite honestly, although it is an interesting concept, it would be nearly impossible.
For one, the C or C++ code sent over the network would have to be interpreted (by another program) and then executed.
Creating that interpretation program would take years to cover the entire C or C++ standard library, and even more years to handle any additional third-party libraries.
It would have to interpret the program because the data sent over the network would be a single thread, and the result would be uncompilable code.
So unless you have a whole group of very professional programmers, the task is not going to get done. I am just one professional programmer.
I am certified with computer programming in C++ and Java, and I am certified in computer hardware engineering.
David2010, I'd highly suggest doing a little bit of research before claiming that a Beowulf cluster is "nearly impossible" -- Beowulf clusters are heavily used for scientific and engineering applications (I should know, one of my jobs at the place I work is administering a Beowulf cluster). We have several different clusters consisting of several hundred nodes which are mostly devoted to running computational chemistry applications. It is true that you need specially designed code to work on a Beowulf (or alternatively a program that you run many, many separate copies of). Most programs use the MPI specification to send and receive messages between multiple processes (yes, multiple processes, not threads, communicating over standard mechanisms, i.e. shared memory or network sockets). You're also correct that the speed of the networking plays a large role in how well processes can pass messages. That's why it's important to have well-designed code that does things like overlap communication and computation (so the processes can be doing useful work whilst waiting to receive messages). Many Beowulfs use networking that's much faster and lower latency than gigabit Ethernet (InfiniBand has been widely popular, but 10 GbE with iWARP is gaining popularity).
Despite these challenges, many research groups can and do use Beowulf clusters to get useful work done. However, they're not good for running general purpose office or gaming type applications; they need specially designed and written programs.
To the OP, you might want to take a look at the MicroWulf project for some information on building a small cluster. You might also find ClusterMonkey useful. It's possible to implement a Beowulf on pretty much any Linux distribution. The steps are pretty much to install your nodes and then install some mechanism of communication (usually some sort of MPI library). You can use something like Rocks, OSCAR, or Perceus to facilitate imaging nodes, or you can do it yourself for highest flexibility. Slap a batch scheduler on the front end if needed/desired and ta-da, you have a Beowulf. Whether you can get it to do anything useful is another story. What would you use your Beowulf cluster for?
Very interesting!
What I really meant was that it wasn't practical for general use. Sorry for any misunderstanding.
There was very little information on the topic, which is why I knew so little about it.
Hm... multiple processes would definitely be easier than multiple threads, but as we both stated, the original poster would have to write the code.
Only in very specific situations could this really be useful.
David2010, I completely agree with you that Beowulf clusters aren't really practical for general computing use. There are other clustering technologies (e.g. high availability using software that allows for load-balancing and automatic failover), but these aren't really Beowulfs. There are also kernel patches like Mosix and Kerrighed that are designed to migrate processes between computers, but these are largely meant to distribute single-threaded processes between different machines; they don't really help if you have parallelized code.
There are some freely available programs designed to run over MPI on Beowulfs, GROMACS for molecular simulation and OpenFOAM for computational fluid dynamics being two that I know of. I think there are also some freely available weather/climate modeling codes, but I am not so familiar with these. However, these are highly technical scientific applications. Building a Beowulf is fun, but unless you have a specific application and code that can run on a cluster (or are willing to write, debug, and tune such code yourself), all you'll get for your troubles is a cool learning experience and a substantially higher power bill :-).
Interestingly, many scientific codes are moving towards running on GPUs as well as CPUs to take advantage of additional processing resources. The planned Japanese TSUBAME computer is an example of a cluster that will contain GPUs and CPUs. This isn't totally a new idea, though, since the Roadrunner supercomputer at Los Alamos National Laboratory (the first machine to break the petaflop barrier) was a hybrid cluster containing both Opteron and Cell processors.
I had built a Beowulf cluster on 2 Ubuntu systems running 9.10 and 9.04 just out of curiosity, so it might be possible to build one on 10.04 as well.
Forgive my poor grammar in the rest of the message -- these are the notes I took in a hurry after building my first cluster. They contain a lot of links, so treat this as a cookbook-style guide.
04. Creating the MPD Ring (?)
First Node: mpd --daemon --ncpus=2 --ifhn=192.168.0.86
use "mpdtrace -l" to find out the port
Subsequent Nodes: mpd --daemon --host=<hostaddr of first node> --port=<portnum of first node> --ncpus=2 --ifhn=192.168.0.86
EASY
06. rsync to synchronize files across systems
Synchronize files into the same directory on all nodes
SRC: http://www.shodor.org/cserd/Resource...ls/RunningMPI/
rsync -arv /home/<username>/code/mpi/ <username>@192.168.0.18:/home/<username>/code
Thanks, btmiller, for the information. Your message is very helpful. You mentioned using both the GPU and CPU on a system. Actually, that is the final aim of my group. Some of my friends are researching and working on GPU programming, and the others are working on the cluster. In the next step we want to combine these two projects. If you know of some documents about that issue, can you share them?
Quote:
Originally Posted by wagaboy
I had built a beowulf cluster on 2 Ubuntu systems running 9.10 and 9.04 just out of curiosity. So it might be possible to build it on 10.04 as well.
.......
06. rsync to synchronize files across systems
Synchronize files into the same directory on all nodes
SRC: http://www.shodor.org/cserd/Resource...ls/RunningMPI/
rsync -arv /home/<username>/code/mpi/ <username>@192.168.0.18:/home/<username>/code
Wagaboy, thanks a lot to you too.
I hope I can succeed, but I am worried about changing the file paths and names; that is what I'm having trouble with. Thanks a lot again.
I'm afraid I'm not too much of an expert in GPU programming (I've only dabbled in it lightly) but you might want to poke around on NVIDIA's CUDA web site. I remember that they had a number of how-tos. If you're writing totally new code you might also want to look at OpenCL which is a cross-platform development environment for multicore CPUs, GPUs, and other accelerator devices.
I am currently building an Ubuntu 10.04 Beowulf, consisting of 24 quad-core AMD nodes, for our research group.
To start I would look at Debian Clusters:
debianclusters.org
Almost everything there is applicable to ubuntu as well. Though I don't bother with DNS - just set up the hostnames of the nodes in /etc/hosts.
Also, openmpi is much easier to use and configure -- just install the packages openmpi-dev and openmpi-bin and away you go (once a shared home directory and passwordless login are set up):
mpirun -np 4 --host node14 hostname
Finally, Sun Grid Engine is an excellent queueing system and is also available in the repository. I am currently battling with FAI to get the nodes installed, but that's another story...
We built 2 systems, one of them homogeneous and the other heterogeneous. We didn't use any automatic installation programs because we wanted to learn how the system works.
Just MPI was installed for parallel computing, but the Ethernet connection's speed is not enough.
Do you have any idea about an alternative interconnect, for example a PCI-based connection (we don't know how it works)?