LinuxQuestions.org
Old 11-11-2008, 12:10 PM   #1
humbletech99
Member
 
Registered: Jun 2005
Posts: 374

Rep: Reputation: 30
High Performance Computing


I am interested in setting up some High Performance Computing clusters and would like to get people's views and experiences on this.

I have 2 requirements:

1. Compute clusters to do fast, CPU-intensive computations
2. Storage clusters of parallel and extendable filesystems spread across many nodes

Both of these should run across multiple commodity hardware nodes and ideally be Linux/Unix based and open source.

Any feedback welcome.
 
Old 11-11-2008, 05:20 PM   #2
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Rep: Reputation: 51
Well those are two mighty big topics. What sort of information are you after? Applications? Type of servers? Network setup? Cluster Programming? Monitoring? Air conditioning? Power? Administration? Linux distribution? Price?

Are you thinking of many computers working on many separate things at once (like a render farm or enterprise web hosting) or many computers working on the same thing at once (like a Beowulf cluster)?
 
Old 11-12-2008, 03:55 AM   #3
humbletech99
Member
 
Registered: Jun 2005
Posts: 374

Original Poster
Rep: Reputation: 30
I am interested in

1. Many computers working on the same thing at once, i.e. computing one thing very quickly.
2. Parallel computing, many separate things at once, queuing systems etc

I'd like to know if people out there are using HPC, what for and how they are doing this.
 
Old 11-12-2008, 04:23 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 10,448

Rep: Reputation: 622
Maybe start looking with lse
 
Old 11-12-2008, 01:47 PM   #5
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Rep: Reputation: 51
Quote:
Originally Posted by humbletech99
I am interested in

1. Many computers working on the same thing at once, i.e. computing one thing very quickly.
2. Parallel computing, many separate things at once, queuing systems etc

I'd like to know if people out there are using HPC, what for and how they are doing this.
Not to be a pain, but what sort of info are you after other than "Parallel computing"?

FWIW, I manage a ~400-processor render farm where we run many different things at once with a queuing system, then assemble the results as a post-process. (That's a very simplified explanation, but you get the idea.)
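A minimal sketch of that "run many independent jobs, then assemble" pattern, assuming a single machine and a thread pool standing in for the farm. The names `render_frame` and `run_farm` are hypothetical and are not part of any queuing system mentioned in this thread:

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame):
    # Stand-in for one independent farm job (hypothetical); a real job
    # would launch a renderer or a simulation instead.
    return frame, f"frame_{frame:04d}.done"


def run_farm(frames, workers=4):
    # Dispatch every job independently; completion order does not matter.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(pool.map(render_frame, frames))
    # Post-process: assemble the per-job outputs in frame order.
    return [results[f] for f in sorted(results)]


if __name__ == "__main__":
    print(run_farm(range(1, 6)))
```

On a real farm the pool would be replaced by the queuing system and the jobs would run on separate nodes, but the shape is the same: jobs that do not depend on each other, followed by one assembly step.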

We have a file serving system that consists of 6 separate servers that are auto-mounted by each client as needed. To the user, however, it looks like one giant file system (until they run df & see up to 6 NFS mounts).

There are about 100 workstations on this network as well - each of which has a person sitting in front of it.

On the system side, we use almost all free software & assemble most of the equipment ourselves - we don't typically buy pre-built solutions like BluArc or Isilon.

Now, with that said, I may be able to help you answer some questions, but you have to ask the questions before I know how to answer them.

If I can help seed the questions: something many people don't think about with this sort of setup is AC and power. You need lots of both. Many buildings can't handle it. You may need to get special permission from your power company to do it.
You may need to change your fire system (water sprinklers are not very good to computers).
You need a pretty stout network backplane.
Your queuing system needs to be bullet proof & handle errors properly.
You need to be sure all machines are in sync as closely as possible - time, packages, etc.
You need a good centrally managed authorization mechanism.
Your file servers will need very high throughput - usually achieved by bonding several network interfaces together.
You need lots of fault tolerance in some places (file servers, switches) but not in others (computing nodes).
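As a rough illustration of the "handle errors properly" point above, here is a minimal sketch of a job loop that retries failed jobs a limited number of times and records the ones that never succeed. The function `run_queue` and the retry limit of 3 are assumptions made for illustration; this is not how any particular queuing system does it:

```python
from collections import deque


def run_queue(jobs, max_attempts=3):
    """Run (name, callable) jobs, retrying failures up to max_attempts."""
    pending = deque((name, fn, 1) for name, fn in jobs)
    done, failed = {}, {}
    while pending:
        name, fn, attempt = pending.popleft()
        try:
            done[name] = fn()
        except Exception as exc:
            if attempt < max_attempts:
                # Requeue at the back so other jobs are not blocked.
                pending.append((name, fn, attempt + 1))
            else:
                # Give up, but record why so the failure stays visible.
                failed[name] = repr(exc)
    return done, failed
```

The point of the sketch is that one bad job neither stalls the queue nor disappears silently: it is retried, and if it keeps failing, the reason ends up in `failed`.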
 
Old 11-12-2008, 03:20 PM   #6
pinniped
Senior Member
 
Registered: May 2008
Location: planet earth
Distribution: Debian
Posts: 1,732

Rep: Reputation: 49
Quote:
Originally Posted by BrianK
You may need to change your fire system (water sprinklers are not very good to computers).
I just love the smell of halons in the morning. I always marveled that halon systems were being replaced with compressed CO2 systems and no smelly chemicals were being mixed in for the benefit of humans detecting the extent of the invisible cloud. I've always been paranoid about working in those rooms because I have no idea how many seconds the alarm is set to sound for before the room is flooded; incidents like the one on board that Russian sub in which about 20 people were killed by the extinguisher system don't make me feel any safer.
 
Old 11-12-2008, 04:32 PM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 10,448

Rep: Reputation: 622
Take it from me, when the gas drops, you need *NO* further incentive to exit the room.
Funnily enough, water is probably the best thing. Doesn't leave any residue, (relatively) easy to clean up, doesn't asphyxiate the carbon-based life forms that happen to be meandering about.

Does pay to (totally) cut the power feed prior to dumping the water tho' ...
 
Old 11-12-2008, 05:46 PM   #8
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Rep: Reputation: 51
fwiw, related to fire, at my last company we installed a fire system made for boat engine rooms. It was a halon-like powdery substance that was allegedly not *that* harmful to humans. I think the company was called "Fireboy" or something like that. Wasn't even that expensive - I think $2K for the system that easily protected our 9 x 14 x 14 machine room.

There quite possibly were some legal issues with using such a system in a building.
 
Old 11-13-2008, 04:53 AM   #9
humbletech99
Member
 
Registered: Jun 2005
Posts: 374

Original Poster
Rep: Reputation: 30
BrianK, may I ask what Queuing system you are using to achieve this? Something open source?

Assume that power, ac, network and consistent systems, centralized authentication etc are all ok.

I'm primarily interested in the ways in which people are doing large scale computing to try to apply some of those techniques/software where I am.
 
Old 11-13-2008, 01:29 PM   #10
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Rep: Reputation: 51
Quote:
Originally Posted by humbletech99
BrianK, may I ask what Queuing system you are using to achieve this? Something open source?

Assume that power, ac, network and consistent systems, centralized authentication etc are all ok.

I'm primarily interested in the ways in which people are doing large scale computing to try to apply some of those techniques/software where I am.
We're using a proprietary queuing system that is all python based.

Previously, I've used the free, open source "Dr. Queue" http://www.drqueue.org/ & had pretty good luck with it. Dr. Queue is set up for visual effects, but it has a "general type" job which allows you to run any command on the command line (this can turn into multiple commands by simply including '&&' or ';', e.g. "uname -a && cd /foo/bar && myscript -o option1 -d option2"). It watches the process & reports errors correctly. It also has the notion of groups and priorities, does a good job with logs, & can be set up to mail the user when their job is complete. Last I checked (a couple of years ago, now), they were having some trouble with their web interface, but the GTK interface works just fine & runs on Linux, OSX, & Windows.
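To make the "general type" job idea concrete, here is a minimal Python sketch of running a chained shell command line, watching its exit status, and keeping its output as a log. This is not Dr. Queue's API; `run_job` is a hypothetical helper built on the standard `subprocess` module, and it assumes a POSIX shell and Python 3.7 or later:

```python
import subprocess


def run_job(command, timeout=None):
    """Run a shell command line and report its exit status and output."""
    proc = subprocess.run(
        command,
        shell=True,                # lets '&&' and ';' chain commands
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into one log stream
        text=True,
        timeout=timeout,
    )
    return {"ok": proc.returncode == 0, "code": proc.returncode, "log": proc.stdout}


if __name__ == "__main__":
    result = run_job("uname -a && echo finished")
    print(result["code"], result["log"])
```

With '&&' the chain stops at the first failing command, so a nonzero `code` tells you the job did not run to the end. Because `shell=True` hands the string to the shell unchanged, this should only be used with trusted command lines.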

A friend of mine is the author of "Rush" http://seriss.com/rush/ which I've used at some other facilities. It is very good, very stable, with very good machine monitoring built in & their support is fantastic. You have to pay, but it's pretty inexpensive. You can try it out for free - see the "Sales" section of the website (feel free to mention my name as a referral if you'd like). Rush is very open despite its closed-source inner module that handles the licensing. Beyond that, it's very scriptable.
 
  


All times are GMT -5.
