Well those are two mighty big topics. What sort of information are you after? Applications? Type of servers? Network setup? Cluster Programming? Monitoring? Air conditioning? Power? Administration? Linux distribution? Price?
Are you thinking of many computers working on many separate things at once (like a render farm or enterprise web hosting) or many computers working on the same thing at once (like a Beowulf cluster)?
1. Many computers working on the same thing at once, i.e. computing one thing very quickly.
2. Parallel computing, many separate things at once, queuing systems etc
I'd like to know if people out there are using HPC, what for and how they are doing this.
Not to be a pain, but what sort of info are you after other than "Parallel computing"?
FWIW, I manage a ~400 proc render farm where we run many different things at once with a queuing system, then assemble the results as a post-process. (That's a very simplified explanation, but you get the idea.)
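To make the "run many things at once, then assemble" idea concrete, here is a minimal single-machine sketch in Python. It is not the farm's actual software: `render_frame` and `run_farm` are hypothetical names, and a thread pool stands in for a real queuing system spread over many hosts.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_frame(frame):
    # Hypothetical stand-in for one independent unit of work (e.g. one frame).
    return frame, f"frame_{frame:04d}.done"


def run_farm(frames, workers=4):
    """Run every frame independently, then assemble the results in order."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(render_frame, f) for f in frames]
        # Jobs finish in whatever order the workers get to them.
        for fut in as_completed(futures):
            frame, output = fut.result()
            results[frame] = output
    # Post-process step: put the pieces back in frame order.
    return [results[f] for f in sorted(results)]


if __name__ == "__main__":
    print(run_farm(range(1, 6)))
```

The point is only that the jobs never talk to each other; all ordering is handled in the assembly step at the end.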
We have a file serving system that consists of 6 separate servers that are auto-mounted by each client as needed, however, to the user, it looks like one, giant file system (until they do a df & see up to 6 nfs mounts).
There are about 100 workstations on this network as well - each of which has a person sitting in front of it.
On the system side, we use almost all free software & assemble most of the equipment ourselves - we don't typically buy pre-built solutions like BluArc or Isilon.
Now, with that said, I may be able to help you answer some questions, but you have to ask the questions before I know how to answer them.
If I can help out with the questions: something many people don't think about with this sort of setup is AC and power... you need lots of both. Many buildings can't handle it. You may need to get special permission from your power company to do it.
You may need to change your fire system (water sprinklers are not very good to computers).
You need a pretty stout network backplane.
Your queuing system needs to be bullet proof & handle errors properly.
You need to be sure all machines are in sync as closely as possible - time, packages, etc.
You need a good centrally managed authorization mechanism.
Your file servers will need some sort of very high speed throughput - usually done with a bond.
You need lots of fault tolerance in some places (file servers, switches) but not in others (computing nodes).
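As a rough illustration of a queue that "handles errors properly" while treating compute nodes as disposable, here is a small Python sketch. Everything in it is an assumption for illustration (`run_queue`, the round-robin node choice, the `max_attempts` limit); a real queuing system would dispatch over the network and track node health.

```python
from collections import deque


def run_queue(jobs, nodes, max_attempts=3):
    """jobs: dict of name -> callable(node); nodes: list of node names.

    A failed job is put back on the queue and tried again on the next
    node, so a dead compute node does not lose work.
    """
    pending = deque((name, 1) for name in jobs)
    done, failed = {}, {}
    turn = 0
    while pending:
        name, attempt = pending.popleft()
        node = nodes[turn % len(nodes)]  # simple round-robin (assumption)
        turn += 1
        try:
            done[name] = jobs[name](node)
        except Exception as exc:
            if attempt < max_attempts:
                pending.append((name, attempt + 1))  # requeue for another try
            else:
                failed[name] = repr(exc)  # give up and report the error
    return done, failed
```

Jobs that keep failing end up in `failed` with their last error instead of disappearing silently, which is the behaviour you want to check for in any queuing system you evaluate.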
You may need to change your fire system (water sprinklers are not very good to computers).
I just love the smell of halons in the morning. I always marveled that halon systems were being replaced with compressed CO2 systems and no smelly chemicals were being mixed in for the benefit of humans detecting the extent of the invisible cloud. I've always been paranoid about working in those rooms because I have no idea how many seconds the alarm is set to sound for before the room is flooded; incidents like the one on board that Russian sub in which about 20 people were killed by the extinguisher system don't make me feel any safer.
Take it from me, when the gas drops, you need *NO* further incentive to exit the room.
Funnily enough, water is probably the best thing. Doesn't leave any residue, (relatively) easy to clean up, doesn't asphyxiate the carbon-based life forms that happen to be meandering about.
Does pay to (totally) cut the power feed prior to dumping the water tho' ...
fwiw, related to fire, at my last company we installed a fire system made for boat engine rooms. It was a halon-like powdery substance that was allegedly not *that* harmful to humans. I think the company was called "Fireboy" or something like that. Wasn't even that expensive - I think $2K for the system that easily protected our 9 x 14 x 14 machine room.
There quite possibly were some legal issues with using such a system in a building.
BrianK, may I ask what Queuing system you are using to achieve this? Something open source?
Assume that power, ac, network and consistent systems, centralized authentication etc are all ok.
I'm primarily interested in the ways in which people are doing large scale computing to try to apply some of those techniques/software where I am.
We're using a proprietary queuing system that is entirely Python-based.
Previously, I've used the free, open source "Dr. Queue" http://www.drqueue.org/ & had pretty good luck with it. Dr. Queue is set up for visual effects, but it has a "general type" job which allows you to run any command on the command line (this can turn into multiple commands by simply including '&&' or ';', e.g. "uname -a && cd /foo/bar && myscript -o option1 -d option2"). It watches the process & reports errors correctly. It also has the notion of groups and priorities, does a good job with logs, & can be set up to mail the user when their job is complete. Last I checked (a couple of years ago, now), they were having some trouble with their web interface, but the GTK interface works just fine & runs on Linux, OSX, & Windows.
A friend of mine is the author of "Rush" http://seriss.com/rush/ which I've used at some other facilities. It is very good, very stable, with very good machine monitoring built in & their support is fantastic. You have to pay, but it's pretty inexpensive. You can try it out for free - see the "Sales" section of the website (feel free to mention my name as a referral if you'd like). Rush is very open despite its closed source inner module that handles the licensing. Beyond that, it's very scriptable.
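For anyone wondering what a "general type" job boils down to, here is a minimal Python sketch of running a shell command line and reporting whether it succeeded. This is not Dr. Queue's code or API; `run_general_job` is a hypothetical helper built on the standard `subprocess` module.

```python
import subprocess


def run_general_job(command, timeout=None):
    """Run a shell command line and report its exit status and output."""
    proc = subprocess.run(
        command,
        shell=True,            # lets the shell interpret '&&' and ';'
        capture_output=True,   # keep stdout/stderr for the job log
        text=True,
        timeout=timeout,
    )
    return {
        "command": command,
        "exit_code": proc.returncode,
        "ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    job = run_general_job("uname -a && echo finished")
    print(job["ok"], job["exit_code"])
```

Note the difference between the two separators: with '&&' the chain stops at the first command that fails, while with ';' the later commands still run and the exit status is that of the last one, so '&&' is usually what you want if the queue is supposed to flag errors.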