LinuxQuestions.org
05-11-2013, 12:33 PM   #1
GhostRustam
Problem building a computational cluster over an InfiniBand fabric


Hello!
I need to build a computational cluster of two nodes connected by InfiniBand.
The problem: when I start "mpirun ..." across both nodes, the process hangs. With the "-display-devel-map" flag, the output indicates that the daemons on the other node are not running.

Please help!
Here is my hardware and software configuration:
2 x nodes:
- Motherboard: Supermicro, Intel C602 chipset
- CPU: 2 x Intel Xeon E5-2620
- GPU: 4 x Nvidia Tesla M2090
- Ethernet: Intel gigabit, 2 ports
InfiniBand:
- InfiniBand: 2 x Mellanox QDR InfiniHost
- IB switch: 1 x Mellanox 8-port switch
- Operating system: RHEL 6.1 (without the cluster suite)

What I did:
1. Installed a clean RHEL 6.1 (without load balancing or InfiniBand support).
2. Installed Mellanox OFED (from the official site); to satisfy its dependencies I installed "tcl" and "tk".
3. Set up the Ethernet interfaces: ping and ssh succeed.
4. Set up the InfiniBand interfaces: ping and ssh succeed.
5. Set up RSA keys for passwordless access and tested them on both nodes.
6. Installed OpenMPI versions 1.4 and 1.6, configured with
"--prefix=/usr/local/mpi/openmpi16|14 --with-threads --with-hwloc --with-openib --enable-mpi-thread-multiple --with-mxm=/opt/mellanox/mxm --with-mxm-libdir=/opt/mellanox/mxm/lib --with-fca=/opt/mellanox/fca --enable-heterogeneous --enable-openib-connectx-xrc"
However, configure reports "nothing to be done" for these flags: "--with-openib --with-mxm=/opt/mellanox/mxm --with-mxm-libdir=/opt/mellanox/mxm/lib --with-fca=/opt/mellanox/fca --enable-openib-connectx-xrc"
7. When I run mpirun on localhost only, everything works correctly over both IB and Ethernet, although MPI prints a message about IB buffer memory.

8. When I run "mpirun -np 2 -hostfile nodes uname", where "nodes" contains the names of both nodes, the process hangs. I tried both hostnames and IP addresses. The problem occurs on both interfaces.
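
One way to narrow down a hang like the one in step 8 is to rerun the same "uname" test with the transport and network interface pinned and with daemon debugging switched on, so that the remote daemon's startup messages become visible. The following is only a sketch under stated assumptions: the helper name build_mpirun is made up for illustration, "eth0" is an assumed interface name, and the prefix is the 1.6 install path from step 6. The MCA parameter names used here should be checked against the output of "ompi_info --all" for the installed version before relying on them.

```python
import shlex


def build_mpirun(hostfile, nprocs, program, prefix=None, btl=None,
                 tcp_if=None, debug_daemons=False):
    """Return an mpirun argument list for a diagnostic run (hypothetical helper)."""
    argv = ["mpirun", "-np", str(nprocs), "-hostfile", hostfile]
    if prefix:
        # Tell the remote daemons where Open MPI is installed, in case the
        # non-interactive ssh shell does not have it on PATH/LD_LIBRARY_PATH.
        argv += ["--prefix", prefix]
    if btl:
        # Restrict the MPI transports, e.g. "tcp,self" to take InfiniBand
        # out of the picture while testing.
        argv += ["--mca", "btl", btl]
    if tcp_if:
        # Pin both the daemon (out-of-band) traffic and the TCP transport
        # to one interface; "eth0" below is an assumption.
        argv += ["--mca", "oob_tcp_if_include", tcp_if]
        argv += ["--mca", "btl_tcp_if_include", tcp_if]
    if debug_daemons:
        # Keep the daemons attached so their output reaches the terminal.
        argv += ["--debug-daemons", "--leave-session-attached"]
    return argv + list(program)


if __name__ == "__main__":
    cmd = build_mpirun("nodes", 2, ["uname"],
                       prefix="/usr/local/mpi/openmpi16",
                       btl="tcp,self", tcp_if="eth0",
                       debug_daemons=True)
    # Print the command for copy and paste rather than running it here.
    print(" ".join(shlex.quote(a) for a in cmd))
```

If the run still hangs with a single Ethernet interface and TCP only, the cause is more likely in daemon launch (remote paths, blocked ports between the nodes) than in the InfiniBand setup itself.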

Sorry for my poor English.
Thanks in advance!
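
Regarding the configure message in step 6, one way to see whether the InfiniBand-related components were actually built is to look for them in the output of "ompi_info", which lists one "MCA <framework>: <component>" line per compiled component. The sketch below parses such text. The function names are hypothetical, the framework assignments (openib under btl, mxm under mtl, fca under coll) are my understanding of the 1.6 series, and SAMPLE is a made-up excerpt for illustration, not output from the system described above.

```python
import re

# Components the configure flags in step 6 were meant to enable,
# as (framework, component) pairs. Assumed mapping, see the note above.
WANTED = [("btl", "openib"), ("mtl", "mxm"), ("coll", "fca")]

# Illustrative excerpt only; real input would come from running ompi_info.
SAMPLE = """\
                 MCA btl: self (MCA v2.0, API v2.0, Component v1.6)
                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6)
"""


def built_components(ompi_info_text):
    """Map each MCA framework to the set of component names listed for it."""
    found = {}
    for line in ompi_info_text.splitlines():
        m = re.match(r"\s*MCA\s+(\w+):\s+(\w+)\s+\(", line)
        if m:
            found.setdefault(m.group(1), set()).add(m.group(2))
    return found


def missing(found, wanted):
    """Return the (framework, component) pairs from wanted that are absent."""
    return [(fw, comp) for fw, comp in wanted
            if comp not in found.get(fw, set())]


if __name__ == "__main__":
    for fw, comp in missing(built_components(SAMPLE), WANTED):
        print("not built:", fw, comp)
```

On a real installation the text would come from the ompi_info binary under the chosen prefix, for example via subprocess.check_output, run once on each node so the two builds can be compared.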