LinuxQuestions.org


GhostRustam 05-11-2013 12:33 PM

Problem building a computational cluster over an InfiniBand fabric
 
Hello!
Help me, please!
My task: I need to assemble a computational cluster of two nodes connected with InfiniBand.
I have a problem:
When I start "mpirun ... " across the two nodes, the process hangs. If I add the flag "-display-devel-map", it reports that the daemons on the other node are not running.

Help me, please!
Here is the configuration of my hardware and software:
2 x nodes:
- Motherboard: Supermicro, Intel C602 chipset
- CPU: 2 x Intel Xeon E5-2620
- GPU: 4 x NVIDIA Tesla M2090
- Ethernet: Intel Gigabit, 2 ports
- Operating system: RHEL 6.1 (without Cluster Suite)
InfiniBand:
- HCAs: 2 x Mellanox QDR InfiniHost
- IB switch: 1 x Mellanox 8-port switch

What I did:
1. Installed a clean RHEL 6.1 (without load-balancing and InfiniBand support).
2. Installed Mellanox OFED (from the official site); for its dependencies I installed "tcl" and "tk".
3. Set up the Ethernet interfaces: ping and ssh succeed.
4. Set up the InfiniBand interfaces: ping and ssh succeed.
5. Set up RSA keys for passwordless access and tested it on both nodes.
6. Installed Open MPI versions 1.4 and 1.6.
I configured them with "--prefix=/usr/local/mpi/openmpi16|14 --with-threads --with-hwloc --with-openib --enable-mpi-thread-multiple --with-mxm=/opt/mellanox/mxm --with-mxm-libdir=/opt/mellanox/mxm/lib --with-fca=/opt/mellanox/fca --enable-heterogeneous --enable-openib-connectx-xrc"
But configure says there is nothing to be done for these flags: "--with-openib --with-mxm=/opt/mellanox/mxm --with-mxm-libdir=/opt/mellanox/mxm/lib --with-fca=/opt/mellanox/fca --enable-openib-connectx-xrc"
7. When I run mpirun on localhost only, everything works correctly over both IB and Ethernet, although MPI prints a message about the IB buffer memory.

8. When I run "mpirun -np 2 -hostfile nodes uname", where the file "nodes" contains the names of both nodes, the process hangs. I tried both hostnames and IP addresses, and the problem occurs on both interfaces.
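The passwordless-SSH test from step 5 can be made stricter, since mpirun starts its remote daemons over ssh and cannot type a password or confirm a host key for you. Below is a minimal sketch, assuming Python 3.7+ and hypothetical host names such as node1 and node2; it runs "ssh -o BatchMode=yes <host> true", where BatchMode=yes makes OpenSSH fail instead of prompting.

```python
import subprocess


def ssh_probe_argv(host):
    """Build the argv for a non-interactive SSH probe of one host.

    BatchMode=yes tells OpenSSH to fail instead of asking for a password
    or a host-key confirmation, so a missing key shows up as an error.
    """
    return ["ssh", "-o", "BatchMode=yes", host, "true"]


def check_passwordless(hosts, run=subprocess.run, timeout=10):
    """Return {host: bool}: True if 'ssh host true' succeeded without a prompt.

    `run` is injectable so the logic can be exercised without real hosts.
    """
    results = {}
    for host in hosts:
        try:
            proc = run(ssh_probe_argv(host), capture_output=True, timeout=timeout)
            results[host] = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # Timed out (e.g. unreachable host) or the ssh binary was not found.
            results[host] = False
    return results
```

Calling check_passwordless(["node1", "node2"]) on each node in turn checks both directions; any False entry would point at a key or known_hosts problem rather than at MPI itself.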
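Regarding the configure message in step 6, one way to see what was actually built is to read the component list that Open MPI's ompi_info tool prints. The sketch below assumes lines shaped like "MCA btl: openib (MCA v2.0, ...)" (the exact layout may vary between releases) and assumes ompi_info lives under the 1.6 prefix from step 6; both are assumptions to adjust.

```python
import subprocess


def built_components(ompi_info_text, framework):
    """Return the component names ompi_info lists for one MCA framework.

    Assumes lines shaped like 'MCA btl: openib (MCA v2.0, ...)'; the exact
    layout can differ between Open MPI releases.
    """
    found = set()
    wanted = "MCA " + framework
    for line in ompi_info_text.splitlines():
        head, sep, tail = line.strip().partition(":")
        if sep and head.strip() == wanted:
            # The component name is the first word after the colon.
            name = tail.strip().split(" ", 1)[0]
            if name:
                found.add(name)
    return found


def read_ompi_info(binary="/usr/local/mpi/openmpi16/bin/ompi_info"):
    """Run ompi_info and return its stdout (the path is an assumed install location)."""
    return subprocess.run([binary], capture_output=True, text=True, check=True).stdout
```

For example, "openib" in built_components(read_ompi_info(), "btl") would be False if the InfiniBand verbs transport was not compiled in, which would be consistent with configure ignoring --with-openib.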
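The two-node run from step 8 depends on the hostfile and the mpirun command line. Below is a small sketch that writes an Open MPI hostfile (one "hostname slots=N" line per node) and assembles the mpirun argument list; the host names node1/node2 and any MCA parameters passed in are hypothetical placeholders.

```python
def hostfile_text(slots_by_host):
    """Render an Open MPI hostfile: one 'hostname slots=N' line per node."""
    return "".join(f"{host} slots={slots}\n" for host, slots in slots_by_host.items())


def mpirun_argv(np, hostfile, program, mca=None, mpirun="mpirun"):
    """Build an mpirun command line.

    Each (key, value) pair in `mca` becomes '--mca key value'; `program`
    is the command (plus arguments) that every rank runs.
    """
    argv = [mpirun, "-np", str(np), "-hostfile", hostfile]
    for key, value in (mca or {}).items():
        argv += ["--mca", key, value]
    argv += list(program)
    return argv
```

Writing hostfile_text({"node1": 1, "node2": 1}) to the file "nodes" and running mpirun_argv(2, "nodes", ["uname", "-n"]) through subprocess.run keeps one process per node; "uname -n" prints each node's hostname, so the output shows where each rank ran. With larger slots counts, Open MPI's default by-slot placement may put both processes on the first node, and the second node would never be contacted. The mca argument can carry settings such as {"oob_tcp_if_include": "eth0"} (interface name hypothetical) if your build supports that parameter; "ompi_info --param oob tcp" lists what is available.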

Sorry for my poor English.
Thanks in advance!

