Problem building a computational cluster over InfiniBand fabrics
Hello!
Please help me. My task is to assemble a computational cluster of two nodes using InfiniBand. The problem: when I try to start "mpirun ..." across the two nodes, the process hangs. If I add the flag "-display-devel-map", it reports that the daemons on the other node are not running.

My hardware and software configuration:

2 x nodes:
- Motherboard: Supermicro, Intel C602 chipset
- CPU: 2 x Intel Xeon E5-2620
- GPU: 4 x NVIDIA Tesla M2090
- Ethernet: Intel gigabit, 2 ports

InfiniBand:
- HCA: 2 x Mellanox QDR InfiniHost
- IB switch: 1 x Mellanox 8-port switch

Operating system: RHEL 6.1 (without the cluster suite).

What I did:

1. Installed a clean RHEL 6.1 (without load balancing or InfiniBand support).
2. Installed Mellanox OFED (from the official site); for its dependencies I installed "tcl" and "tk".
3. Set up the Ethernet interfaces - ping and ssh succeed.
4. Set up the InfiniBand interfaces - ping and ssh succeed.
5. Set up RSA keys for passwordless access; tested between both nodes.
6. Installed Open MPI 1.4 and 1.6, configured with "--prefix=/usr/local/mpi/openmpi16" (or "openmpi14" for the 1.4 build) and "--with-threads --with-hwloc --with-openib --enable-mpi-thread-multiple --with-mxm=/opt/mellanox/mxm --with-mxm-libdir=/opt/mellanox/mxm/lib --with-fca=/opt/mellanox/fca --enable-heterogeneous --enable-openib-connectx-xrc". However, configure reports nothing to be done for the flags "--with-openib --with-mxm=/opt/mellanox/mxm --with-mxm-libdir=/opt/mellanox/mxm/lib --with-fca=/opt/mellanox/fca --enable-openib-connectx-xrc" (see the ompi_info check below).
7. When I run mpirun on localhost only, everything works correctly over both IB and Ethernet; MPI only prints a message about IB buffer memory.
8. When I run "mpirun -np 2 -hostfile nodes uname", where "nodes" lists both node names, the process hangs. I have tried hostnames as well as IP addresses, and the problem occurs on both interfaces (an example hostfile and command are shown below).

Sorry for my poor English. Thanks in advance!
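To verify whether InfiniBand support really got compiled in despite the "nothing to be done" messages, this is roughly how I query the installed components (the path is the install prefix from my configure line; adjust it for the 1.4 build):

    # list the BTL components compiled into this Open MPI installation;
    # an "openib" entry should show up if InfiniBand support was built
    /usr/local/mpi/openmpi16/bin/ompi_info | grep btl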
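For reference, the hostfile and the command that hangs look roughly like this (node1 and node2 are placeholders, not my actual hostnames):

    # contents of the hostfile "nodes" - one hostname (or IP address) per line
    node1
    node2

    # the command that hangs
    /usr/local/mpi/openmpi16/bin/mpirun -np 2 -hostfile nodes uname

    # the same command with launcher verbosity turned up, to see where it stops
    /usr/local/mpi/openmpi16/bin/mpirun -np 2 -hostfile nodes \
        --mca plm_base_verbose 10 --debug-daemons uname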
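Since "-display-devel-map" suggests the daemons on the other node never start, one check I can run (assuming the same install prefix on both nodes) is whether a non-interactive ssh session finds orted and its libraries:

    # non-interactive ssh does not read the same startup files as a login shell,
    # so the Open MPI prefix may be missing from PATH/LD_LIBRARY_PATH there
    ssh node2 which orted
    ssh node2 'echo $PATH; echo $LD_LIBRARY_PATH'

    # alternatively, pass the remote installation prefix to mpirun explicitly
    /usr/local/mpi/openmpi16/bin/mpirun --prefix /usr/local/mpi/openmpi16 \
        -np 2 -hostfile nodes uname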