LinuxQuestions.org

jkobori 09-08-2012 07:38 AM

openmpi - initialization failed
 
Hi everybody,

my problem is the following:
I'm trying to run an Open MPI program on a cluster (atlasz.elte.hu; the site is in Hungarian, but you can try Google Translate),
but I always get this error:

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(394)...........: Initialization failed
MPID_Init(121)..................:
MPIDI_Populate_vc_node_ids(1219):
MPID_Get_max_node_id(798).......: PMI_KVS_Commit returned -1

I am following the manual. For example, here is my script:

#!/bin/bash

mpirun.openmpi /users/jkobori/boxfit/out/boxfit boxfitsettings.txt

which I am running with the line sbatch -p hpc2009 -N 4 -B 2:4:1 s.sh.
This is the way it should be done, but it doesn't work for me.
Can you help me out?

If any additional info is needed, please tell me.

Regards,
Joe

Reuti 09-09-2012 12:47 PM

And how did you compile the application? With the precompiled Open MPI installation? The “MPIR_Init_thread” may come from another MPI implementation.

jkobori 09-09-2012 03:56 PM

I compile it with the

CXX = mpicxx

option in the makefile. Open MPI 1.4.2 and MPICH2 1.2.1 are installed on the machine, among many others.
I have to compile it with the line make clean boxfit.

Reuti 09-09-2012 03:58 PM

They are all called this way, be it Open MPI or MPICH2. You can use:
Code:

$ which mpicxx
to get more information.

jkobori 09-09-2012 04:01 PM

Running which mpicxx gives the following result:

/usr/local/bin/mpicxx

jkobori 09-09-2012 04:03 PM

Well, something strange is happening, because I just tried to run it again,
and after printing the error reported above, boxfit goes on to calculate the results...

Reuti 09-09-2012 04:04 PM

Good, then we have to investigate more:
Code:

$ ls -lh /usr/local/bin/mpicxx

jkobori 09-09-2012 04:06 PM

ls -lh /usr/local/bin/mpicxx results in

-rwxr-xr-x 1 locadmin locals 8.1K Jun 21 2010 /usr/local/bin/mpicxx

Reuti 09-09-2012 04:12 PM

Whoa, not a symbolic link as I had thought. Maybe this shows more:
Code:

$ strings /usr/local/bin/mpicxx | less
and check the output for strings like “openmpi” or “mpich”. Another way would be:
Code:

$ ldd your_binary
to see which libraries it is linked against.
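
As a rough sketch of what to look for in that ldd output: the helper below (classify_mpi_libs is a made-up name, not part of any MPI package) reads ldd output on standard input and reports which MPI families the library names point to. It assumes that libmpich/libmpichcxx indicate MPICH and that libopen-rte/libopen-pal indicate Open MPI.

```shell
#!/bin/bash
# Hypothetical helper (not part of any MPI distribution): read `ldd` output
# on stdin and print which MPI families the library names suggest.
classify_mpi_libs() {
    local input found=""
    input=$(cat)
    # Assumption: libmpich / libmpichcxx are the MPICH/MPICH2 libraries.
    if grep -Eq 'libmpich(cxx)?\.so' <<<"$input"; then
        found="mpich"
    fi
    # Assumption: libopen-rte / libopen-pal are Open MPI runtime libraries.
    if grep -Eq 'libopen-(rte|pal)\.so' <<<"$input"; then
        found="${found:+$found }openmpi"
    fi
    # Print "none" when neither family was recognised.
    echo "${found:-none}"
}
```

It would be used as ldd your_binary | classify_mpi_libs. If both names are printed, the binary pulls in more than one MPI library, which is worth sorting out before anything else.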

jkobori 09-09-2012 04:30 PM

Before the ldd output, I should mention that the makefile contains this line:
LDFLAGS = -L/usr/lib -lm -lhdf5


So, ldd my_binary gives

linux-vdso.so.1 => (0x00007fff6971b000)
libhdf5.so.6 => /usr/lib/libhdf5.so.6 (0x00007f4c6187b000)
libmpichcxx.so.1.2 => /usr/local/lib/libmpichcxx.so.1.2 (0x00007f4c61657000)
libmpich.so.1.2 => /usr/local/lib/libmpich.so.1.2 (0x00007f4c61296000)
libpmi.so.0 => /usr/local/slurm/lib/libpmi.so.0 (0x00007f4c61091000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f4c60e75000)
librt.so.1 => /lib/librt.so.1 (0x00007f4c60c6c000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f4c60958000)
libm.so.6 => /lib/libm.so.6 (0x00007f4c606d6000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f4c604bf000)
libc.so.6 => /lib/libc.so.6 (0x00007f4c6015d000)
libz.so.1 => /usr/lib/libz.so.1 (0x00007f4c5ff46000)
libmpi.so.0 => /usr/lib/libmpi.so.0 (0x00007f4c5fc95000)
libopen-rte.so.0 => /usr/lib/libopen-rte.so.0 (0x00007f4c5fa49000)
libopen-pal.so.0 => /usr/lib/libopen-pal.so.0 (0x00007f4c5f7f4000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f4c5f5ef000)
libnsl.so.1 => /lib/libnsl.so.1 (0x00007f4c5f3d7000)
libutil.so.1 => /lib/libutil.so.1 (0x00007f4c5f1d4000)
libslurm.so.23 => /usr/local/slurm/lib/libslurm.so.23 (0x00007f4c5b52d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4c61e67000)

Running strings /usr/local/bin/mpicxx | less gives quite a long text output. I searched for lines containing openmpi or
mpich, but I'd rather paste here all the parts that mention mpi or mpich (hope it's OK):

MPILIBNAME="mpich"
PMPILIBNAME="pmpich"
MPICXXLIBNAME="mpichcxx"
MPI_OTHERLIBS=" -lpmi -lpthread -lrt "

# MPICH2_VERSION is the version of the MPICH2 library that mpicxx is intended for
MPICH2_VERSION="1.2.1p1"

# Environment Variables.
# The environment variables MPICH_CXX may be used to override the
# default choices.
# In addition, if there is a file $sysconfdir/mpicxx-$CXXname.conf,
# where CXXname is the name of the compiler with all spaces replaced by hyphens
# (e.g., "CC -64" becomes "CC--64", that file is sources, allowing other
# changes to the compilation environment. See the variables used by the
# script (defined above)
if [ -n "$MPICH_CXX" ] ; then
CXX="$MPICH_CXX"
CXXname=`echo $CXX | sed 's/ /-/g'`
if [ -s $sysconfdir/mpicxx-$CXXname.conf ] ; then
. $sysconfdir/mpicxx-$CXXname.conf
fi
# Allow a profiling option to be selected through an environment variable
if [ -n "$MPICXX_PROFILE" ] ; then
profConf=$MPICXX_PROFILE

# Derived variables. These are assembled from variables set from the
# default, environment, configuration file (if any) and command-line
# options (if any)
if [ "$NEEDSPLIB" = yes ] ; then
mpilibs="-l$PMPILIBNAME -l$MPILIBNAME -lopa"
else
mpilibs="-l$MPILIBNAME -lopa"
cxxlibs=
if [ "$MPICXXLIBNAME" != "$MPILIBNAME" ] ; then
cxxlibs="-l$MPICXXLIBNAME"
# Init with the ones needed by MPI
CXXFLAGS="$WRAPPER_CXXFLAGS"
LDFLAGS="$WRAPPER_LDFLAGS"
# Handle the case of a profile switch
if [ -n "$profConf" ] ; then
profConffile=
if [ -s "$libdir/lib$profConf.a" -o -s "$libdir/lib$profConf.so" ] ; then
mpilibs="-l$profConf $mpilibs"
elif [ -s "$sysconfdir/$profConf.conf" ] ; then
profConffile="$sysconfdir/$profConf.conf"
elif [ -s "$profConf.conf" ] ; then
profConffile="$profConf.conf"
else
echo "Profiling configuration file $profConf.conf not found in $sysconfdir"
fi
if [ -n "$profConffile" -a -s "$profConffile" ] ; then
. $profConffile
if [ -n "$PROFILE_INCPATHS" ] ; then
CXXFLAGS="$PROFILE_INCPATHS $CXXFLAGS"
fi
if [ -n "$PROFILE_PRELIB" ] ; then
mpilibs="$PROFILE_PRELIB $mpilibs"
fi
if [ -n "$PROFILE_POSTLIB" ] ; then
mpilibs="$mpilibs $PROFILE_POSTLIB"
fi
fi
# A temporary statement to invoke the compiler
# Place the -L before any args incase there are any mpi libraries in there.
# Eventually, we'll want to move this after any non-MPI implementation
# libraries
if [ "$linking" = yes ] ; then
if [ -n "$CXX_LINKPATH_SHL" ] ; then
# Prepend the path for the shared libraries to the library list
shllibpath="$CXX_LINKPATH_SHL$libdir"
fi
$Show $CXX $MPICH2_MPICXX_FLAGS $CXXFLAGS $LDFLAGS "${allargs[@]}" -I$includedir -L$libdir -L$opalibdir $shllibpath $MPICH2_LDFLAGS $cxxlibs $mpilibs $MPI_OTHERLIBS
rc=$?
else
$Show $CXX $MPICH2_MPICXX_FLAGS $CXXFLAGS "${allargs[@]}" -I$includedir
rc=$?
exit $rc

Reuti 09-09-2012 04:44 PM

Aha, for Open MPI mpicxx is a binary, which is why I suggested the strings command. But as it turns out, it’s MPICH. The mpiexec should be the one from the same library. Otherwise one effect might be that several processes are started, but each of them thinks it is running serially. So you have to ask someone from the admin staff where to find the corresponding mpiexec.

BTW: MPICH2 1.2.1 is old and uses an old startup mechanism involving daemons which need to be booted beforehand. The current MPICH2 release no longer uses daemons, and its mpiexec is a binary too.
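
A quick heuristic for spotting a mismatched pair is to compare where the compiler wrapper and the launcher are found on the PATH. The sketch below assumes that a wrapper and a launcher from the same installation sit in the same bin directory, which is common but not guaranteed; same_bin_dir and check_mpi_pair are made-up names for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed if two executables live in the same directory.
same_bin_dir() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# Hypothetical helper: resolve a compiler wrapper and a launcher on PATH and
# report whether they share a directory.
# Assumption: same directory suggests the same MPI installation.
check_mpi_pair() {
    local wrapper launcher
    wrapper=$(command -v "$1") || { echo "$1 not found"; return 2; }
    launcher=$(command -v "$2") || { echo "$2 not found"; return 2; }
    if same_bin_dir "$wrapper" "$launcher"; then
        echo "same directory: $(dirname "$wrapper")"
    else
        echo "different directories: $wrapper vs $launcher"
        return 1
    fi
}
```

For example, check_mpi_pair mpicxx mpirun.openmpi would compare the two commands used in this thread. A "different directories" result is only a hint that they may belong to different MPI installations, so the admin staff still have the final word.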

jkobori 09-09-2012 04:49 PM

Thank you very much for your help!
Tomorrow I'm going to consult the admin, then I'll come back
and write down the results!

Thanks again,
Joe

jkobori 09-10-2012 02:51 PM

Well, I have spoken to the admin. He advised me to use the srun command instead of
mpirun.openmpi, because I compiled and ran my binary with different MPI versions...
The good thing is that if I use srun, it works!!
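
For reference, the adjusted s.sh presumably looks like the sketch below. The only change assumed here is swapping mpirun.openmpi for srun; the binary path and settings file are the ones from the first post. Since ldd showed the binary linked against SLURM's libpmi, letting srun start the processes directly is plausible.

```shell
# Write the job script. Assumption: the only change from the original s.sh
# is that srun replaces mpirun.openmpi as the launcher.
cat > s.sh <<'EOF'
#!/bin/bash

srun /users/jkobori/boxfit/out/boxfit boxfitsettings.txt
EOF
```

It would be submitted as before with sbatch -p hpc2009 -N 4 -B 2:4:1 s.sh.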

Anyway, the MPI versions I wrote above are incorrect, since
there was a complete distro upgrade a few weeks ago, sorry.

Thank you very much again!

Joe


All times are GMT -5.