openmpi - initialization failed
Hi everybody,
my problem is the following: I'm trying to run an Open MPI program on a cluster (atlasz.elte.hu; the site is in Hungarian, but you can try Google Translate), but I always get this error:
Code:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(394)...........: Initialization failed
MPID_Init(121)..................:
MPIDI_Populate_vc_node_ids(1219):
MPID_Get_max_node_id(798).......: PMI_KVS_Commit returned -1
I am doing it according to the manual; for example, here is my script:
Code:
#!/bin/bash
mpirun.openmpi /users/jkobori/boxfit/out/boxfit boxfitsettings.txt
which I am running with the line
Code:
sbatch -p hpc2009 -N 4 -B 2:4:1 s.sh
This is the way it should be done, but it doesn't work for me. Can you help me out? If any additional info is needed, please tell me. Regards, Joe |
And how did you compile the application? With the precompiled Open MPI installation? The “MPIR_Init_thread” in the error stack may come from another MPI implementation.
|
I compile it with the
Code:
CXX = mpicxx
option in the makefile. On the machine, OpenMPI 1.4.2 and MPICH2 1.2.1 are installed, among many others. I have to compile it with the make clean boxfit line. |
They are all called this way, be it Open MPI or MPICH2. You can use:
Code:
$ which mpicxx |
The which mpicxx gives the following result:
/usr/local/bin/mpicxx |
Well, something strange is happening, because now I tried to run it again,
and after giving the error reported above, boxfit calculates the results anyway... |
Good, then we have to investigate more:
Code:
$ ls -lh /usr/local/bin/mpicxx |
ls -lh /usr/local/bin/mpicxx results in
-rwxr-xr-x 1 locadmin locals 8.1K Jun 21 2010 /usr/local/bin/mpicxx |
Woah, not a symbolic link as I thought. Maybe this shows more:
Code:
$ strings /usr/local/bin/mpicxx | less
Code:
$ ldd your_binary |
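As an aside, a quick way to tell a script wrapper (as MPICH2 shipped in this era) from a compiled binary (as Open MPI ships) is to look at the first bytes of the file. A minimal sketch, with a made-up helper name:

```shell
# Hypothetical helper: report whether an MPI compiler wrapper is a
# shell script (file starts with "#!") or a compiled binary (e.g. ELF).
classify_wrapper() {
    if head -c 2 "$1" | grep -q '^#!'; then
        echo "script"
    else
        echo "binary"
    fi
}

# usage: classify_wrapper /usr/local/bin/mpicxx
```

A script wrapper can then simply be read with less, while a binary needs strings.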
Before the ldd, I should probably mention that in the makefile I have this line:
Code:
LDFLAGS = -L/usr/lib -lm -lhdf5
So, the ldd my_binary gives
Code:
linux-vdso.so.1 =>  (0x00007fff6971b000)
libhdf5.so.6 => /usr/lib/libhdf5.so.6 (0x00007f4c6187b000)
libmpichcxx.so.1.2 => /usr/local/lib/libmpichcxx.so.1.2 (0x00007f4c61657000)
libmpich.so.1.2 => /usr/local/lib/libmpich.so.1.2 (0x00007f4c61296000)
libpmi.so.0 => /usr/local/slurm/lib/libpmi.so.0 (0x00007f4c61091000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f4c60e75000)
librt.so.1 => /lib/librt.so.1 (0x00007f4c60c6c000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f4c60958000)
libm.so.6 => /lib/libm.so.6 (0x00007f4c606d6000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f4c604bf000)
libc.so.6 => /lib/libc.so.6 (0x00007f4c6015d000)
libz.so.1 => /usr/lib/libz.so.1 (0x00007f4c5ff46000)
libmpi.so.0 => /usr/lib/libmpi.so.0 (0x00007f4c5fc95000)
libopen-rte.so.0 => /usr/lib/libopen-rte.so.0 (0x00007f4c5fa49000)
libopen-pal.so.0 => /usr/lib/libopen-pal.so.0 (0x00007f4c5f7f4000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f4c5f5ef000)
libnsl.so.1 => /lib/libnsl.so.1 (0x00007f4c5f3d7000)
libutil.so.1 => /lib/libutil.so.1 (0x00007f4c5f1d4000)
libslurm.so.23 => /usr/local/slurm/lib/libslurm.so.23 (0x00007f4c5b52d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4c61e67000)
The strings /usr/local/bin/mpicxx | less gives quite a long text. I searched for lines with openmpi or mpich, but I'd rather paste here the whole parts with mpi or mpich expressions (hope it's OK):
Code:
MPILIBNAME="mpich"
PMPILIBNAME="pmpich"
MPICXXLIBNAME="mpichcxx"
MPI_OTHERLIBS=" -lpmi -lpthread -lrt "
# MPICH2_VERSION is the version of the MPICH2 library that mpicxx is intended for
MPICH2_VERSION="1.2.1p1"
# Environment Variables.
# The environment variables MPICH_CXX may be used to override the
# default choices.
# In addition, if there is a file $sysconfdir/mpicxx-$CXXname.conf,
# where CXXname is the name of the compiler with all spaces replaced by hyphens
# (e.g., "CC -64" becomes "CC--64", that file is sources, allowing other
# changes to the compilation environment. See the variables used by the
# script (defined above)
if [ -n "$MPICH_CXX" ] ; then
    CXX="$MPICH_CXX"
    CXXname=`echo $CXX | sed 's/ /-/g'`
    if [ -s $sysconfdir/mpicxx-$CXXname.conf ] ; then
        . $sysconfdir/mpicxx-$CXXname.conf
    fi
# Allow a profiling option to be selected through an environment variable
if [ -n "$MPICXX_PROFILE" ] ; then
    profConf=$MPICXX_PROFILE
# Derived variables. These are assembled from variables set from the
# default, environment, configuration file (if any) and command-line
# options (if any)
if [ "$NEEDSPLIB" = yes ] ; then
    mpilibs="-l$PMPILIBNAME -l$MPILIBNAME -lopa"
else
    mpilibs="-l$MPILIBNAME -lopa"
cxxlibs=
if [ "$MPICXXLIBNAME" != "$MPILIBNAME" ] ; then
    cxxlibs="-l$MPICXXLIBNAME"
# Init with the ones needed by MPI
CXXFLAGS="$WRAPPER_CXXFLAGS"
LDFLAGS="$WRAPPER_LDFLAGS"
# Handle the case of a profile switch
if [ -n "$profConf" ] ; then
    profConffile=
    if [ -s "$libdir/lib$profConf.a" -o -s "$libdir/lib$profConf.so" ] ; then
        mpilibs="-l$profConf $mpilibs"
    elif [ -s "$sysconfdir/$profConf.conf" ] ; then
        profConffile="$sysconfdir/$profConf.conf"
    elif [ -s "$profConf.conf" ] ; then
        profConffile="$profConf.conf"
    else
        echo "Profiling configuration file $profConf.conf not found in $sysconfdir"
    fi
    if [ -n "$profConffile" -a -s "$profConffile" ] ; then
        . $profConffile
        if [ -n "$PROFILE_INCPATHS" ] ; then
            CXXFLAGS="$PROFILE_INCPATHS $CXXFLAGS"
        fi
        if [ -n "$PROFILE_PRELIB" ] ; then
            mpilibs="$PROFILE_PRELIB $mpilibs"
        fi
        if [ -n "$PROFILE_POSTLIB" ] ; then
            mpilibs="$mpilibs $PROFILE_POSTLIB"
        fi
    fi
# A temporary statement to invoke the compiler
# Place the -L before any args incase there are any mpi libraries in there.
# Eventually, we'll want to move this after any non-MPI implementation
# libraries
if [ "$linking" = yes ] ; then
    if [ -n "$CXX_LINKPATH_SHL" ] ; then
        # Prepend the path for the shared libraries to the library list
        shllibpath="$CXX_LINKPATH_SHL$libdir"
    fi
    $Show $CXX $MPICH2_MPICXX_FLAGS $CXXFLAGS $LDFLAGS "${allargs[@]}" -I$includedir -L$libdir -L$opalibdir $shllibpath $MPICH2_LDFLAGS $cxxlibs $mpilibs $MPI_OTHERLIBS
    rc=$?
else
    $Show $CXX $MPICH2_MPICXX_FLAGS $CXXFLAGS "${allargs[@]}" -I$includedir
    rc=$?
exit $rc |
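The mismatch can be made obvious by filtering the ldd listing for MPI-related libraries: seeing both the MPICH libraries (libmpich*) and the Open MPI ones (libmpi, libopen-rte, libopen-pal) linked into one binary is the red flag. A small sketch using a shortened copy of the output above:

```shell
# Shortened sample of the ldd output quoted above; filtering it for
# MPI-related libraries shows that both MPICH (libmpich*) and
# Open MPI (libmpi, libopen-*) ended up linked into the same binary.
ldd_output='libhdf5.so.6 => /usr/lib/libhdf5.so.6
libmpichcxx.so.1.2 => /usr/local/lib/libmpichcxx.so.1.2
libmpich.so.1.2 => /usr/local/lib/libmpich.so.1.2
libpmi.so.0 => /usr/local/slurm/lib/libpmi.so.0
libmpi.so.0 => /usr/lib/libmpi.so.0
libopen-rte.so.0 => /usr/lib/libopen-rte.so.0
libm.so.6 => /lib/libm.so.6'

# Keep only MPI implementation libraries (libpmi is SLURM's PMI, not MPI).
printf '%s\n' "$ldd_output" | grep -E 'libmpich|libmpi\.so|libopen-'
```

In the real output the same filter would also work on `ldd my_binary` directly.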
Aha, for Open MPI, mpicxx is a binary; therefore I suggested using the strings command. But as it turns out, it's MPICH. The mpiexec should be the one from the same library. Otherwise, one possible effect is that several processes are started, but each thinks it is running serially. So you have to ask someone from the admin staff where to find the corresponding mpiexec.
BTW: MPICH2 1.2.1 is old and uses an old startup mechanism involving daemons which need to be booted beforehand. The current MPICH2 release no longer uses daemons, and its mpiexec is a binary too. |
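To follow up on finding the matching launcher: a hedged sketch of a helper (the function name and the example paths are made up) that lists mpiexec/mpirun candidates in a set of directories, so they can be matched against the library the binary was linked with:

```shell
# Hypothetical helper: list MPI launchers (mpiexec*/mpirun*) found
# directly in the given directories, so the one belonging to the
# compile-time MPI library can be picked out.
find_launchers() {
    for d in "$@"; do
        find "$d" -maxdepth 1 \( -name 'mpiexec*' -o -name 'mpirun*' \) 2>/dev/null
    done
    return 0
}

# e.g.: find_launchers /usr/bin /usr/local/bin /usr/local/slurm/bin
```

Each candidate can then be checked with ldd or strings, just as was done for mpicxx above.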
Thank you very much for your help!
Tomorrow I'm going to consult with the admin, then I'll come back and post the results! Thanks again, Joe |
Well, I have spoken to the admin. He advised me to use the srun command instead of
mpirun.openmpi, because I compiled and ran my binary with different MPI versions... The good thing is that if I use srun, it works!! Anyway, the MPI versions I've written above are incorrect, since there was a complete distro upgrade a few weeks ago; sorry. Thank you very much again! Joe |
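For anyone hitting the same error: a minimal sketch of the fixed job script, assuming the same sbatch invocation as in the first post (sbatch -p hpc2009 -N 4 -B 2:4:1 s.sh). This is a job-script fragment, not a tested recipe; SLURM options may differ on other clusters.

```shell
#!/bin/bash
# s.sh -- srun starts the tasks through SLURM's own PMI layer, which
# matches the libpmi the binary was linked against, so no separate
# mpirun/mpiexec from the (possibly mismatched) MPI stack is needed.
srun /users/jkobori/boxfit/out/boxfit boxfitsettings.txt
```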