LinuxQuestions.org

Pizzicato 08-29-2012 07:41 AM

MPI programs freeze when accessing MPI shared file on Scientific Linux 6
 
Hi everybody!
I've been having this problem for a while now and I haven't been able to solve it:
Whenever I open an MPI shared file, my program freezes without producing any output or errors. I wrote this very simple C program to test it:

Code:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;

    MPI_Init(&argc,&argv);

    MPI_File_open(MPI_COMM_WORLD, "datafile",
                  MPI_MODE_CREATE | MPI_MODE_RDWR,
                  MPI_INFO_NULL, &fh);

    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}


As I said, it freezes and I have to kill it myself with the qdel command. It does create the file "datafile", and the only messages in the error and output files are the ones caused by the job being killed manually.

I submit this program to Torque with this PBS script:
Code:

#! /bin/bash
#PBS -S /bin/bash
#PBS -A batch
#PBS -N test_mpi_file
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:50
#PBS -j oe

cd $PBS_O_WORKDIR

mpiexec.hydra -rmk pbs /home/pablo/Programs/mbg/c/test_mpi_file

My software configuration is:
- mpich2 1.2.1 using Hydra
- Torque 2.5.7
- Maui 3.2.6

Maybe it has something to do with the NFS home directory that is shared with all the nodes, because the program runs without problems on a single machine, whether that is the head node or any other node. It only fails when two or more machines access the file.

Any help would be much appreciated! :)

Thanks

