LinuxQuestions.org

zugvogel 11-16-2010 03:40 AM

Parallel mpich2-based software does not run
 
Hi,
I'm trying to run some software parallelised through mpich2.

The mpd daemon is running.
I then launch the software with:

nohup mpiexec -n 8 /path/to/software/softwarename.ex > out 2> err &

When I check with "ps aux", I can see 8 copies of the software and 8 mpd entries. However, according to "top", these processes are all running at 0% CPU, and no output is ever written: the job just hangs.

This happens on some computers, but not on others, so I can rule out a problem with the software itself.

Does anyone know what might cause this?

Thanks in advance.

zugvogel 11-16-2010 05:04 AM

Update
 
If I specify "-n 1", the software runs on one processor, but with any larger value nothing happens. So maybe it's something to do with mpich2?

zugvogel 11-16-2010 06:33 PM

Another update
 
Running "strace" produces many lines of the following output:

select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)

etc
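
For anyone reading this later: as far as I understand it, each of those lines is a select() call that waits up to one second ({1, 0} = 1 s, 0 µs) for file descriptors 4, 5 and 6 to become readable, and returns 0 because nothing arrived. Below is a minimal Python sketch of the same behaviour. The socket pair is only a stand-in chosen for illustration; it is not what mpich2 actually has open on those descriptors.

```python
import select
import socket

TIMEOUT_S = 1.0  # same one-second timeout as in the strace lines

# Hypothetical stand-in: two connected sockets, one end being waited on.
waiter, peer = socket.socketpair()

# Nothing has been sent yet, so select() gives up after the timeout and
# reports no readable descriptors (strace shows this as "= 0 (Timeout)").
readable, _, _ = select.select([waiter], [], [], TIMEOUT_S)
timed_out = (readable == [])

# As soon as the other side sends something, the same call returns
# with the waiting socket listed as readable.
peer.sendall(b"ping")
readable_after, _, _ = select.select([waiter], [], [], TIMEOUT_S)
got_data = (readable_after == [waiter])

waiter.close()
peer.close()
```

So a process that prints this line over and over is idle, waiting for a message that never comes, which matches the 0% CPU shown by "top".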

Does anyone know what's wrong? Until I fix this I can barely do any calculations, so I'm quite desperate to get it working. I've tried restarting the computer, and also re-installing mpich2 with the shm channel instead of nemesis (since everything runs on a single multi-core computer), but nothing changed.

Thank you.

chalbersma 11-17-2010 04:14 PM

ssh?
 
If I remember correctly, mpich uses ssh to communicate with each node. Maybe ssh isn't enabled on some of your computers.

zugvogel 11-17-2010 06:11 PM

Thanks a lot for your suggestion, but in this case ssh is working on the computer.

Additionally, since I'm running on a multi-core processor, all copies of the software are spawned on the same computer, so ssh is not involved.

zugvogel 11-19-2010 07:54 PM

A kind-of solution
 
For future reference: I worked around the problem by removing mpich2 and installing openmpi instead. Any software that was originally compiled for mpich2 has to be recompiled, but once that's done it works fine, and apparently without needing a daemon like mpich2's mpd.


All times are GMT -5.