LinuxQuestions.org
Old 11-16-2010, 04:40 AM   #1
zugvogel
LQ Newbie
 
Registered: Sep 2005
Location: Tokyo, Japan
Distribution: Mac, Ubuntu, Debian and Centos
Posts: 28

Rep: Reputation: 16
Parallel mpich2-based software does not run


Hi,
I'm trying to run some software parallelised through mpich2.

The mpd daemon is running. I then launch the software with:

nohup mpiexec -n 8 /path/to/software/softwarename.ex > out 2> err &

When I check with "ps aux", I can see 8 copies of the software and 8 listings of mpd. However, according to "top", these processes are using 0% CPU, and there is no output: the job just hangs.

This happens on some computers, but not on others, so I can rule out a problem with the software itself.

Does anyone know what might cause this?

Thanks in advance.
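A note on the "0% in top" symptom: it can help to check what state the stalled processes are actually in. The following is only a sketch, assuming a Linux machine with /proc mounted; "softwarename" stands in for the real executable name, and the helper names are made up for illustration. It reads the state letter from /proc/<pid>/stat, where S means sleeping (waiting for something), R means running, and D means uninterruptible sleep.

```python
import os


def parse_stat(line):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition("(")
    return int(pid), comm, tail.split()[0]


def find_processes(name_fragment):
    """Yield (pid, command, state) for processes whose name contains name_fragment."""
    if not os.path.isdir("/proc"):
        return  # no procfs available (for example on macOS)
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        # Note: the kernel truncates the command name to 15 characters.
        if name_fragment in comm:
            yield pid, comm, state


if __name__ == "__main__":
    # "softwarename" is a placeholder for the real executable name.
    for pid, comm, state in find_processes("softwarename"):
        print(pid, comm, state)
```

If all 8 copies report S, they are idle and waiting on something rather than computing, which would match the 0% CPU reading.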
 
Old 11-16-2010, 06:04 AM   #2
zugvogel
Original Poster
Update

It seems that if I specify "-n 1" the software runs on one processor, but with any larger value nothing happens. So maybe the problem lies with mpich2?
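To find out exactly where it stops working, the same launch can be repeated with an increasing process count and a time limit. This is a rough sketch rather than a tested recipe: it assumes "mpiexec" is on the PATH and accepts "-n" as in the command above, and the function names, the rank counts, and the 30-second limit are arbitrary choices for illustration.

```python
import subprocess


def build_command(program, ranks, launcher="mpiexec"):
    """Build the same kind of launch line used in the thread: mpiexec -n N program."""
    return [launcher, "-n", str(ranks), program]


def probe(program, rank_counts=(1, 2, 4, 8), timeout=30, run=subprocess.run):
    """Launch program with each rank count and record how the launch ended.

    Returns a dict mapping each rank count to "finished", "timed out"
    or "launch failed". The run argument exists only so the launcher
    call can be swapped out for testing.
    """
    results = {}
    for ranks in rank_counts:
        try:
            run(build_command(program, ranks), timeout=timeout,
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            results[ranks] = "finished"
        except subprocess.TimeoutExpired:
            results[ranks] = "timed out"
        except OSError:
            # For example, mpiexec is not installed or not on the PATH.
            results[ranks] = "launch failed"
    return results
```

Calling probe("/path/to/software/softwarename.ex") would then show the smallest count at which the launch stalls. Two caveats: a "timed out" result only indicates a hang if the program normally finishes well inside the limit, so a small, quick test program is a better target than a long calculation; and the timeout only kills the mpiexec process that was started directly, so stalled copies of the program may still need to be cleaned up by hand.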
 
Old 11-16-2010, 07:33 PM   #3
zugvogel
Original Poster
Another update

Running "strace" produces many lines of the following output:

select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)

etc
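For a long trace, a short script can confirm whether the output consists of anything other than these timeouts. This is only a sketch, assuming the trace has been saved to a text file (for example with strace's "-o" option) and that the lines look like the ones above; the pattern and function name are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
SELECT_TIMEOUT = re.compile(
    r"select\((\d+), \[([\d ]*)\], \[[\d ]*\], \[[\d ]*\], \{(\d+), (\d+)\}\)"
    r"\s+= 0 \(Timeout\)"
)


def summarise(lines):
    """Count timed-out select() calls, grouped by the read descriptors waited on.

    Returns (waits, other): waits maps a tuple of file descriptors to the
    number of timeouts seen for it, and other counts the non-blank lines
    that are not select() timeouts.
    """
    waits = Counter()
    other = 0
    for line in lines:
        match = SELECT_TIMEOUT.search(line)
        if match:
            fds = tuple(int(fd) for fd in match.group(2).split())
            waits[fds] += 1
        elif line.strip():
            other += 1
    return waits, other
```

If nearly every line is a timeout on the same descriptors (4, 5 and 6 here), the process is not computing. It wakes once a second, finds nothing to read, and goes back to waiting. Listing /proc/<pid>/fd for that process shows what those descriptors are (sockets, pipes, or files), which may hint at what it is waiting for.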

Does anyone know what's wrong? Until I fix this I can barely do any calculations, so I'm quite desperate to get it working. I've tried restarting the computer, and I've also re-installed mpich2 using shm instead of nemesis (since everything runs on a single multi-core computer), but neither changed anything.

Thank you.
 
Old 11-17-2010, 05:14 PM   #4
chalbersma
LQ Newbie
 
Registered: Jan 2010
Posts: 2

Rep: Reputation: 0
ssh?

If I remember correctly, mpich uses ssh to communicate with each node. Maybe ssh isn't enabled on some of your computers.
 
Old 11-17-2010, 07:11 PM   #5
zugvogel
Original Poster
Thanks a lot for your suggestion, but in this case ssh is working on the computer.

Additionally, since I'm running on a single multi-core machine, all copies of the software are spawned on the same computer, so ssh is not involved.
 
Old 11-19-2010, 08:54 PM   #6
zugvogel
Original Poster
A kind-of solution

For future reference: I worked around the problem by removing mpich2 and installing openmpi instead. Any software that was originally compiled for mpich2 has to be recompiled, but once that's done it works fine, and apparently without needing a daemon like the mpd that mpich2 required.
 
  


Tags
mpd, mpi, parallel


All times are GMT -5.