Hi KBriggs,
I was waiting to see if someone would come forward with an answer for you, because I was curious too. I don't have an answer, just some ideas based on what I found googling around (which I am sure you did already). On systems I have used, the error logs would report which node crashed and the error it gave. mpirun is supposed to do this; from the Open MPI documentation for mpirun:
Quote:
Process Termination / Signal Handling
During the run of an MPI application, if any rank dies abnormally
(either exiting before invoking MPI_FINALIZE, or dying as the result of
a signal), mpirun will print out an error message and kill the rest of
the MPI application.
[...] signal, it is probably not necessary (and safest) for the user to only
clean up non-MPI state.
Which MPI implementation are you using?
If it is not telling you anything, you could try the verbose (--verbose) option to mpirun. If you get only the MPI rank but not the hostname, maybe you could add a statement early in the program that prints each process's hostname and MPI rank to stdout?
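The hostname-and-rank idea could look something like the sketch below. I have not run it on your cluster, and it assumes a Python program using mpi4py, which may not match your setup. The names startup_tag and identify are just ones I made up for illustration. If mpi4py is not installed, it falls back to reporting a single local process so the script still runs.

```python
import socket


def startup_tag(rank, size, host):
    """Format one identification line for a single MPI process."""
    return f"[startup] rank {rank} of {size} on host {host}"


def identify():
    """Return (rank, size, hostname) for the calling process."""
    try:
        # Assumption: mpi4py is available; importing it initializes MPI.
        from mpi4py import MPI
    except ImportError:
        # Fallback: behave like a single serial process.
        return 0, 1, socket.gethostname()
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()


if __name__ == "__main__":
    # Flush immediately so the line is not lost in a buffer
    # if the process dies shortly afterwards.
    print(startup_tag(*identify()), flush=True)
    # ... the rest of the application would go here ...
```

Launched under mpirun, each process should print one line, so a rank number in a later error message can be matched to a host. If your code is in C, the standard calls MPI_Comm_rank and MPI_Get_processor_name give you the same information.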
Sorry I can't be more helpful, but I will be interested to know if you find a good solution.
Cheers
Scott