
centguy 01-20-2011 06:27 PM

ulimit -s 40960 vs ulimit -s 10240
I am writing because I have been running mpirun (Open MPI) on my 12-core
workstation without problems since the day I set up the system a few months ago.

Yesterday, when I tried to run a big job under mpirun, the job crashed
almost immediately. The error message was something like

mpirun process exited blah blah with signal 11 (Segmentation fault).

Interestingly (or annoyingly), a job that required less memory ran fine.

Since I had never had this problem before, I thought it was a hardware
failure. I called my IT guy to explain the problem, and he was kind
enough to suggest putting the line

ulimit -s 40960

in my .bashrc.

And it works!
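
For anyone following along, here is a minimal sketch of how to check the stack limit before and after such a change. It assumes a POSIX-style shell such as bash; the helper name show_stack_limits is made up for illustration. Note that in bash a plain "ulimit -s N" sets both the soft and the hard limit, so the sketch suggests -S to touch only the soft one.

```shell
# Print the soft and hard stack-size limits of the current shell.
# Values are in KB (1024-byte units), or the word "unlimited".
# show_stack_limits is a hypothetical helper name, not a standard command.
show_stack_limits() {
    echo "soft=$(ulimit -S -s) hard=$(ulimit -H -s)"
}

show_stack_limits

# To raise only the soft limit for this shell and its children, uncomment:
# ulimit -S -s 40960
```

Running this in the same shell that launches mpirun shows the limit the MPI processes will inherit on the local machine.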

But I have no clue why mpirun started misbehaving all of a sudden, or why
that ulimit setting solved the problem completely. I would like to learn
from this incident.

Does anyone have any ideas to share? Thanks a lot!

centguy 01-20-2011 07:00 PM

Okay, my happiness was short-lived.

When I tried a big job, I still hit this error:

mpirun noticed that process rank 3 with PID 11591 on node xxx-node exited on signal 11 (Segmentation fault).

I strongly suspect that this has to do with a huge job that crashed the day before
when the disk ran out of space. Could it be that the crashed job dumped something into a
commonly used space, and it was not cleared in time for new jobs to use?
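
A rough sketch of how one might test that suspicion is below. It assumes the shared space is $TMPDIR or /tmp, and that leftover Open MPI session directories are named openmpi-sessions-*. Both are assumptions about this setup, and free_kb is a made-up helper name.

```shell
# Directory to inspect; /tmp is an assumption, adjust to your scratch area.
scratch_dir="${TMPDIR:-/tmp}"

# Print the free space (in KB) on the filesystem that holds a directory.
# free_kb is a hypothetical helper name.
free_kb() {
    df -Pk "$1" | awk 'NR==2 { print $4 }'
}

echo "free space in $scratch_dir: $(free_kb "$scratch_dir") KB"

# List leftover Open MPI session directories, if any.
# The name pattern is an assumption and may differ between versions.
ls -ld "$scratch_dir"/openmpi-sessions-* 2>/dev/null \
    || echo "no leftover session directories found"
```

If the free space is close to zero, or stale directories from the crashed job are still there, that would support the disk-space theory.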

Beats me...
