LinuxQuestions.org
02-10-2017, 02:20 AM   #1
boconganhdo
LQ Newbie
 
Registered: Jan 2017
Posts: 6

Information about error message under mpirun


Hello all members!

I am writing because I have been using Open MPI's mpirun
on my 12-core workstation quite happily since day 1, when I set up the system a few months ago.

Yesterday, when I tried to run a big job under mpirun, the job crashed
rather quickly. The error message was something like

mpirun process exited blah blah with signal 11 (Segmentation fault).

Interestingly (or annoyingly), a job that required less memory ran okay.

Since I had never had this problem before, I thought it was a hardware
failure. I called my IT guy to explain the problem, and he was kind
enough to suggest putting the line

ulimit -s 40960

in my .bashrc.

And it works!

But I have no clue why mpirun misbehaved all of a sudden, or why that
ulimit setting solves the problem completely. I would like to learn
from this incident.

Does anyone have any ideas to share?
Thanks a lot!

Last edited by boconganhdo; 02-23-2017 at 11:59 PM.
 
02-12-2017, 04:03 AM   #2
jonnybinthemix
Member
 
Registered: May 2014
Location: Bristol, United Kingdom
Distribution: RHEL 5 & 6
Posts: 169

Hey!

So what you found is that the default stack size was too small for the job you were running with MPI.

I'm assuming that the job that failed was a different job from the ones that were working? Most likely it was running more complex code, and the stack size on the compute node wasn't large enough for it. When a process outgrows its stack limit, it is killed with a segmentation fault (signal 11), which matches the error you saw. Assuming you are just running MPI on one computer across 12 cores, the fix your IT guy suggested works by increasing the maximum amount of memory that can be allocated to the stack.
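If you want to see which limit is actually in force before and after the change, something like the following should do. It is only a minimal sketch: `show_stack_limits` is a name made up for this example, and it uses nothing beyond the shell's built-in `ulimit`. Values are reported in KiB, so 40960 means 40 MiB.

```shell
# Report the soft and hard stack-size limits of the current shell.
# Each value is a number of KiB or the literal word "unlimited".
# show_stack_limits is a hypothetical helper name, not a standard command.
show_stack_limits() {
    printf 'soft: %s\n' "$(ulimit -S -s)"   # limit currently enforced
    printf 'hard: %s\n' "$(ulimit -H -s)"   # ceiling the soft limit may be raised to
}

show_stack_limits
```

Note that in bash a plain `ulimit -s 40960` sets both the soft and the hard limit, so a normal user can't raise it again later in the same shell. `ulimit -S -s 40960` changes only the soft limit.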

You can write your code to cope with a limited stack size, or you can increase the limit on the system.

I would do what you did; however, I always recommend increasing it to 'unlimited' with:

Code:
ulimit -s unlimited
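One caveat, as far as I know: a normal user can't raise the limit past the hard limit, so `ulimit -s unlimited` is refused if the hard limit is finite. A hedged sketch that raises the soft limit only as far as the hard limit allows (`raise_stack_soft_limit` is a hypothetical helper name, not a standard command):

```shell
# Raise the soft stack limit to whatever the hard limit currently is.
# Setting the soft limit equal to the hard limit is always permitted,
# so this works whether the hard limit is a number or "unlimited".
raise_stack_soft_limit() {
    hard=$(ulimit -H -s)      # current ceiling, in KiB or "unlimited"
    ulimit -S -s "$hard"      # change only the soft limit
}

raise_stack_soft_limit
printf 'soft stack limit is now: %s\n' "$(ulimit -S -s)"
```

The hard limit itself is left alone, so the shell keeps the option of lowering the soft limit again later.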
A better fix for this, and one that is safer for the compute node, would be to use a scheduler such as SLURM and submit your jobs from a different node (usually a login node). You would then set the stack size limit to unlimited on the login node and to 8192 on the compute node. The job is submitted to the compute node and inherits the stack size limit of the login node, so there is no need to set the compute node itself to unlimited.
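The inheritance part is ordinary process behaviour, and you can see it on a single machine without any scheduler. The sketch below is just an illustration: `child_stack_limit` is a made-up name, and 4096 is an arbitrary value assumed to be below your current hard limit. Whether a scheduler really passes the submitting shell's limits on to the job depends on how the site has configured it.

```shell
# Set a soft stack limit inside a subshell, then start a child shell and
# print the limit that child sees. Resource limits are inherited across
# fork/exec, so the child reports the value set by its parent.
# child_stack_limit is a hypothetical helper name used for illustration.
child_stack_limit() {
    (
        ulimit -S -s "$1"        # affects only this subshell and its children
        sh -c 'ulimit -S -s'     # separate child process reports its limit
    )
}

child_stack_limit 4096
```

Because the change is made inside a subshell, the limit of the shell you run this from is not touched.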

Hope that helps,
Cheers,
Jon
 
  


All times are GMT -5.
