LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

05-20-2011, 03:25 AM   #1
r_r
LQ Newbie
 
Registered: Mar 2011
Posts: 2

Sun Grid Engine: jobs stuck in "r" state, no errors, runs don't finish


Hi,

Lately I have been facing a strange problem with Sun Grid Engine (SGE).

I have a script that submits jobs to SGE in batches of 30 (200 jobs in total). SGE then distributes them across the ~10 machines we have set up in a queue.
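For readers unfamiliar with this kind of setup, the submission pattern described above (200 jobs, submitted 30 at a time) might look roughly like the Python sketch below. The original script was not posted, so this is only an assumption: the job script names (`job_000.sh` to `job_199.sh`) and the `submit_batch` helper are hypothetical, and the sketch only builds plain `qsub <script>` command lines (dry run by default) rather than guessing how the real script waits between batches.

```python
import subprocess
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def submit_batch(scripts: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one plain `qsub <script>` call per job.

    Hypothetical helper: the real submission script from the post is unknown.
    """
    commands = [["qsub", script] for script in scripts]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if qsub rejects a job.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical job scripts: job_000.sh .. job_199.sh (200 in total).
    all_jobs = [f"job_{i:03d}.sh" for i in range(200)]
    for n, batch in enumerate(batches(all_jobs, 30), start=1):
        cmds = submit_batch(batch)  # dry run: nothing is sent to SGE
        print(f"batch {n}: {len(cmds)} jobs, first = {' '.join(cmds[0])}")
```

With 200 jobs and a batch size of 30, this produces six batches of 30 followed by a final batch of 20.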

The problem is that some jobs get stuck in the "r" (running) state on a few machines.

In the execution host's messages file I see:

reaping job "54988" ptf complains: Job does not exist

One second later, the qmaster messages file shows:

"job 54988.1 finished on host "
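One way to check how often this happens is to scan the execution host's messages file for the "ptf complains" line and collect the affected job IDs, which can then be compared with the jobs that `qstat` still reports as running. In the Python sketch below, the regular expression is searched anywhere in a line, so it does not depend on the exact field layout of the messages file. The sample lines (including the pipe-separated layout and the host name `node03`) and the function name `ptf_failed_jobs` are made up for illustration and are not taken from the poster's logs.

```python
import re
from typing import Iterable, List

# Matches the execd message quoted in the post; searched anywhere in the line,
# so the surrounding timestamp/host fields do not matter.
PTF_PATTERN = re.compile(r'reaping job "(\d+)" ptf complains: Job does not exist')


def ptf_failed_jobs(lines: Iterable[str]) -> List[str]:
    """Return job IDs (first-seen order, no duplicates) that hit the ptf message."""
    seen: List[str] = []
    for line in lines:
        match = PTF_PATTERN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    # Illustrative lines only; in practice pass an open file handle, e.g.
    #   with open(path_to_execd_messages) as fh: ptf_failed_jobs(fh)
    sample = [
        '05/20/2011 03:25:12|  execd|node03|E|reaping job "54988" ptf complains: Job does not exist',
        '05/20/2011 03:25:40|  execd|node03|I|some unrelated message',
    ]
    print(ptf_failed_jobs(sample))  # ['54988']
```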

I searched extensively online, and the closest match to my problem is https://arc.liv.ac.uk/trac/SGE/ticket/495 (extract pasted above).

The symptoms described there are the same ones I see. The ticket suggests restarting SGE as a possible fix, so I asked our IT department to restart it, but jobs still get stuck in the same way.

There are no errors in $SGE_ROOT/../spool/qmaster/messages or $SGE_ROOT/../spool//messages.

Nor does "qstat -j $job_id" show any error.

Every now and then a few jobs get stuck on some machines and stay stuck indefinitely. They have to be killed manually, which is getting quite irritating.
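Until the root cause is found, a small watchdog can at least make stuck jobs easy to spot. The Python sketch below parses plain `qstat` output and lists jobs that have been in state `r` for longer than a chosen limit. It assumes the default SGE 6.x column order (job-ID, prior, name, user, state, submit/start at as `MM/DD/YYYY HH:MM:SS`, queue, ...), which may differ on other versions; the two-hour limit, the sample text and the function name `long_running_jobs` are hypothetical. It only prints job IDs rather than deleting anything, since a job that runs long is not necessarily stuck.

```python
from datetime import datetime, timedelta
from typing import List

# Assumed format of the "submit/start at" column in default qstat output.
START_FORMAT = "%m/%d/%Y %H:%M:%S"


def long_running_jobs(qstat_text: str, now: datetime, limit: timedelta) -> List[str]:
    """Return IDs of jobs in state 'r' whose start time is older than `limit`."""
    stuck: List[str] = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator and anything too short to parse.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        job_id, state = fields[0], fields[4]
        if state != "r":
            continue
        started = datetime.strptime(f"{fields[5]} {fields[6]}", START_FORMAT)
        if now - started > limit:
            stuck.append(job_id)
    return stuck


if __name__ == "__main__":
    # Illustrative qstat output; in practice capture it with
    #   subprocess.run(["qstat"], capture_output=True, text=True).stdout
    sample = """\
job-ID  prior   name   user  state submit/start at     queue         slots ja-task-ID
-------------------------------------------------------------------------------------
  54988 0.55500 run_a  r_r   r     05/20/2011 03:25:12 all.q@node03      1
  54990 0.55500 run_b  r_r   r     05/20/2011 09:10:00 all.q@node05      1
  54995 0.55500 run_c  r_r   qw    05/20/2011 09:15:00                   1
"""
    now = datetime(2011, 5, 20, 10, 0, 0)
    print(long_running_jobs(sample, now, timedelta(hours=2)))  # ['54988']
```

Run periodically (for example from cron), this gives a list of candidates to inspect with `qstat -j` before killing anything by hand.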

I'd appreciate any help on this issue.

Thanks a lot!
 
  



Tags
engine, grid, jobs, stuck






Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
LXer: The State of Oracle/Sun Grid Engine | LXer | Syndicated Linux News | 0 | 09-01-2010 12:50 PM
[SOLVED] Sun Grid Engine on linux | me_spearhead | Linux - General | 5 | 03-12-2010 07:11 AM
Does anybody knows SUN GRID ENGINE?? | meng_en | Linux - General | 4 | 02-24-2010 11:03 PM
How does Sun Grid Engine 6.2u3 kick off jobs? | gumbyjnm | Linux - General | 5 | 10-26-2009 10:09 PM
Installation Of Sun Grid Engine | parani86 | Linux - Software | 7 | 08-06-2008 07:56 AM

All times are GMT -5.