LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 11-09-2017, 03:03 AM   #1
rafperez
LQ Newbie
 
Registered: Nov 2017
Posts: 4

Rep: Reputation: Disabled
Problem with SGE in CentOS


Hello everyone,

I have a computer cluster running CentOS that uses SGE as its queue manager. Recently, qstat -f has started reporting the state au ((a)larm, (u)nreachable) for every queue instance, as shown below. I take this to mean that the problem is one of connectivity.

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@compute-0-0.local        BIP   0/0/20         -NA-     linux-x64     au
---------------------------------------------------------------------------------
all.q@compute-0-1.local        BIP   0/0/20         -NA-     linux-x64     au
---------------------------------------------------------------------------------
all.q@compute-0-2.local        BIP   0/0/20         -NA-     linux-x64     au
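
A listing like the one above can be filtered mechanically when there are more than a handful of nodes. The following is only a sketch: the function name unreachable_queues is made up for illustration, and it assumes the six-column qstat -f layout shown in this post, with the state in the last column.

```shell
#!/bin/sh
# Print the queue instances whose state column contains "u".
# Reads qstat -f output on stdin. Assumes the state is the 6th
# whitespace-separated field, as in the listing in this thread.
unreachable_queues() {
    awk '$1 ~ /@/ && NF >= 6 && $6 ~ /u/ { print $1 }'
}

# Example use on a live cluster:
#   qstat -f | unreachable_queues
```

Header and separator lines are skipped because their first field contains no "@", and queue instances with an empty state column are skipped because they have only five fields.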


I have tried restarting the cluster and updating the operating system, but neither corrected the problem. I have also tried restarting SGE with ./sgemaster start (which reports that it is already running) and ./sgeexecd start (which reports that it is starting). Despite all this, the error persists. Can you think of how I could solve my problem?

Thank you very much to all.

Regards

Rafael
 
Old 11-10-2017, 06:48 AM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,160

Rep: Reputation: 1266
What monitoring do you have for network, disk or memory problems? A cluster like that should have Nagios or Munin or something.
 
Old 11-10-2017, 07:46 PM   #3
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,836

Rep: Reputation: 1221
Is sge_execd running on the compute nodes?
Code:
pgrep -l sge
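
The check above has to be run on each compute node, not on the head node. A minimal sketch of doing that in one pass follows. It assumes passwordless ssh from the head node to the compute nodes; check_execd and RUN_ON are illustrative names, not part of SGE.

```shell
#!/bin/sh
# Command used to reach a node. ssh is an assumption; any command that
# takes "<node> <command...>" will do.
RUN_ON=${RUN_ON:-ssh}

# For each node given as an argument, report whether a process named
# exactly sge_execd exists there.
check_execd() {
    for node in "$@"; do
        if "$RUN_ON" "$node" pgrep -x sge_execd >/dev/null 2>&1; then
            echo "$node: sge_execd running"
        else
            echo "$node: sge_execd NOT running"
        fi
    done
}

# Example, with the host names from the qstat -f listing in post #1:
#   check_execd compute-0-0.local compute-0-1.local compute-0-2.local
```

Note that a node that cannot be reached at all is also reported as "NOT running", because the ssh call itself fails.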
 
Old 11-13-2017, 03:15 AM   #4
rafperez
LQ Newbie
 
Registered: Nov 2017
Posts: 4

Original Poster
Rep: Reputation: Disabled
Hi everybody,

I ran "pgrep -l sge" and the output is "6074 sge_qmaster 6271 sge_execd", so I think sge_execd is running on the compute nodes. Regarding the problem, when I run qstat -j I get:

scheduling info: queue instance "all.q@node_name_0" dropped because it is temporarily not available
queue instance "all.q@node_name_1" dropped because it is temporarily not available
queue instance "all.q@node_name_2" dropped because it is temporarily not available
All queues dropped because of overload or full
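
The scheduling messages above already name the affected queue instances, so the host part can be pulled out and used as a list of nodes to inspect. This is a sketch only: dropped_hosts is a made-up name, and it assumes the message wording is exactly as quoted in this post.

```shell
#!/bin/sh
# Read qstat -j output on stdin and print the host part of every queue
# instance reported as "temporarily not available", one per line.
dropped_hosts() {
    sed -n 's/.*queue instance "[^"@]*@\([^"]*\)" dropped because it is temporarily not available.*/\1/p' |
        sort -u
}

# Example use on a live cluster:
#   qstat -j | dropped_hosts
```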


Thanks so much.

Regards
 
Old 11-15-2017, 01:57 AM   #5
rafperez
LQ Newbie
 
Registered: Nov 2017
Posts: 4

Original Poster
Rep: Reputation: Disabled
Hello everyone,

I finally found the fault: sge_execd was not running on the compute nodes, so I only had to log in to each node in turn and restart it with the command ./sge_execd.
This clears the "au" state in the queue manager.
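
For a larger cluster, the node-by-node restart described above could be scripted. The sketch below makes several assumptions: passwordless ssh to each node, and a startup script at /opt/gridengine/default/common/sgeexecd, which is a guessed path and must be adjusted to the actual installation. restart_missing_execd, RUN_ON and EXECD_SCRIPT are illustrative names.

```shell
#!/bin/sh
# Command used to reach a node (assumption: ssh without a password prompt).
RUN_ON=${RUN_ON:-ssh}
# Assumed location of the execd startup script; adjust to your site.
EXECD_SCRIPT=${EXECD_SCRIPT:-/opt/gridengine/default/common/sgeexecd}

# Start sge_execd on every listed node where it is not already running.
restart_missing_execd() {
    for node in "$@"; do
        if "$RUN_ON" "$node" pgrep -x sge_execd >/dev/null 2>&1; then
            echo "$node: already running"
        elif "$RUN_ON" "$node" "$EXECD_SCRIPT" start; then
            echo "$node: started"
        else
            echo "$node: start FAILED"
        fi
    done
}

# Example:
#   restart_missing_execd compute-0-0.local compute-0-1.local compute-0-2.local
```

The pgrep check makes the function safe to run repeatedly. The exit status of the startup script is taken at face value, so qstat -f is still the final confirmation that the "au" state is gone.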


Thank you very much to all.
Regards.
RP
 
  



All times are GMT -5.
