Weird Nagios server issue? - It seems to just stop running
Hi All,
I've got a weird Nagios issue that has just occurred for a second time. The first time was about a week ago. Everything appeared to stay up: a 'ps -ef | grep nagios' showed that the nagios process was still running, the Nagios web GUI was still up, NDO looked to still be running, and the backend database looked to be working fine. However, the GUI showed that the last check for most services was several hours ago.
The service checks are set to be actively monitored, so they should be getting checked. The Nagios scheduling queue shows lots of checks waiting in the queue, all scheduled for around the same time that the other services and hosts were last checked, so it looks like checking has basically just been paused.
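To put numbers on the stall rather than eyeballing the GUI, the status file can be scanned for actively checked hosts and services whose last check is older than expected. Below is a minimal sketch in Python, assuming the Nagios 3 status.dat layout of "hoststatus {" and "servicestatus {" blocks containing key=value lines such as last_check and active_checks_enabled. The function names and the threshold are my own illustrative choices, not anything from Nagios itself.

```python
def parse_status_blocks(text):
    """Yield (block_type, fields) for each 'name {' ... '}' block in status.dat text."""
    block_type = None
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if block_type is None:
            # Outside a block: only a line ending in '{' opens a new one.
            if line.endswith("{"):
                block_type = line[:-1].strip()
                fields = {}
        elif line == "}":
            yield block_type, fields
            block_type = None
        elif "=" in line:
            # Split on the first '=' only; plugin output may contain more.
            key, _, value = line.partition("=")
            fields[key] = value


def stale_checks(text, now, max_age_seconds):
    """Return (name, age_in_seconds) for actively checked objects not checked recently."""
    stale = []
    for block_type, fields in parse_status_blocks(text):
        if block_type not in ("hoststatus", "servicestatus"):
            continue
        if fields.get("active_checks_enabled") != "1":
            continue
        age = now - int(fields.get("last_check", "0"))
        if age > max_age_seconds:
            name = fields.get("host_name", "?")
            if block_type == "servicestatus":
                name += "/" + fields.get("service_description", "?")
            stale.append((name, age))
    return stale
```

To use it, read the status file (its location is set by status_file in nagios.cfg) and call stale_checks(contents, int(time.time()), 3600). If nearly every active check comes back stale with a similar age, that matches the "paused" behaviour described above.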
Nagios also seems unresponsive to commands sent via the web GUI, such as forcing a reschedule, restarting the Nagios process, or stopping and starting active checks. It looks as though Nagios takes no notice of what I tell it to do from the GUI, but no errors show up in any of the logs.
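The GUI passes these commands to Nagios through the external command file (a named pipe), so one way to narrow this down is to write a command to that pipe directly and see what happens. Here is a rough sketch in Python, assuming the standard external command line format "[timestamp] COMMAND;args". SCHEDULE_FORCED_SVC_CHECK is a real Nagios external command, but the helper names and the status strings returned are purely illustrative.

```python
import errno
import os
import time


def format_forced_service_check(host, service, when=None):
    """Build one external-command line asking for a forced service check."""
    if when is None:
        when = int(time.time())
    return "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d\n" % (when, host, service, when)


def submit_command(command_file, line):
    """Write one command line to the pipe without blocking; return a short status."""
    try:
        # O_NONBLOCK makes the open fail at once if nothing has the pipe open for reading.
        fd = os.open(command_file, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return "no reader"
        raise
    try:
        os.write(fd, line.encode("ascii"))
    except OSError as exc:
        # A full pipe suggests commands are being queued but not drained.
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return "pipe full"
        raise
    finally:
        os.close(fd)
    return "written"
```

The pipe's path is whatever command_file is set to in nagios.cfg. A result of "written" only shows the line went into the pipe, not that Nagios acted on it. If external command logging is enabled, a matching EXTERNAL COMMAND entry should appear in nagios.log when the command is actually processed.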
I've seen suggestions that this could be related to NDO problems, but I'm running a powerful box (we're talking an HP BL460 blade system), which should easily cover the resource requirements, considering I'm not monitoring nearly as many things as most other people seem to be when they talk about NDO issues.
Nagios is 3.0.6, so not exactly over-the-hill old. The OS is SLES 10.2, MySQL is 5.0.26, and NDO was the latest version available, which I think is only a beta from years ago...
Thanks for any help if anyone can.
Cheers,
M
EDIT:-
The only way to make it recover seems to be stopping the nagios process and then also stopping the NDO process. If I just stop and start the nagios process manually, it makes no difference, so the issue I'm having must be with NDO.
Last edited by lin*x; 08-06-2009 at 05:23 AM.