LinuxQuestions.org


angel115 09-28-2010 01:55 AM

Looking for watchdog to take care of nagios process
 
Hello There,

I'm looking for a watchdog program that would look after the nagios process and restart it if it hangs or quits unexpectedly.

I was thinking of using crontab to do so.

Is that a good idea, or is there a better solution?
Has anyone already done this?

Best regards,
Angel.

EricTRA 09-28-2010 02:02 AM

Hi,

This one was posted on the Nagios Users list a while ago but still does the trick.
Code:

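#!/bin/bash
# Watchdog: start Nagios when check_nagios reports a problem.
plugindir="/usr/local/nagios/libexec"
cmdstart='/sbin/service nagios start'

# Check nagios with the check_nagios plugin.
# This is a single command; the backslash continues it onto the next line.
"$plugindir/check_nagios" -e 5 -F /usr/local/nagios/var/nagios.log \
    -C /usr/local/nagios/bin/nagios

if [ "$?" -ne 0 ]; then
    echo "CRITICAL: Nagios not found running..."
    # Left unquoted on purpose so the string splits into command and arguments
    $cmdstart
else
    echo "OK: Nagios running..."
fi
exit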

Change where needed and put it in a crontab with a user who's permitted to start the Nagios process.

Kind regards,

Eric

quanta 09-28-2010 02:15 AM

Quote:

Originally Posted by angel115 (Post 4111066)
I'm looking for a watchdog program that would take care of nagios process and restart it if it hangs or quit unexpectedly.

monit/mmonit is a well-known program in this field. Alternatively, you can use the check_nagios plugin.

angel115 09-28-2010 02:21 AM

Thanks a lot to you both.

I think I'll use check_nagios for now, but I'll keep monit in my pocket for later ;)

Angel

prayag_pjs 09-28-2010 02:23 AM

Monit is a good choice.

EricTRA 09-28-2010 02:28 AM

Hi,

You're welcome. Once you've used them both for a while, it would be appreciated if you could post your experiences here at LQ.

Kind regards,

Eric

angel115 09-28-2010 04:43 AM

Hi EricTRA,

After some testing, it's working fine using your script:
1. I created a new file containing your script at /usr/local/nagios/bin/nagios_watchdog
2. I changed the permissions to 750 (to make it executable) and the owner to nagios:nagios.
3. I added a new line to my /etc/crontab file (this runs the script every 2 minutes)
Code:

*/2 *  * * *  nagios  /usr/local/nagios/bin/nagios_watchdog
4. I reloaded my crontab
Code:

# reload cron
TESTING:
To test it, I did the following:
Code:

# killall nagios
Then I tried to access my nagios web page ==>> result: no access
I tried again after 2 minutes ==>> result: Nagios is back on track ;)
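
Steps 1 and 2 above could be scripted roughly as follows. This is only a sketch: `install_watchdog` is a hypothetical helper name that does not come from the thread, and the optional owner change assumes the script is run as root.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): copy the watchdog script into
# place with mode 750 and, optionally, change its owner.
# Arguments: source script, destination path, optional owner:group.
install_watchdog() {
    local src="$1" dest="$2" owner="${3:-}"

    # install(1) copies the file and sets its permission bits in one step
    install -m 750 "$src" "$dest" || return 1

    # Changing the owner normally requires root, so it is skipped when no
    # owner is given
    if [ -n "$owner" ]; then
        chown "$owner" "$dest" || return 1
    fi
}

# Example call, run as root, using the paths and owner from the steps above:
#   install_watchdog ./nagios_watchdog /usr/local/nagios/bin/nagios_watchdog nagios:nagios
```

Mode 750 together with nagios:nagios ownership lets the nagios user named in the crontab line run the script while other users cannot read or execute it.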

EricTRA 09-28-2010 04:50 AM

Hello,

That's good news. Have fun with Linux.

Kind regards,

Eric

quanta 10-01-2010 03:09 AM

Quote:

Originally Posted by EricTRA (Post 4111073)
Code:

#!/bin/bash
plugindir="/usr/local/nagios/libexec"
cmdstart='/sbin/service nagios start'
# Check nagios with the check_nagios plugin (one command, continued with a backslash)
"$plugindir/check_nagios" -e 5 -F /usr/local/nagios/var/nagios.log \
    -C /usr/local/nagios/bin/nagios

if [ "$?" -ne 0 ]; then
    echo "CRITICAL: Nagios not found running..."
    $cmdstart
else
    echo "OK: Nagios running..."
fi
exit


To be more precise, I suggest checking the status word in the plugin's output instead of its exit status:
Code:

./check_nagios -e 5 -F /usr/local/nagios/var/nagios.log -C /usr/local/nagios/bin/nagios | awk '{ print $2 }'
This is because when the system time is wrong, the check_nagios plugin returns the following warning and exit status 1 even though nagios is still running:
Code:

# ./check_nagios -e 5 -F /usr/local/nagios/var/nagios.log -C /usr/local/nagios/bin/nagios
NAGIOS WARNING: 1 process, status log updated 1221 seconds ago
# echo $?
1
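
One way the awk check above might be wired into the watchdog is sketched below. `needs_restart` is a hypothetical helper name, and the sketch assumes that the second word of the plugin's output is always the status ("OK:", "WARNING:", "CRITICAL:"), as in the WARNING line shown above; only that WARNING line is confirmed by this thread.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): decide from one line of
# check_nagios output whether Nagios should be started.
# Returns 0 (restart) or 1 (leave it alone).
needs_restart() {
    local status_word
    # Second whitespace-separated field, as in the awk command above,
    # e.g. "WARNING:" for "NAGIOS WARNING: 1 process, ..."
    status_word=$(printf '%s\n' "$1" | awk '{ print $2 }')

    case "$status_word" in
        OK:|WARNING:) return 1 ;;  # a process was found; do not restart
        *)            return 0 ;;  # anything else, including empty output
    esac
}

# Possible use inside the watchdog script (paths as in the posts above):
#   output=$(/usr/local/nagios/libexec/check_nagios -e 5 \
#       -F /usr/local/nagios/var/nagios.log -C /usr/local/nagios/bin/nagios)
#   if needs_restart "$output"; then /sbin/service nagios start; fi
```

Nagios plugins conventionally exit with 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN, so testing for an exit status of 2 or higher would be another way to ignore warnings without parsing the text.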


EricTRA 10-01-2010 03:42 AM

Hi,

That's why NTP was invented: to avoid wrong system times :) Since correct time is crucial when monitoring, I always set up my servers to synchronize time with NTP. Of course, if there's no way to sync the time, or the admin is not aware of a 'wrong' system time, then your solution is more appropriate.

Kind regards,

Eric


All times are GMT -5.