Help with Nagios and NRPE Event Handlers
Hi
I apologize if this isn't the right place to post this, but I am not certain it belongs under security.
That said, I would really like some assistance with setting up event handlers for Nagios, and I am unable to complete registration at their site. This seems to be the first best option!
I have a few services that I am monitoring on different systems, and some of them just need to be restarted now and then (for instance, strongSwan VPNs). My real problem of the day is a FreeIPA/Red Hat IdM system whose dirsrv instance times out for no apparent reason, and everything stops working until I kill ns-slapd and run "ipactl restart".
I wrote a small bash script that does exactly that (killall -9 ns-slapd, then ipactl stop && ipactl start). Since the IPA server is already monitored by the Nagios ldap_check and dns_check, I want the script to run whenever there is a timeout. I use NRPE on the remote host (the IPA server) for monitoring.
The IPA server and the Nagios server both run 64-bit RHEL6.5 with 2GB RAM. They are guests in different KVM hypervisors.
To this end I dropped the executable bash script into the /usr/lib64/nagios/plugins/eventhandlers/ directory on the IPA server, and created a line for the command in the /etc/nagios/nrpe.cfg file like so:
command[ipactl_restart]=/usr/lib64/nagios/plugins/eventhandlers/ipactl_restart.sh
Next, on the Nagios server, I copied the bash script into that server's event handler directory and created a command for it like so:
##ipactl restart command
define command{
    command_name    ipactl_restart
    command_line    $USER1$/eventhandlers/ipactl_restart.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$
}
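A script called through this command_line receives the four macros as positional arguments. The following is only a minimal sketch of how such a script might use them, not a copy of my actual script: the function names should_restart and restart_ipa are made up for illustration, and the rule "restart only on a CRITICAL state of type HARD" is an assumption rather than something the setup above enforces. When the script is called with fewer than two arguments, this sketch does nothing.

```shell
#!/bin/bash
# Sketch of an event handler script (hypothetical layout).
# Expected arguments, as passed by the command_line above:
#   $1 = $SERVICESTATE$      (OK, WARNING, UNKNOWN, CRITICAL)
#   $2 = $SERVICESTATETYPE$  (SOFT or HARD)
#   $3 = $SERVICEATTEMPT$    (unused in this sketch)
#   $4 = $HOSTADDRESS$       (unused in this sketch)

# Assumed policy: succeed only for a CRITICAL state that has gone HARD.
should_restart() {
    local state="$1" state_type="$2"
    [ "$state" = "CRITICAL" ] && [ "$state_type" = "HARD" ]
}

# The restart itself, as described earlier in this post.
restart_ipa() {
    killall -9 ns-slapd
    ipactl stop && ipactl start
}

# Act only when at least the state and state type were supplied.
if [ "$#" -ge 2 ]; then
    if should_restart "$1" "$2"; then
        restart_ipa
    fi
fi
```

Called as "ipactl_restart.sh CRITICAL HARD 3 192.0.2.10" (an example address), this sketch would attempt the restart; called with "CRITICAL SOFT" or "OK HARD", it would exit without doing anything.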
I then added an event_handler line to the service definition in the appropriate file in Nagios:
define service {
    use                     generic-service
    hostgroup_name          IPAServers
    service_description     IPA Directory Service
    check_command           check_nrpe!check_ipa389
    max_check_attempts      3
    check_interval          5
    retry_interval          1
    check_period            24x7
    notification_interval   60
    notification_period     24x7
    contacts                admins
    event_handler           check_nrpe!ipactl_restart
}
I restarted Nagios, but nothing happens. It only logs the outage times, with no event handling at all!
[1397636010] SERVICE NOTIFICATION: nagiosadmin;services.example.com;IPA Directory Service;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 20 seconds.
Can someone please guide me through what I am doing wrong? I also tried a command_line with NRPE in it, but that didn't work either:
#command_line $USER1$/check_nrpe -n -H $HOSTADDRESS$ -c /usr/lib64/nagios/plugins/eventhandlers/ipactl_restart.sh -a $ARG1$
Any assistance welcome, and thanks in advance!