LinuxQuestions.org

kevin82287 10-20-2009 09:48 AM

Nagios Notifications - Only Flapping Alerts Being Sent Out!
 
Hello all,

First off, I really appreciate any feedback you can provide. I’ve just recently started working with Nagios and everything is working great, except that notifications are not being sent out. I’ve been searching all over the net and comparing my configuration with others to see if there was anything obvious I was missing, but I can’t find anything. I’ve included my configuration files below for you to look at.

I have been able to send mail from the server (CentOS 5.3) using sendmail. I can also successfully send email by running the notification commands from the commands.cfg file by hand:
Code:

/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$ <MY EMAIL ADDRESS>
Everything is being monitored correctly, but for some reason the emails are not being sent out. I should also note that if I do a “Send custom host notification” in Nagios on a host that has a critical warning, I DO get an email successfully. It’s just not sending them out automatically.
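
For anyone who wants to check the message body by hand before involving mail at all, here is a minimal sketch. It assumes the stock Nagios sample layout, where `/usr/bin/printf "%b"` expands the `\n` escapes and the result is piped to `/bin/mail`. The function name `render_host_body`, the sample values, and the address in the commented-out line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: builds a body shaped like the one in notify-host-by-email,
# with the Nagios macros replaced by positional arguments.
render_host_body() {
    # $1 = notification type, $2 = host name, $3 = host state, $4 = address
    printf "%b" "***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\n"
}

# Print the rendered body to the terminal (sample values only).
render_host_body PROBLEM NAME DOWN 192.0.2.10

# To actually send it, pipe the body to mail (placeholder address):
# render_host_body PROBLEM NAME DOWN 192.0.2.10 | /bin/mail -s "** PROBLEM Host Alert: NAME is DOWN **" admin@example.com
```

If the printed body looks right but nothing arrives, the problem is more likely on the mail side or in whether Nagios decides to notify at all, rather than in the command text.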

UPDATE: I do have some good news. I did receive a notification last night for a flapping alert. However, it is still not sending out alerts when something goes down, returns to an up state, etc.
Quote:

[1255678430] SERVICE FLAPPING ALERT: AUSTIN-LAPTOP;CPU Load;STOPPED; Service appears to have stopped flapping (4.0% change < 5.0% threshold)
[1255678430] SERVICE NOTIFICATION: libertyadmins;AUSTIN-LAPTOP;CPU Load;FLAPPINGSTOP (CRITICAL);notify-service-by-email;No route to host
So it appears I'm receiving flapping alerts, but nothing else. In the Nagios web interface, under Configuration --> Contacts, both Service Notification Options and Host Notification Options show only "Flapping, Downtime" enabled. I'm not getting downtime alerts, but shouldn't there be more options listed there (Recovery, etc.)?
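
One way to see which options Nagios has actually loaded for a contact is to look at its object cache rather than the .cfg files. The following is a rough sketch only: it assumes the cache lives at `/usr/local/nagios/var/objects.cache` (the default for a source install, which may not match this setup) and that each object is written as a `define contact {` block ending in a line containing only `}`. The function name `show_cached_contact` is made up; `libertyadmins` is the contact name from the log line above.

```shell
#!/bin/sh
# Hypothetical helper: print the cached definition block for one contact.
show_cached_contact() {
    # $1 = contact name, $2 = path to the object cache file
    awk -v name="$1" '
        /^define contact[ \t]*[{]/ { inblock = 1; block = ""; found = 0 }
        inblock {
            block = block $0 "\n"
            if ($1 == "contact_name" && $2 == name) found = 1
            if ($1 == "}") {
                if (found) printf "%s", block
                inblock = 0
            }
        }
    ' "$2"
}

CACHE=/usr/local/nagios/var/objects.cache   # assumed path, adjust as needed
if [ -r "$CACHE" ]; then
    show_cached_contact libertyadmins "$CACHE" | grep notification_options
fi
```

If the options printed here differ from what templates.cfg says, Nagios is running with a definition other than the one in the current files.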



Is there something obvious I am missing in the config files? I really appreciate the help, please let me know what you think.

Templates.cfg
Code:

###############################################################################
###############################################################################
#
# CONTACT TEMPLATES
#
###############################################################################
###############################################################################

define contact{
      name                            generic-contact        ; The name of this contact template
      host_notifications_enabled      1
      service_notifications_enabled  1
      host_notification_commands      notify-host-by-email
      service_notification_commands  notify-service-by-email
      service_notification_period    24x7                    ; service notifications can be sent anytime
      host_notification_period        24x7                    ; host notifications can be sent anytime
      service_notification_options    w,u,c,r,f,s            ; send notifications for all service states, flapping events, and scheduled downtime events
      host_notification_options      d,u,r,f,s              ; send notifications for all host states, flapping events, and scheduled downtime events
      register                        0                      ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }


###############################################################################
###############################################################################
#
# HOST TEMPLATES
#
###############################################################################
###############################################################################

# Generic host definition template - This is NOT a real host, just a template!

define host{
      name                              generic-host    ; The name of this host template
      notifications_enabled              1                    ; Host notifications are enabled
      event_handler_enabled              1                    ; Host event handler is enabled
      flap_detection_enabled            1                    ; Flap detection is enabled
      failure_prediction_enabled        1                    ; Failure prediction is enabled
      process_perf_data                  1                    ; Process performance data
      retain_status_information          1                    ; Retain status information across program restarts
      retain_nonstatus_information      1                    ; Retain non-status information across program restarts   
      notification_period                24x7                ; Send host notifications at any time
      check_period                      24x7                ; By default, Linux hosts are checked round the clock
      check_interval                    1                    ; Actively check the host every minute
      retry_interval                    1                    ; Schedule host check retries at 1 minute intervals
      max_check_attempts                10                  ; Check each Linux host 10 times (max)
      check_command                      check-host-alive    ; Default command to check Linux hosts
      notification_period                workhours            ; Linux admins hate to be woken up, so we only notify during the day
      notification_interval              120                  ; Resend notifications every 2 hours
      notification_options              d,u,r,f,s            ; Only send notifications for specific host states
      contact_groups                    libertyadminsgroup  ; Notifications get sent to the admins by default
      register                          0                    ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }



# Windows host definition template - This is NOT a real host, just a template!

define host{
      name                        windows-server      ; The name of this host template
      use                        generic-host        ; Inherit default values from the generic-host template
      check_period                24x7                ; By default, Windows servers are monitored round the clock
      check_interval              1                    ; Actively check the server every minute
      retry_interval              1                    ; Schedule host check retries at 1 minute intervals
      max_check_attempts          10                  ; Check each server 10 times (max)
      check_command              check-host-alive    ; Default command to check if servers are "alive"
      notification_period        24x7                ; Send notification out at any time - day or night
      notification_interval      60                  ; Resend notifications every 60 minutes
      notification_options        d,u,r,f,s            ; Only send notifications for specific host states
      contact_groups              libertyadminsgroup  ; Notifications get sent to the admins by default
      hostgroups                  windows-servers      ; Host groups that Windows servers should be a member of
      register                    0                    ; DONT REGISTER THIS - ITS JUST A TEMPLATE
      }


# Define a template for switches that we can reuse
define host{
      name                        switches                    ; The name of this host template
      use                        generic-host                ; Inherit default values from the generic-host template
      check_period                24x7                        ; By default, switches are monitored round the clock
      check_interval              5                          ; Switches are checked every 5 minutes
      retry_interval              1                          ; Schedule host check retries at 1 minute intervals
      max_check_attempts          10                          ; Check each switch 10 times (max)
      check_command              check-host-alive            ; Default command to check if switches are "alive"
      notification_period        24x7                        ; Send notifications at any time
      notification_interval      30                          ; Resend notifications every 30 minutes
      notification_options        d,u,r,f,s                  ; Only send notifications for specific host states
      contact_groups              libertyadminsgroup          ; Notifications get sent to the admins by default
      hostgroups                  switches
      register                    0                          ; DONT REGISTER THIS - ITS JUST A TEMPLATE
      }

define host{
      name                        routers                    ; The name of this host template
      use                        generic-host                ; Inherit default values from the generic-host template
      check_period                24x7                        ; By default, routers are monitored round the clock
      check_interval              5                          ; Routers are checked every 5 minutes
      retry_interval              1                          ; Schedule host check retries at 1 minute intervals
      max_check_attempts          10                          ; Check each router 10 times (max)
      check_command              check-host-alive            ; Default command to check if routers are "alive"
      notification_period        24x7                        ; Send notifications at any time
      notification_interval      30                          ; Resend notifications every 30 minutes
      notification_options        d,u,r,f,s                  ; Only send notifications for specific host states
      contact_groups              libertyadminsgroup          ; Notifications get sent to the admins by default
      hostgroups                  routers
      register                    0                          ; DONT REGISTER THIS - ITS JUST A TEMPLATE
      }



###############################################################################
###############################################################################
#
# SERVICE TEMPLATES
#
###############################################################################
###############################################################################

# Generic service definition template - This is NOT a real service, just a template!

define service{
        name                            generic-service        ; The 'name' of this service template
        active_checks_enabled          1                      ; Active service checks are enabled
        passive_checks_enabled          1                      ; Passive service checks are enabled/accepted
        parallelize_check              1                      ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service            1                      ; We should obsess over this service (if necessary)
        check_freshness                0                      ; Default is to NOT check service 'freshness'
        notifications_enabled          1                      ; Service notifications are enabled
        event_handler_enabled          1                      ; Service event handler is enabled
        flap_detection_enabled          1                      ; Flap detection is enabled
        failure_prediction_enabled      1                      ; Failure prediction is enabled
        process_perf_data              1                      ; Process performance data
        retain_status_information      1                      ; Retain status information across program restarts
        retain_nonstatus_information    1                      ; Retain non-status information across program restarts
        is_volatile                    0                      ; The service is not volatile
        check_period                    24x7                  ; The service can be checked at any time of the day
        max_check_attempts              3                      ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval          3                      ; Check the service every 3 minutes under normal conditions
        retry_check_interval            1                      ; Re-check the service every minute until a hard state can be determined
        contact_groups                  libertyadminsgroup    ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r,f,s            ; Send notifications about warning, unknown, critical, recovery, flapping, and scheduled downtime events
        notification_interval          60                    ; Re-notify about service problems every hour
        notification_period            24x7                  ; Notifications can be sent out at any time
        register                        0                      ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }


Commands.cfg


Code:

# 'notify-host-by-email' command definition
define command{
      command_name  notify-host-by-email
      command_line  /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
      }

# 'notify-service-by-email' command definition
define command{
      command_name  notify-service-by-email
      command_line  /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
      }



Services.cfg


Code:

define service{
      use                  generic-service
      hostgroup_name      windows-servers
      service_description System Uptime
      check_command        check_nt!UPTIME
      }

define service{
      use                  generic-service
      hostgroup_name      windows-servers
      service_description CPU Load     
      check_command        check_nt!CPULOAD!-l 5,80,90
      }

define service{
      use                  generic-service
      hostgroup_name      windows-servers
      service_description Memory Usage
      check_command        check_nt!MEMUSE!-w 80 -c 90
      }

define service{
      use                  generic-service
      hostgroup_name      windows-servers
      service_description  Used Disk Space     
      check_command        check_nt!USEDDISKSPACE!-l c -w 80 -c 90
      }

define service{
      use                  generic-service
      hostgroup_name      windows-servers
      service_description Ping Test
      check_period                24x7
      max_check_attempts          3
      normal_check_interval      3
      retry_check_interval        1
      contact_groups              libertyadminsgroup
      notification_interval      60
      notification_period        24x7
      notification_options        w,u,c,r
      check_command              check_ping!200.0,20%!600.0,60%    ; The command used to monitor the service
      }


Hosts.cfg



Code:

define host{
      use          windows-server      ; Inherit default values from a template
      host_name    NAME                ; The name we're giving to this host
      alias        NAME                ; A longer name associated with the host
      address      <IP ADDRESS>        ; IP address of the host
      }


kevin82287 10-20-2009 01:05 PM

Problem has been fixed. The objects.cache file was using my old configurations, so I simply disabled pre-caching in the main nagios.cfg file.
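
For anyone hitting the same thing, here is a rough sketch of how a stale cache like this might be spotted. It lists the .cfg files that were modified more recently than the cache file. The paths are assumptions based on a default source install and may differ on other systems, and the function name `stale_configs` is made up.

```shell
#!/bin/sh
# Hypothetical helper: list config files that are newer than the cache file.
stale_configs() {
    # $1 = Nagios config directory, $2 = cache file to compare against
    find "$1" -type f -name '*.cfg' -newer "$2"
}

ETC=/usr/local/nagios/etc                    # assumed config directory
CACHE=/usr/local/nagios/var/objects.cache    # assumed cache location
if [ -d "$ETC" ] && [ -r "$CACHE" ]; then
    stale_configs "$ETC" "$CACHE"
fi
```

Any file names printed were edited after the cache was last written, which suggests the running Nagios may not reflect them yet. Verifying the configuration with `nagios -v` against the main nagios.cfg and then restarting Nagios is the usual next step.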

