Advanced Notifications with Nagios

matisq · 12-16-2008, 02:50 AM

Right now I have 10 services in an error state, so I receive 10 problem emails every hour. I already know about the problems; I don't need a notification every hour, or even every two hours. I need a custom solution for this.
Is it possible to implement a workflow like this?

From the first error until 24 hours later = one email per hour
From 24 hours until one week = one email every 6 hours
After that, until the problem is solved = one email per day
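The requested schedule can be written as a function of how long the problem has lasted. The following is a hypothetical Python helper, not part of Nagios, that only restates the three tiers so the requirement is unambiguous. It assumes the 6-hour tier starts at exactly 24 hours and the daily tier at exactly one week. Note that Nagios escalations are configured by notification count rather than elapsed time, so this pins down the goal, not the configuration.

```python
HOURLY = 60          # notification interval in minutes
EVERY_6_HOURS = 360  # minutes
DAILY = 1440         # minutes


def desired_interval_minutes(hours_since_first_error: float) -> int:
    """Interval the poster wants, given hours since the first error (hypothetical helper)."""
    if hours_since_first_error < 24:
        return HOURLY
    if hours_since_first_error < 24 * 7:
        return EVERY_6_HOURS
    return DAILY


if __name__ == "__main__":
    for hours in (0, 23, 24, 100, 168, 500):
        print(hours, desired_interval_minutes(hours))
```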

Thanks in advance!

JimBass · 12-16-2008, 08:56 AM

You should have searched; this gets asked and answered all the time.

What you want to do is define host escalations. This is the template -
http://nagios.sourceforge.net/docs/2...hostescalation

Set it up with wildcards so it applies to all hosts. Count the number of emails you'll receive in 24 hours (looks like 24), then have the first escalation kick in at that point. Define it to send one email every 6 hours, and count the total number of emails that will be received over the course of the week (24 + 4 per day x 6 days). Then define a 2nd escalation that comes into effect after that many emails (looks like 48, but might be +/-1). Define that one to go out once every 24 hours forever.
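The counting in the paragraph above is plain arithmetic on notification numbers. A minimal Python check, assuming one notification per interval and ignoring the +/-1 boundary question (whether the very first notification at hour 0 is counted separately); the helper name is hypothetical:

```python
def notifications_in_window(window_hours: int, interval_hours: int) -> int:
    """Notifications sent in a window, one every interval_hours (boundaries ignored)."""
    return window_hours // interval_hours


first_day = notifications_in_window(24, 1)          # hourly for one day: 24
rest_of_week = notifications_in_window(6 * 24, 6)   # every 6 hours for 6 days: 24
second_escalation_after = first_day + rest_of_week  # 48

print(first_day, rest_of_week, second_escalation_after)
```

So the first escalation starts around notification 24 and the second around notification 48, which are the thresholds mentioned above.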

All of this is also in the documentation on the Nagios site. You could have found it with a bit of reading.

Peace,
JimBass

Berhanie · 12-16-2008, 03:02 PM

Welcome to LQ, by the way.

matisq · 12-18-2008, 07:52 AM

Thanks for the response!

But I have many hosts. If I use a host escalation, will I get notifications only about host problems (reachable or not), or about service problems as well?

In total I have 30 services on 10 hosts, and 10 of those services are in an error state. Should I use service escalations? Do I need to define an escalation for each service? And if I define an escalation, what happens to the default email notifications?

matisq · 12-18-2008, 08:22 AM

Or could you give a small example?

JimBass · 12-18-2008, 09:28 AM

This page does a good job of talking through service escalations.

http://www.crucialwebhost.com/blog/n...ing-with-nrpe/

Using the wildcard (*) will match all services or all hosts. Some people report that instead of a wildcard they have to use .* to make it work, but one of those two will send escalations for every service. You could also use a list if you want, for example one escalation for smtp,pop3,imap and a different one for http. You can also define host escalations.
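Why some setups need .* instead of * is not explained in the thread. One plausible explanation, offered here only as an assumption, is that those installations have regular-expression matching enabled (the use_regexp_matching option in nagios.cfg), so the value is read as a regular expression rather than a shell-style wildcard. The sketch below uses Python's fnmatch and re modules purely to illustrate the difference between the two pattern styles; it is not Nagios's own matching code, and the service names are made up.

```python
import fnmatch
import re

services = ["HTTP", "SMTP", "Disk Space", "pop3"]  # hypothetical service names

# Shell-style wildcard: a lone "*" matches any string.
glob_hits = [s for s in services if fnmatch.fnmatchcase(s, "*")]

# Regular expression: ".*" is the match-anything pattern.
regex_hits = [s for s in services if re.fullmatch(".*", s)]


def is_valid_regex(pattern: str) -> bool:
    """A lone "*" has nothing to repeat, so it is not a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(glob_hits == services, regex_hits == services)  # True True
print(is_valid_regex("*"), is_valid_regex(".*"))      # False True
```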

Again, check your documentation. Nagios is not new technology; there are gigabytes of information all across the internet about its use and configuration. All I did to find the example above was put "nagios email escalation" into Google (without the quotes).

By the way, having 10 hosts/services in an error state probably indicates that either your Nagios settings are bad or your network is in very bad shape!

Peace,
JimBass

matisq · 12-19-2008, 01:52 AM

Thanks for the response!

I've been looking for details about service escalations, but I don't really have a lot of time (other tasks to do).
Generally I don't use the default Nagios plugins; I've developed 5 or 6 custom plugins. Ten plugins in an error state just means that a host is low on disk space or that some program has stopped working. I tried to create a service escalation definition:

Send one email every 6 hours after 24 emails have already been sent.
The last notification is after one week.

Code:

define serviceescalation{
        hostgroup_name                remote-servers
        service_description           *
        first_notification            24
        last_notification             168
        notification_interval         360
        contact_groups                admins
        }

Send one email every day after 48 notifications.
The last notification is after one year.

Code:

define serviceescalation{
        hostgroup_name                remote-servers
        service_description           *
        first_notification            48
        last_notification             525600
        notification_interval         1440
        contact_groups                admins
        }

Is that correct? Should I also create the first definition (first 24 hours, 24 emails), or will that be inherited from the general Nagios service definition?

Code:

define serviceescalation{
        hostgroup_name                remote-servers
        service_description           *
        first_notification            1
        last_notification             23
        notification_interval         60
        contact_groups                admins
        }
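Nagios decides which escalations apply by notification number. According to the Nagios escalation documentation, when several escalation entries are valid for the same notification, the smallest notification_interval among them is used, and a last_notification of 0 means the escalation never ends. The sketch below models that rule for the three definitions above. The class and function names are hypothetical (this is not Nagios code), and it assumes the service's own interval of 60 minutes is used when no escalation is valid.

```python
from typing import List, NamedTuple


class Escalation(NamedTuple):
    """Hypothetical model of one serviceescalation definition."""
    first_notification: int
    last_notification: int      # 0 is treated as "no upper limit"
    notification_interval: int  # minutes


def is_valid(esc: Escalation, n: int) -> bool:
    """True if notification number n falls inside the escalation's range."""
    if n < esc.first_notification:
        return False
    return esc.last_notification == 0 or n <= esc.last_notification


def effective_interval(escalations: List[Escalation], n: int, default: int) -> int:
    """Smallest interval among valid escalations, else the service's own interval."""
    intervals = [e.notification_interval for e in escalations if is_valid(e, n)]
    return min(intervals) if intervals else default


# The three definitions as posted.
posted = [
    Escalation(1, 23, 60),
    Escalation(24, 168, 360),
    Escalation(48, 525600, 1440),
]

if __name__ == "__main__":
    for n in (10, 30, 100, 200):
        print(n, effective_interval(posted, n, default=60))
```

Under this model, notification 100 still gets the 360-minute interval, because the 24-168 and 48-525600 ranges overlap there; the daily interval only takes over after notification 168.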

JimBass · 12-20-2008, 10:00 PM

You're off in your numbers -

Quote:

Send one email every 6 hours after 24 emails have already been sent.
The last notification is after one week.
Code:

define serviceescalation{
        hostgroup_name                remote-servers
        service_description           *
        first_notification            24
        last_notification             168
        notification_interval         360
        contact_groups                admins
        }

The hostgroup is fine, as long as remote-servers is the only hostgroup you want this done for. Service_description is also fine, but some people reported that they need .* instead of * to affect all hosts. Your first_notification is fine too, but your last_notification is way off. You counted as if you were still getting 24 emails a day. Once this escalation kicks in, you only get 4 emails a day, one every six hours, which you set up correctly with a notification_interval of 360. You want this to start at the beginning of the 2nd day and go until the end of the 7th; that is 6 days of 4 emails a day, and 6 x 4 is 24. The 24 in this escalation plus the 24 from the first day is 48 emails. So your last_notification isn't 168, it is 48.

Interestingly, you start the final escalation correctly, which required the 48 that you somehow changed to 168 above, but you're miles off on the last_notification. At 1 email a day, the most you'll get in a year is 365. Since the first week of your year is already covered, your "year" of daily emails is 365 - 7 = 358, and 358 + 48 (from the more frequent emails in the first week) = 406. If you sent 525,600 emails at 1 a day, you'd be sending a daily email for about 1,440 years! Your computer would be dust!
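The numbers in this reply are easy to check. It is also worth noticing that 168 is the number of hours in a week and 525600 is the number of minutes in a 365-day year, which suggests (an inference, not something stated in the thread) that the posted last_notification values were computed as time units rather than as notification counts. A minimal Python check:

```python
DAYS_PER_YEAR = 365

first_week_total = 48                        # 24 hourly + 24 six-hourly notifications
daily_after_first_week = DAYS_PER_YEAR - 7   # 358 daily notifications
one_year_last_notification = first_week_total + daily_after_first_week  # 406

# The posted values look like time units rather than counts:
hours_per_week = 7 * 24             # 168
minutes_per_year = 365 * 24 * 60    # 525600
years_at_one_per_day = minutes_per_year / DAYS_PER_YEAR  # 1440.0

print(one_year_last_notification, hours_per_week, minutes_per_year, years_at_one_per_day)
```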

Peace,
JimBass

matisq · 12-29-2008, 02:00 AM

Thanks for the response! I'm going to test it now.

Quote:

Originally Posted by JimBass

Service_description is also fine, but some people reported that they need .* instead of * to affect all hosts.

I think you meant "to affect all services".

Happy New Year!

matisq · 01-06-2009, 03:04 AM

Happy New Year!

I've been testing this approach over Christmas, but it doesn't work: I still receive messages every hour. Does this keep working after a Nagios restart?

I could use a different solution. I now have 70 hosts with 5 services on each host. I want to send a report about each host every day around midnight. I know how to send an email every day, but not every day at midnight.
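Outside Nagios, the usual way to run something every night at midnight on Unix is a cron entry whose schedule fields are "0 0 * * *" (minute 0, hour 0, every day). If the report is sent by a long-running script instead, the script needs to know how long to wait until midnight. A minimal Python sketch of that calculation follows; the function name is hypothetical, it uses naive local time and ignores daylight-saving changes, and the report itself is not shown.

```python
from datetime import datetime, timedelta


def seconds_until_next_midnight(now: datetime) -> float:
    """Seconds from `now` until the next 00:00 (naive local time, DST ignored)."""
    tomorrow = now + timedelta(days=1)
    next_midnight = tomorrow.replace(hour=0, minute=0, second=0, microsecond=0)
    return (next_midnight - now).total_seconds()


if __name__ == "__main__":
    print(seconds_until_next_midnight(datetime(2009, 1, 6, 23, 30)))  # 1800.0
```

A script would sleep for that many seconds, send the report, and repeat; cron avoids having to keep such a process running.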

Cheers!