LinuxQuestions.org
11-14-2011, 06:44 AM   #1
AST
LQ Newbie
 
Registered: Oct 2011
Posts: 4

Rep: Reputation: Disabled
Anybody familiar with Nagios Escalations?


Hi,

I have my Nagios install working fine now - it emails me and sends SMS alerts when there are problems.

Here is what I now want to achieve, and I believe escalations are the way to go.

At the moment I am alerted to a down host by email after 3 minutes, and then notified by email every minute until the host is back up. These settings are from the HOST TEMPLATES part of my templates.cfg:

check_interval - 1
retry_interval - 1
max_check_attempts - 2
notification_interval - 1
notification_options - d,r

I monitor all my hosts (about 70) by PING only - I have host groups set up but am not using any service groups.

I am happy for an email to be sent to me every minute until the host recovers, but to keep SMS costs down I don't want an SMS alert every minute!

I'd like:

1. 1st email to be sent out after 3 minutes and then every 1 minute until recovery (like it is now)
2. An SMS to be sent out the same time as the first host down email is sent
3. Then an SMS alert to be sent again 3 and 6 minutes after the first notification.
4. And then no more SMS sent until the host is recovered. (me or my other admin should be working on the problem after 3 SMS alerts!)

How can I achieve this, please?

I have to create an escalations.cfg file and then add that file's location to the main nagios.cfg.

What do I put in that file, though?

I have read up on escalations, but I am really uncertain how to deal with this.

Thanks in advance
 
11-28-2011, 03:23 AM   #2
zedmelon
Member
 
Registered: Jun 2004
Location: colorado, USA
Distribution: slack, oBSD
Posts: 119

Rep: Reputation: 24
Quote:
Originally Posted by AST
Code:
check_interval - 1
retry_interval - 1
max_check_attempts - 2
notification_interval - 1
notification_options - d,r
You've configured Nagios to alert you upon recovery (good), so you'll get an alert saying all is well. This is also useful if you would have traveled ten miles to fix a problem which recovers before you've driven two blocks.
Quote:
Originally Posted by AST
1. 1st email to be sent out after 3 minutes and then every 1 minute until recovery (like it is now)
Careful... you don't want to go insane with 180 texts "reminding" you a host is down. Saving SMS money is good, but after half the night awake with ten text messages, you won't care about your wireless invoice as much as your pillow. This is not a "set-it-and-forget-it" tool--it's very hands-on.

Nagios is my buddy; it has been watching my cable modem for about four years. I liked it so much I built and maintained a Nagios server (around 200 hosts, lots of ping, some telnet, SNMP, ssh) at my last job.

First--this is easier to configure than escalation--increase your notification interval. Certainly, critical systems might require a lower interval (our UPSes were 5 min), but the interval is to give you time to log into Nagios (not nag you constantly that something is still broken). Then you can acknowledge the service issue. Notice I didn't say "disable" the alert; disabling a notification is a VERY rare need. The "acknowledge" button is key to a successful Nagios installation. You should acknowledge every alert as opposed to waiting; the very idea of monitoring is to help you become more proactive in providing a network service. It's Nagios's job to tell you there's a problem, and yours is to fix what's needed. Whether you handle it now or in the morning, Nagios will notice once the problem is gone or gets worse.
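
As a rough sketch of what raising the interval looks like in a host template: the template name and the value of 30 below are placeholders I made up, not a recommendation for your network, and the other values are simply the ones from the original post. With the default interval_length of 60 seconds, one unit is one minute.

```
# Hypothetical host template; only notification_interval differs from the posted settings.
define host{
        name                    ping-only-host   ; placeholder template name
        check_interval          1
        retry_interval          1
        max_check_attempts      2
        notification_interval   30               ; re-notify every 30 minutes instead of every minute
        notification_options    d,r              ; notify on DOWN and on recovery
        register                0                ; template only, not a real host
        }
```

Once a problem is acknowledged, the repeat notifications for it should stop until the host changes state, so the interval mostly matters for problems nobody has looked at yet.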

Quote:
Originally Posted by AST
2. An SMS to be sent out the same time as the first host down email is sent
3. Then an SMS alert to be sent again 3 and 6 minutes after the first notification.
4. And then no more SMS sent until the host is recovered. (me or my other admin should be working on the problem after 3 SMS alerts!)
You can define multiple contacts for a service/host/group, including a real email account, a pager, or a phone. Each contact has its own notification period and options (as far as I know, the interval itself is set per host, service, or escalation rather than per contact). Most wireless providers have an email-to-SMS gateway, which is simpler to configure than a true SMS setup. Nagios sends my Verizon phone a short message by mailing 1234567890@vtext.com.**
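
A minimal sketch of two such contacts, one for ordinary email and one that mails a carrier's gateway address. The contact names and both addresses are placeholders, and I am assuming the 24x7 timeperiod and the notify-host-by-email / notify-service-by-email commands from the sample configuration are present in your install.

```
# Hypothetical contact for normal email notifications.
define contact{
        contact_name                    admin-email             ; placeholder name
        alias                           Admin email
        host_notification_period        24x7
        host_notification_options       d,r
        host_notification_commands      notify-host-by-email
        service_notification_period     24x7
        service_notification_options    w,u,c,r
        service_notification_commands   notify-service-by-email
        email                           admin@example.com       ; placeholder address
        }

# Hypothetical contact that reaches a phone through an email-to-SMS gateway.
define contact{
        contact_name                    admin-sms               ; placeholder name
        alias                           Admin SMS gateway
        host_notification_period        24x7
        host_notification_options       d,r
        host_notification_commands      notify-host-by-email
        service_notification_period     24x7
        service_notification_options    w,u,c,r
        service_notification_commands   notify-service-by-email
        email                           5551234567@sms-gateway.example   ; placeholder gateway address
        }
```

The stock email command produces a fairly long message, so for a phone you would probably want a second, shorter notification command; that part is left out here.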

Note that Nagios is precisely the sort of uber-cool project that can be "over-geeked." If you're not going to do anything about an alert, consider whether you really need it. The first time you receive a page at 3am when a printer goes offline is a geeky thrill, but that wears off.
:-)

Let Nagios do the boring part for you by watching services and hosts, but then resolve to react when it does. If you get a notification every few minutes, you become desensitized to the very alerts which are designed to help you fix a problem so quickly that no one knows about it but you.

Other notes: Planned downtime is a great way to avoid getting a ton of pages for an OS upgrade or replacing a router. The parent/child relationships are well worth a look if you have multiple switches and routers, but not really for a flat LAN. The documentation that comes with Nagios is extensive; you'll see plenty on escalation there.
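
For the escalation part, here is an untested sketch of an escalations.cfg for the pattern in the original post. It rests on several assumptions: notification_interval stays at 1, so notifications 1, 4 and 7 fall at 0, 3 and 6 minutes after the first alert; the host's normal contact is email-only; admin-email and admin-sms are hypothetical contacts defined elsewhere; ping-hosts is a placeholder hostgroup name; and the 24x7 timeperiod exists. My understanding is that an escalation's contacts replace the host's normal contacts for the notifications it covers, which is why the email contact is listed alongside the SMS one. Please check it against the escalation documentation before relying on it.

```
# escalations.cfg (hypothetical file); referenced from nagios.cfg with a line such as
#   cfg_file=/path/to/escalations.cfg      (placeholder path)

# Notification 1: email plus SMS.
define hostescalation{
        hostgroup_name          ping-hosts              ; placeholder hostgroup
        contacts                admin-email,admin-sms   ; placeholder contacts
        first_notification      1
        last_notification       1
        notification_interval   1
        escalation_period       24x7
        escalation_options      d                       ; only DOWN notifications
        }

# Notification 4 (about 3 minutes after the first): email plus SMS.
define hostescalation{
        hostgroup_name          ping-hosts
        contacts                admin-email,admin-sms
        first_notification      4
        last_notification       4
        notification_interval   1
        escalation_period       24x7
        escalation_options      d
        }

# Notification 7 (about 6 minutes after the first): email plus SMS.
define hostescalation{
        hostgroup_name          ping-hosts
        contacts                admin-email,admin-sms
        first_notification      7
        last_notification       7
        notification_interval   1
        escalation_period       24x7
        escalation_options      d
        }
```

Every other notification number matches no escalation, so it should fall back to the host's normal email-only contact. How recovery notifications are routed once escalations are involved is worth verifying in the docs, since I have not tried this exact setup.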

There's also an Android Nagios client, but I haven't gotten to try it yet. Finally, congrats on deploying Nagios. It's a complex beast, but very powerful and completely worth it.

** my real number. Go ahead, try it.

Last edited by zedmelon; 11-28-2011 at 03:24 AM. Reason: to redundantly not keep redundantly repeating redundantly
 
  


All times are GMT -5.
