LinuxQuestions.org

digdougburns 08-29-2013 08:45 AM

Inherited Nagios box- email connection refused
 
Hi all,

I just started a new position and it appears that I'm now the Nagios admin. I'm fairly familiar with Linux, though more with CentOS than with the Ubuntu Server they're running here, but I'm managing alright.

Here's the issue. I discovered a couple of days ago that emails weren't being sent from Nagios. Looking back through the log files it seems this has been happening for months but no one knew/cared enough to attempt to fix it. Originally the problem was that /usr/bin/mail didn't exist at all. So I installed mailutils which got me the mail command and we were off to the races. Or so I thought. I'm currently getting most if not all of the host UP messages from Nagios, but rarely am I getting any host down messages. (I guess my Nagios box just likes to stay optimistic :D) Anyway, so in my nagios.log file there are several errors that look like this:

[1377783520] SERVICE NOTIFICATION: admin;[server];CPU_check;CRITICAL;notify-by-email;Connection refused

I don't see any errors in the mail logs. I have confirmed that postfix is running and listening on the appropriate port. The behaviour is very sporadic: I get all host UP messages without fail, but host down messages either don't arrive at all or arrive only occasionally. Any thoughts?

TenTenths 08-29-2013 09:14 AM

Nagios can be configured on a host by host / service by service level to send e-mails on the following conditions:

Down / Unreachable / Recovery / Flapping / Downtime

So it's possible that for some reason or another the "Down" notifications have been disabled.

Check your nagios config files to see what notifications are enabled for each host. Look for:

Code:

        notification_options            d,u,r
        notifications_enabled          1

This says that down / unreachable / recovery are enabled.

I'm sure there will be more things to try but this is a good starting point.

Oh, and also check your server / client spam filters just in case your alerts are being classed as spam!
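
If there are a lot of config files to go through, something like the following might save some scrolling. It is only a rough sketch, not an official Nagios tool: the function name `notification_settings` and the sample host `web01` are made up for illustration, and it only reads directives written directly inside each `define host` / `define service` block. Values inherited from a template via `use` are not resolved, so a missing value shows up as `None` and still needs checking by hand.

```python
import re

# Matches "define host { ... }" style object blocks (brace may touch the type).
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def notification_settings(config_text):
    """List (type, name, notification_options, notifications_enabled)
    for every host and service block found in config_text."""
    results = []
    for obj_type, body in DEFINE_RE.findall(config_text):
        if obj_type not in ("host", "service"):
            continue
        directives = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        # Templates use "name" instead of "host_name".
        name = directives.get("host_name") or directives.get("name") or "?"
        if obj_type == "service":
            name = name + "/" + directives.get("service_description", "?")
        results.append((
            obj_type,
            name,
            directives.get("notification_options"),   # None = not set here
            directives.get("notifications_enabled"),  # None = not set here
        ))
    return results


if __name__ == "__main__":
    # Hypothetical sample; point this at your own config text instead.
    sample = """
    define host {
        host_name               web01   ; made-up host
        notification_options    d,u,r
        notifications_enabled   1
    }
    """
    for row in notification_settings(sample):
        print(row)
```

Any host whose options come back without `d`, or with `notifications_enabled` set to `0`, would be a candidate for the missing "Down" e-mails.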

digdougburns 08-29-2013 09:26 AM

All hosts appear to have d,u,r set. Also services look like they have c,r set. I don't see anything caught in our server spam folder or my client folder. Seems like that connection refused has to be related in some way, no?

TenTenths 08-29-2013 09:37 AM

Quote:

Originally Posted by digdougburns (Post 5018265)
All hosts appear to have d,u,r set. Also services look like they have c,r set. I don't see anything caught in our server spam folder or my client folder. Seems like that connection refused has to be related in some way, no?

Not necessarily. Here's what the log line is recording:

[1377783520] - Timestamp
SERVICE NOTIFICATION: - Type Of Notification, HOST or SERVICE
admin; - Contact, send notification to this contact
[server]; - Host (Kind of self explanatory!)
CPU_check; - Service, the name of the particular check that failed.
CRITICAL; - Severity
notify-by-email; - Command used to send the notification.
Connection refused - Status/Results, this is the output of the service check that triggered the notification, it's not the result of the notify-by-email command.

If it's host down notifications you're looking for, grep your log file for HOST NOTIFICATION to confirm that Nagios is actually detecting the outages and trying to send you the notifications.

Hope this helps you understand what you're looking at / for in your log files.
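
The field breakdown above can be sketched in a few lines of Python if you'd rather not read the log by eye. Treat this as an illustration only: the names `parse_notification` and `host_notifications` are invented here, the service layout is taken from the log line in this thread, and the five-field host layout (the same fields minus the service name) is an assumption to check against your own nagios.log.

```python
import re
from datetime import datetime, timezone

# Matches "[epoch] HOST NOTIFICATION: ..." or "[epoch] SERVICE NOTIFICATION: ..."
LINE_RE = re.compile(r"^\[(\d+)\] (HOST|SERVICE) NOTIFICATION: (.*)$")

SERVICE_FIELDS = ["contact", "host", "service", "state", "command", "output"]
# Assumed layout for host notifications: no service field.
HOST_FIELDS = ["contact", "host", "state", "command", "output"]


def parse_notification(line):
    """Return a dict of fields for a notification line, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    epoch, kind, rest = match.groups()
    names = SERVICE_FIELDS if kind == "SERVICE" else HOST_FIELDS
    # Split only on the leading semicolons so that check output containing
    # ';' stays intact in the final field.
    values = rest.split(";", len(names) - 1)
    if len(values) != len(names):
        return None
    record = dict(zip(names, values))
    record["kind"] = kind
    record["time"] = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return record


def host_notifications(lines):
    """Keep only HOST NOTIFICATION records, like grepping for that string."""
    parsed = (parse_notification(line) for line in lines)
    return [rec for rec in parsed if rec and rec["kind"] == "HOST"]


if __name__ == "__main__":
    line = ("[1377783520] SERVICE NOTIFICATION: "
            "admin;[server];CPU_check;CRITICAL;notify-by-email;Connection refused")
    rec = parse_notification(line)
    print(rec["time"].isoformat(), rec["command"], "->", rec["output"])
```

Run against the line from the first post, the timestamp decodes to 2013-08-29 13:38:40 UTC, the command is `notify-by-email`, and "Connection refused" ends up in the `output` field, i.e. it belongs to the check rather than to the mail command.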

digdougburns 08-29-2013 10:15 AM

Wow, that is insanely helpful. I've been banging my head against this email "problem" for so long and never realized that it was actually a problem with the service. I disabled that broken service and it's basically fixed everything. I'm not getting flooded with alerts, I don't see the same errors, all, it seems, is well.

Now to figure out why that service is failing and I'll be in business! Thanks again so much for your help.

**EDIT** Looks like the listening daemon was removed from the host server. All is well and I've fixed the Nagios! Thanks for the help in explaining the error. I'll mark this as solved.
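
For reference, a quick way to see the same symptom from the Nagios box is to try a plain TCP connection to the port the check uses. This is just a minimal sketch under my own assumptions: `probe` is a made-up helper name, and the host and port in the usage comment are placeholders rather than the real values from this setup.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connection and classify the result.

    Returns "open" if something accepted the connection, "refused" if the
    host answered but nothing is listening on that port, and "unreachable"
    for timeouts and other network errors.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"


if __name__ == "__main__":
    # Placeholder target: substitute the monitored host and the check's port.
    print(probe("127.0.0.1", 9))
```

A result of "refused" means the machine itself is up but no daemon is listening on that port, which matches what happens when the listening daemon has been removed.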

TenTenths 08-29-2013 10:35 AM

Quote:

Originally Posted by digdougburns (Post 5018290)
Looks like the listening daemon was removed from the host server.

Yup, that'll cause the "Connection refused" errors ;) ;)

Quote:

Originally Posted by digdougburns (Post 5018290)
All is well and I've fixed the Nagios! Thanks for the help in explaining the error. I'll mark this as solved.

You're welcome!


All times are GMT -5.