Inherited Nagios box - email connection refused
Hi all,
I just started a new position and it appears that I'm now the Nagios admin. I'm fairly familiar with Linux, more so CentOS than the Ubuntu Server they're running here, but I'm managing alright.

Here's the issue. I discovered a couple of days ago that emails weren't being sent from Nagios. Looking back through the log files, it seems this has been happening for months, but no one knew or cared enough to fix it. Originally the problem was that /usr/bin/mail didn't exist at all, so I installed mailutils, which got me the mail command, and we were off to the races. Or so I thought.

I'm currently getting most if not all of the host UP messages from Nagios, but I rarely get any host down messages. (I guess my Nagios box just likes to stay optimistic :D)

Anyway, in my nagios.log file there are several errors that look like this:

[1377783520] SERVICE NOTIFICATION: admin;[server];CPU_check;CRITICAL;notify-by-email;Connection refused

I don't see any errors in the mail logs. I have confirmed that postfix is running and listening on the appropriate port.

It's very sporadic. I get all host UP messages without fail, but host down messages either don't arrive at all or arrive very randomly. Any thoughts?
Nagios can be configured, host by host and service by service, to send e-mails on the following conditions:

Down / Unreachable / Recovery / Flapping / Downtime

So it's possible that, for some reason or another, the "Down" notifications have been disabled. Check your Nagios config files to see which notifications are enabled for each host. Look for:

Code:
notification_options d,u,r

I'm sure there will be more things to try, but this is a good starting point. Oh, and also check your server / client spam filters, just in case your alerts are being classed as spam!
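
If there are a lot of host definitions to check by eye, the notification_options check above can be scripted. What follows is only a rough sketch: the host names in the sample are made up, and the parser handles plain "define host { ... }" blocks only. It does not follow "use" template inheritance or included config directories, so a host that sets no notification_options itself is skipped, not flagged.

```python
import re

# Matches the opening line of an object definition, e.g. "define host {" or "define host{".
DEFINE_RE = re.compile(r"^define\s+(\w+)\s*\{$")


def parse_objects(text):
    """Return a list of (object_type, directives) for each define block in the text."""
    objects = []
    current = None
    for raw in text.splitlines():
        # Drop inline ';' comments, then surrounding whitespace.
        line = raw.split(";", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        match = DEFINE_RE.match(line)
        if match:
            current = (match.group(1), {})
            continue
        if line == "}":
            if current is not None:
                objects.append(current)
            current = None
            continue
        if current is not None:
            # Directive name, whitespace, then the value.
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


def hosts_missing_down(text):
    """Names of hosts whose own notification_options lack the 'd' (down) flag."""
    missing = []
    for obj_type, directives in parse_objects(text):
        if obj_type != "host":
            continue
        options = directives.get("notification_options")
        if options is None:
            # Not set on this object; may come from a template or the default.
            continue
        flags = {flag.strip() for flag in options.split(",")}
        if "d" not in flags:
            missing.append(directives.get("host_name") or directives.get("name") or "<unnamed>")
    return missing


# Hypothetical sample config, for illustration only.
SAMPLE = """
define host {
    host_name             web01
    notification_options  d,u,r
}
define host {
    host_name             db01
    notification_options  u,r   ; down alerts switched off
}
define host {
    host_name             app01
    notification_options  n
}
"""

print(hosts_missing_down(SAMPLE))
```

With the sample above, db01 and app01 are reported: the first has no "d" flag, and the second uses "n", which turns host notifications off entirely. To check a real box, read your own config file into a string and pass it to hosts_missing_down.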
Here is how that log line breaks down:

[1377783520] - Timestamp
SERVICE NOTIFICATION: - Type of notification, HOST or SERVICE
admin; - Contact, send the notification to this contact
[server]; - Host (kind of self-explanatory!)
CPU_check; - Service, the name of the particular check that failed
CRITICAL; - Severity
notify-by-email; - Command used to send the notification
Connection refused - Status/results: this is the output of the service check that caused the notification, not the result of the notify-by-email command

If it's host down notifications you're looking for, then you'll need to grep your log file for HOST NOTIFICATION to see whether Nagios is actually detecting the outages and trying to send you the notifications.

Hope this helps you understand what you're looking at / for in your log files.
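
The same field breakdown can be done in a few lines of Python if grep output gets hard to read. This is a sketch, not anything Nagios ships: it assumes HOST NOTIFICATION lines have the same layout minus the service field, and the second and third sample lines are invented for illustration (only the first comes from this thread).

```python
import re
from datetime import datetime, timezone

# "[epoch] HOST NOTIFICATION: ..." or "[epoch] SERVICE NOTIFICATION: ..."
LINE_RE = re.compile(r"^\[(\d+)\]\s+(HOST|SERVICE) NOTIFICATION:\s+(.*)$")


def parse_notification(line):
    """Return a dict of fields for a notification log line, or None for other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    epoch, kind, rest = match.groups()
    # Service entries carry one extra field (the service name).
    expected = 6 if kind == "SERVICE" else 5
    # Limit the split so semicolons inside the check output are kept intact.
    parts = rest.split(";", expected - 1)
    if len(parts) != expected:
        return None
    if kind == "SERVICE":
        contact, host, service, state, command, output = parts
    else:
        contact, host, state, command, output = parts
        service = None
    return {
        "time": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "kind": kind,
        "contact": contact,
        "host": host,
        "service": service,
        "state": state,
        "command": command,
        "output": output,  # output of the check, not of the mail command
    }


def host_notifications(lines):
    """Keep only HOST NOTIFICATION entries, the ones relevant to host down alerts."""
    parsed = (parse_notification(line) for line in lines)
    return [rec for rec in parsed if rec is not None and rec["kind"] == "HOST"]


SAMPLE_LINES = [
    # From the original post.
    "[1377783520] SERVICE NOTIFICATION: admin;[server];CPU_check;CRITICAL;notify-by-email;Connection refused",
    # Hypothetical lines, for illustration only.
    "[1377783600] HOST NOTIFICATION: admin;web01;DOWN;notify-host-by-email;CRITICAL - Host Unreachable",
    "[1377783610] SERVICE ALERT: web01;CPU_check;CRITICAL;HARD;3;Connection refused",
]

for record in host_notifications(SAMPLE_LINES):
    print(record["time"].isoformat(), record["host"], record["state"], record["output"])
```

If host_notifications comes back empty for your real nagios.log, Nagios never tried to send a host down alert, which points back at the notification settings and away from the mail setup.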
Now to figure out why that service is failing and I'll be in business! Thanks again so much for your help.

**EDIT** Looks like the listening daemon was removed from the host server. All is well and I've fixed Nagios! Thanks for the help in explaining the error. I'll mark this as solved.