Host Notification

jorran · 01-03-2012, 11:50 AM

Every hour I get this message:

Dec 26 06:00:01 lap0100 nagios: HOST NOTIFICATION: ITI-SERVER;sdb0033;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%

Any reason why it's showing packet loss = 100%? It's sending out notifications and should be receiving them, right?

MensaWater · 01-03-2012, 12:22 PM

From the message I'm assuming this is a Nagios email you're getting?

If so, the email is coming from your Nagios master, and it is saying that the master can't ping the host in question, which appears to be named sdb0033. That is to say, the email is NOT coming from the server that can't be pinged but rather from the server which is checking it.

jorran · 01-03-2012, 01:47 PM

So, the server that is checking it (sdb0033) can't be pinged? Or am I reading that wrong?

jorran · 01-03-2012, 01:52 PM

SERVICE FLAPPING ALERT: lvm0147;CHECKLOG;STARTED; Service appears to have started flapping (34.2% change >= 30.0% threshold)

I also get this a lot, but the only thing I could find on it was that a threshold might not be set correctly and that is why I get this error. There are many of those, some with STOPPED rather than STARTED.

These are content management clusters that I am checking out - some of the time the content is unavailable to users, and the logs never contain errors specific enough to explain why, so I am looking through all the logs to see if I can find one that might be it...

MensaWater · 01-03-2012, 03:25 PM

The format of the message indicates to me that it was a HOST NOTIFICATION sent to the contact ITI-SERVER, saying that server sdb0033 could not be pinged. My assumption is that sdb0033 is NOT your Nagios server but rather a server that the Nagios server tried to check. Are you saying that sdb0033 IS your Nagios master? If so, it would be saying it couldn't ping itself, which would be odd even if the network cable were disconnected.

Note that the master server is the one the Nagios web page runs from, NOT the one where you have NRPE, NSCLIENT or other Nagios client software installed. That is to say, although you can install client software on other servers, they are always monitored from the Nagios master, so any emails sent would be from the master rather than from the client, even though the email is talking about the client.

To help clear it up:
What is the name of your Nagios master server? What OS is it?
What is the name of the host where you are reading these logs? What OS is it?
What are the names of your cluster nodes? What OS is on them?
What is(are) your virtual cluster host name(s)?
From the Nagios master server what happens if you do a ping from command line of your virtual cluster host name?
From the Nagios master server what happens if you do a ping of each of your cluster nodes?
Did you get an email alert with the message? If so what does the header show? The header should include what host sent the email.
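
The two ping checks above could be scripted roughly like this. It is only a sketch: it assumes Python 3 and a Linux-style ping that accepts -c on the Nagios master, and the function names packet_loss, ping_host and check_hosts are made up for illustration, not part of Nagios.

```python
import re
import subprocess


def packet_loss(ping_output):
    """Return the packet loss percentage found in ping's summary text, or None."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


def ping_host(host, count=3):
    """Run the system ping and return its text output.

    Assumption: a Linux-style ping where -c sets the number of probes.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return result.stdout


def check_hosts(hosts):
    """Map each host name to its packet loss percentage (None if not reported)."""
    return {host: packet_loss(ping_host(host)) for host in hosts}
```

Running check_hosts(["sdb0033"]) on the master would show whether the loss is really 100% from there. If the name does not resolve at all, ping prints no summary line and the value comes back as None.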

Flapping in Nagios means that a service being checked is going up and down. That is to say, on one check it is OK, on the next it isn't, on the following one it is again, and so on. When a service starts flapping, Nagios keeps checking it but suppresses the normal problem and recovery notifications for it until the flapping stops, so you don't get flooded with alerts for something that keeps changing state.
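
To give a feel for where a figure like "34.2% change >= 30.0% threshold" comes from, here is a simplified sketch of a percent state change calculation. It is an assumption-laden approximation: Nagios looks at the last 21 check results and weights recent changes more heavily than older ones, while this version counts every change equally. The names percent_state_change and is_flapping are hypothetical, and the 30.0 default is simply the threshold shown in the alert above.

```python
def percent_state_change(states):
    """Percentage of consecutive check results whose state differs.

    Simplified: every state change counts the same, unlike Nagios,
    which gives more weight to recent changes.
    """
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, high_threshold=30.0):
    """True when the state change percentage reaches the high threshold."""
    return percent_state_change(states) >= high_threshold
```

A service that was OK on every recent check scores 0%, while one that alternates on every check scores 100%. The STOPPED messages appear when the percentage drops back below a separate, lower threshold.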

jorran · 01-03-2012, 03:39 PM

My main problem is that I do not have access to any of these servers. Well, I have just user access through PuTTY, but it doesn't really do me any good.

There are several of the Host Notification messages in the log that was pulled for me, all for different servers:

Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: ITI-UNIX;ldb0126;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: ITI-SERVER;ldb0126;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: pager;ldb0126;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: ITI-UNIX;ldb0285n1;DOWN;host-notify-by-email;CRITICAL - Host Unreachable (10.8.10.158)
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: ITI-SERVER;ldb0285n1;DOWN;host-notify-by-email;CRITICAL - Host Unreachable (10.8.10.158)
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: pager;ldb0285n1;DOWN;host-notify-by-email;CRITICAL - Host Unreachable (10.8.10.158)
Dec 26 15:06:01 lap0100 nagios: HOST NOTIFICATION: ITI-UNIX;lap0498;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:06:01 lap0100 nagios: HOST NOTIFICATION: ITI-SERVER;lap0498;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:06:02 lap0100 nagios: HOST NOTIFICATION: pager;lap0498;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:06:31 lap0100 nagios: HOST NOTIFICATION: ITI-UNIX;lap0499;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:06:31 lap0100 nagios: HOST NOTIFICATION: ITI-SERVER;lap0499;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:06:31 lap0100 nagios: HOST NOTIFICATION: pager;lap0499;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%

Ultimately I am trying to figure out how big a problem this could be... I will hopefully be able to get the Nagios and OS info you asked for.

MensaWater · 01-03-2012, 04:02 PM

Note that the format of the message after HOST NOTIFICATION is: the contact the email is being sent to, followed by the host that is being checked.

So for example the first 3 log entries are actually all about the same host (ldb0126) but are being sent to 3 different contacts: ITI-UNIX, ITI-SERVER and pager.

Quote:

Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: ITI-UNIX;ldb0126;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: ITI-SERVER;ldb0126;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: pager;ldb0126;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%

I'm guessing "lap0100" is the name of the Nagios master that had the logs.
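
To see how many distinct hosts are actually affected, the log lines can be split on the semicolons and grouped by host. The sketch below assumes Python 3 and syslog-style lines exactly like the ones quoted above; the field order (contact, host, state, notification command, check output) is read off those lines, and the names parse_host_notification and contacts_by_host are made up for illustration.

```python
import re
from collections import defaultdict

# Syslog prefix: timestamp, logging host, "nagios:", then the notification body.
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<logger>\S+)\s+nagios:\s+"
    r"HOST NOTIFICATION:\s+(?P<fields>.*)$"
)


def parse_host_notification(line):
    """Split one HOST NOTIFICATION log line into named fields, or return None."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    # Limit the split so semicolons inside the check output stay intact.
    parts = match.group("fields").split(";", 4)
    if len(parts) != 5:
        return None
    contact, host, state, command, output = parts
    return {
        "timestamp": match.group("timestamp"),
        "logger": match.group("logger"),  # the machine that wrote the log entry
        "contact": contact,
        "host": host,
        "state": state,
        "command": command,
        "output": output,
    }


def contacts_by_host(lines):
    """Group notified contacts under the host each notification is about."""
    grouped = defaultdict(list)
    for line in lines:
        record = parse_host_notification(line)
        if record:
            grouped[record["host"]].append(record["contact"])
    return dict(grouped)
```

Fed the twelve lines posted earlier, contacts_by_host should collapse them to four hosts (ldb0126, ldb0285n1, lap0498 and lap0499), each notified to the same three contacts.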