From the message I'm assuming this is a Nagios email you're getting?
If so, the email is coming from your Nagios master and says that the master can't ping the host in question, which appears to be named sdb0033. In other words, the email is NOT coming from the server that can't be pinged but from the server that is checking it.
SERVICE FLAPPING ALERT: lvm0147;CHECKLOG;STARTED; Service appears to have started flapping (34.2% change >= 30.0% threshold)
I also get this a lot, but the only thing I could find on it was that a threshold might not be set correctly, and that this is why I get the error. There are many of these, some with STOPPED rather than STARTED.
These are content management clusters that I am checking. Some of the time the content is unavailable to users, and the logs never contain errors specific enough to explain why, so I am looking through all of them to see if I can find one that might be the cause...
The format of the message indicates to me that it was a HOST NOTIFICATION sent to the contact ITI-SERVER, saying that server sdb0033 could not be pinged. My assumption is that sdb0033 is NOT your Nagios server but rather a server that the Nagios server tried to check. Are you saying that sdb0033 IS your Nagios master? If so, it would be saying it couldn't ping itself, which would be odd even if the network cable were disconnected.
Note that the master server is the one the Nagios web page runs from, NOT one where you have NRPE, NSCLIENT or other Nagios client software installed. Although you can install client software on other servers and monitor them through it, the checks are always run from the Nagios master, so any emails are sent from the master rather than from the client, even though the email is talking about the client.
To help clear it up:
What is the name of your Nagios master server? What OS is it?
What is the name of the host where you are reading these logs? What OS is it?
What are the names of your cluster nodes? What OS is on them?
What is (are) your virtual cluster host name(s)?
From the Nagios master server, what happens if you ping your virtual cluster host name from the command line?
From the Nagios master server, what happens if you ping each of your cluster nodes?
Did you get an email alert with the message? If so, what does the header show? The header should include which host sent the email.
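If it helps, the ping questions above could be scripted rather than typed host by host. This is only a rough sketch, assuming a Linux master with the usual iputils `ping` (where `-c` is the packet count and `-W` the per-reply timeout in seconds). The function names are made up for illustration and are not part of Nagios.

```python
import subprocess


def build_ping_command(host, count=3, timeout_s=2):
    # Assumes Linux iputils ping: -c = number of packets,
    # -W = seconds to wait for each reply.
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def can_ping(host, count=3, timeout_s=2):
    """Return True if ping exits with status 0 for this host."""
    try:
        result = subprocess.run(
            build_ping_command(host, count, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No ping binary on this machine.
        return False
    return result.returncode == 0


def check_hosts(hosts):
    """Map each host name to True (reachable) or False (not reachable)."""
    return {host: can_ping(host) for host in hosts}
```

Run on the Nagios master, something like `check_hosts(["ldb0126", "sdb0033"])` (host names taken from the messages in this thread) would show which hosts the master itself can reach.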
Flapping in Nagios means that a service being checked keeps going up and down: on one check it is OK, on the next it isn't, on the following one it is again, and so on. As I understand it, once a service starts flapping Nagios keeps checking it but suppresses the ordinary problem and recovery notifications until it settles down, so you aren't flooded with alerts for something that isn't working right.
Note that the format of the message after HOST NOTIFICATION is: the contact the email is being sent to, followed by the host that is being checked.
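To make the "percent change >= threshold" part of the flapping message more concrete, here is a simplified sketch of the idea. It counts how often the state changed across a history of check results and compares that with a threshold. The 30.0 default is taken from the alert quoted above. The real Nagios calculation, as far as I know, uses the last 21 checks and weights recent changes more heavily, so treat this unweighted version as an illustration only. The function names are hypothetical.

```python
def percent_state_change(states):
    """Unweighted percentage of state transitions in a check history."""
    if len(states) < 2:
        return 0.0
    # Count every position where the state differs from the previous check.
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, high_threshold=30.0):
    """True when the state change percentage reaches the threshold."""
    return percent_state_change(states) >= high_threshold


# Illustrative history of 21 checks containing 7 state changes.
history = (
    ["OK"] * 8
    + ["CRITICAL", "OK", "CRITICAL", "OK", "CRITICAL", "OK"]
    + ["CRITICAL"] * 7
)
print(percent_state_change(history), is_flapping(history))
```

A service that stays OK (or stays CRITICAL) for the whole history has 0% change and is not flapping, however broken it is. Flapping is about instability, not about being down.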
So for example the first 3 log entries are actually all about the same host (ldb0126) but are being sent to 3 different contacts: ITI-UNIX, ITI-SERVER and pager.
Quote:
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: ITI-UNIX;ldb0126;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: ITI-SERVER;ldb0126;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: pager;ldb0126;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
I'm guessing "lap0100" is the name of the Nagios master that had the logs.
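If you have a lot of these lines to go through, a small script can split them into fields. A minimal sketch, assuming the syslog layout shown in the quoted lines (timestamp, logging host, `nagios:`) and that the semicolon-separated fields are contact, host, state, notification command and plugin output. The function and field names are my own.

```python
import re

# Matches lines like the quoted ones:
# "<Mon> <day> <hh:mm:ss> <logging host> nagios: HOST NOTIFICATION: <fields>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<logging_host>\S+)\snagios:\sHOST NOTIFICATION:\s"
    r"(?P<fields>.*)$"
)


def parse_host_notification(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # Limit the split so semicolons inside the plugin output are kept.
    parts = match.group("fields").split(";", 4)
    if len(parts) != 5:
        return None
    contact, host, state, command, output = parts
    return {
        "timestamp": match.group("timestamp"),
        "logging_host": match.group("logging_host"),  # where the log was written
        "contact": contact,        # who the notification was sent to
        "checked_host": host,      # the host Nagios was checking
        "state": state,
        "command": command,
        "output": output,
    }


sample = ("Dec 26 15:05:31 lap0100 nagios: HOST NOTIFICATION: "
          "ITI-UNIX;ldb0126;DOWN;host-notify-by-email;"
          "PING CRITICAL - Packet loss = 100%")
print(parse_host_notification(sample))
```

Grouping the results by `checked_host` rather than by `contact` makes it obvious that the three quoted entries describe a single host being down.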