From the message I'm assuming this is a Nagios email you're getting?
If so, the email is coming from your Nagios master, and it is saying that the master can't ping the host in question, which appears to be named sdb0033. That is to say, the email is NOT coming from the server that can't be pinged but from the server that is checking it.
SERVICE FLAPPING ALERT: lvm0147;CHECKLOG;STARTED; Service appears to have started flapping (34.2% change >= 30.0% threshold)
I also get this a lot, but the only thing I could find on it was that a threshold might not be set correctly and that is why I get this alert. There are many of these in the logs, some with STOPPED rather than STARTED.
These are content management clusters that I am checking. Some of the time the content is unavailable to users, and the logs never contain errors specific enough to explain it, so I am looking through all of them to see if I can find one that might be the cause.
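To see which services flap most often, the alert lines can be parsed and counted. The following is a minimal Python sketch, assuming the lines look like the one quoted above, optionally prefixed with the `[epoch]` timestamp that nagios.log normally adds. The function names `parse_flap_alert` and `count_flap_events` are made up for illustration and are not part of Nagios.

```python
import re
from collections import Counter

# Matches the alert text quoted above. The leading "[timestamp] " that
# nagios.log lines normally carry is treated as optional.
FLAP_RE = re.compile(
    r"^(?:\[(?P<ts>\d+)\]\s+)?SERVICE FLAPPING ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<event>[^;]+);\s*(?P<message>.*)$"
)
# Pulls the two percentages out of "(34.2% change >= 30.0% threshold)".
PCT_RE = re.compile(r"\(([\d.]+)% change\s+\S+\s+([\d.]+)% threshold\)")


def parse_flap_alert(line):
    """Return a dict describing one flapping alert, or None if the line is not one."""
    m = FLAP_RE.match(line.strip())
    if m is None:
        return None
    info = m.groupdict()
    pct = PCT_RE.search(info["message"])
    info["change"] = float(pct.group(1)) if pct else None
    info["threshold"] = float(pct.group(2)) if pct else None
    return info


def count_flap_events(lines):
    """Count alerts per (host, service, event), e.g. STARTED versus STOPPED."""
    counts = Counter()
    for line in lines:
        info = parse_flap_alert(line)
        if info is not None:
            counts[(info["host"], info["service"], info["event"])] += 1
    return counts
```

Feeding it the lines of the Nagios log (wherever your installation keeps it) gives a quick picture of which host and service pairs start flapping most often.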
The format of the message indicates to me that it was a HOST NOTIFICATION sent to the contact ITI-SERVER, saying that server sdb0033 could not be pinged. My assumption is that sdb0033 is NOT your Nagios server but a server that the Nagios server tried to check. Are you saying that sdb0033 IS your Nagios master? If so, it would be saying it couldn't ping itself, which would be odd even if the network cable were disconnected.
Note that the master server is the one the Nagios web page runs from, NOT one where you have NRPE, NSCLIENT or other Nagios client software installed. That is to say, although you can install client software on other servers and monitor them, the monitoring is always done from the Nagios master, so any emails would be sent from the master rather than from the client, even though the email is talking about the client.
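One way to confirm which machine actually sent an alert is to look at the email's `Received` and `From` headers. Below is a minimal Python sketch using the standard `email` module. The sample message, the `example.com` host names and the function name `originating_host` are all hypothetical; only sdb0033 comes from this thread. It assumes the oldest `Received` header (the last one in the message) names the host that handed the mail over first, which is usual but not guaranteed.

```python
import re
from email import message_from_string

# Hypothetical alert email, only for illustration.
SAMPLE_EMAIL = (
    "Received: from nagios-master.example.com (nagios-master.example.com [192.0.2.10])\n"
    "\tby mail.example.com with ESMTP; Mon, 1 Jan 2024 00:00:00 +0000\n"
    "From: nagios@nagios-master.example.com\n"
    "To: admin@example.com\n"
    "Subject: ** PROBLEM Host Alert: sdb0033 is DOWN **\n"
    "\n"
    "Host sdb0033 is DOWN\n"
)


def originating_host(raw_message):
    """Guess the sending host from the oldest Received header, else from From."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if received:
        # Mail servers prepend Received headers, so the last one is the first hop.
        m = re.search(r"from\s+(\S+)", received[-1])
        if m:
            return m.group(1)
    sender = msg.get("From", "")
    if "@" in sender:
        return sender.rsplit("@", 1)[-1].strip("> ")
    return None
```

If the result is the Nagios master and not sdb0033, that matches the explanation above: the master sent the mail about the host it could not reach.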
To help clear it up:
What is the name of your Nagios master server? What OS is it?
What is the name of the host where you are reading these logs? What OS is it?
What are the names of your cluster nodes? What OS is on them?
What is(are) your virtual cluster host name(s)?
From the Nagios master server, what happens if you ping your virtual cluster host name from the command line?
From the Nagios master server, what happens if you ping each of your cluster nodes?
Did you get an email alert with the message? If so what does the header show? The header should include what host sent the email.
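The ping checks in the list above can also be scripted so they are easy to repeat from the Nagios master. This is a rough Python sketch, assuming a Linux master with the usual iputils `ping` (where `-c` is the packet count and `-W` the reply timeout in seconds). The function names are made up for illustration, and the host names in the usage comment are the two that appear in this thread; substitute your own cluster nodes and virtual host names.

```python
import subprocess


def build_ping_command(host, count=3, timeout_s=5):
    """Build the ping command line (assumes Linux iputils ping options)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def ping_host(host, count=3, timeout_s=5):
    """Return True if ping exits with status 0, i.e. the host answered."""
    try:
        result = subprocess.run(
            build_ping_command(host, count, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=count * timeout_s + 5,  # safety net if ping itself hangs
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def check_hosts(hosts, ping=ping_host):
    """Ping every host and return a {host: reachable} mapping."""
    return {host: ping(host) for host in hosts}


# Example, to be run on the Nagios master:
#   for host, ok in check_hosts(["sdb0033", "lvm0147"]).items():
#       print(host, "reachable" if ok else "NOT reachable")
```

A host that fails here from the master but answers from another machine points at routing, firewall or name resolution problems between the master and that host, not at the host being down.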
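Flapping in Nagios means that a service being checked keeps going up and down: it is OK on one check, not OK on the next, OK again on the one after, and so on. Nagios tracks how often the state changed over its recent checks, and once that percentage reaches the high flap threshold (34.2% against a 30.0% threshold in your alert) it logs the STARTED alert. As far as I know it keeps running the checks while the service is flapping; what it holds back are the normal problem and recovery notifications, so you are not flooded with email, until the percentage drops below the low threshold and it logs STOPPED.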
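To make the "% change" figure in the alert more concrete, here is a simplified Python sketch of the idea: count how many times the state changed across the recent check results and compare the percentage against two thresholds. It is not the actual Nagios code. Nagios also weights recent changes more heavily than old ones, which this sketch leaves out, so the numbers will not match exactly. The 30.0 high threshold is taken from the alert above; the 20.0 low threshold and the function names are assumptions for illustration.

```python
def percent_state_change(states):
    """Unweighted percentage of consecutive check results that differ."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def update_flapping(is_flapping, pct, low=20.0, high=30.0):
    """Apply the two thresholds; `low` is a hypothetical value.

    Flapping starts when pct reaches `high` and only stops once pct
    falls below `low`, so the state does not toggle around one value.
    """
    if not is_flapping and pct >= high:
        return True   # would be logged as STARTED
    if is_flapping and pct < low:
        return False  # would be logged as STOPPED
    return is_flapping


# 21 check results with 7 state changes between them
history = ["OK", "CRITICAL"] * 4 + ["CRITICAL"] * 13
pct = percent_state_change(history)
flapping = update_flapping(False, pct)
```

If a service bounces around like this, either the service really is unstable or the thresholds are too tight for how often it legitimately changes state; the flap threshold settings in the Nagios configuration are the place to adjust the latter.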