LinuxQuestions.org
Old 09-24-2012, 07:02 PM   #1
joed@ucsf.edu
LQ Newbie
 
Registered: May 2012
Posts: 5

Rep: Reputation: Disabled
Nagios stopped monitoring 5 VM Linux servers


I have a weird problem: I suddenly can't monitor 5 of my Nagios client machines. All 5 are running RHEL 5 as virtual machines under VMware, and they all stopped working at around the same time. My /var/log/messages simply says "Cannot connect from xxx.xxx.xxx.xxx" (real IP replaced). From my Nagios server, I simply get "CHECK_NRPE: Error - Could not complete SSL handshake"
I can telnet to port 5666 on the client machine from the Nagios server, though it disconnects after 3-5 seconds. I've verified that the client machines are listening on port 5666. I've tried re-installing on the client machines, and rebooting the client servers. No changes or patches were applied to those servers, and the router folk and firewall people all say everything looks OK from their end. I've disabled iptables and SELinux, but I still can't get anything from Nagios. The client machines are on different VLANs in our network, and other VMs on those networks are working correctly.

I'd love to hear from folks on ways to help diagnose the problem. I don't think it's on the clients or nagios server, and suspect it's something in the firewall or network end at our border, but I'm not sure how to proceed. Any suggestions will be greatly appreciated.
 
Old 09-24-2012, 07:27 PM   #2
roreilly
Member
 
Registered: Aug 2006
Location: Canada
Distribution: Debian, Slackware
Posts: 106

Rep: Reputation: 28
Hello Joed,

You specify that no updates or patches were applied to the 5 client machines. What about the nagios server(s)?

That error is generally caused by mismatched nagios versions.

Can you run check_nrpe manually against the VMs and see what the output is? If you get the same error, it's most likely a version mismatch.
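
For running that manual check against several hosts at once, here is a minimal Python sketch. It assumes check_nrpe is installed at /usr/local/nagios/libexec/check_nrpe (a guess based on the usual source-install layout, not something stated in this thread), and the names CHECK_NRPE, classify and probe are made up for illustration. With no command argument, check_nrpe normally just asks the daemon for its version, so a reply starting with "NRPE v" is treated as success here.

```python
import subprocess

# Assumed install path; adjust to wherever check_nrpe lives on your server.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"


def classify(output):
    """Map check_nrpe output to a rough diagnosis label."""
    if "Could not complete SSL handshake" in output:
        return "handshake-failed"
    if output.startswith("NRPE v"):
        return "ok"
    return "other"


def probe(host, timeout=15):
    """Run check_nrpe against host with no command and classify the reply."""
    try:
        result = subprocess.run(
            [CHECK_NRPE, "-H", host],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Binary missing, not executable, or the check hung.
        return "other", str(exc)
    output = (result.stdout or result.stderr).strip()
    return classify(output), output


# Example use (placeholder addresses, substitute your own VMs):
# for host in ["192.0.2.10", "192.0.2.11"]:
#     print(host, *probe(host))
```

Comparing the result for one of the five broken VMs with a working VM on the same VLAN should show whether the handshake error is specific to those hosts.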

Last edited by roreilly; 09-25-2012 at 10:56 AM.
 
Old 09-25-2012, 06:53 PM   #3
joed@ucsf.edu
LQ Newbie
 
Registered: May 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Hello, Roreilly
No, the server hasn't changed or had patches applied, and the versions of Nagios are the same. I ran check_nrpe on the server and still get "CHECK_NRPE: Error - Could not complete SSL handshake." I seem to get this error whenever I can't connect to the client machine, such as when the NRPE daemon isn't running there, but I've confirmed that it's running and that the client is listening on port 5666.
 
Old 09-25-2012, 11:00 PM   #4
pardoxx
LQ Newbie
 
Registered: Sep 2006
Posts: 17

Rep: Reputation: 1
Firewall? Did you enable the firewall on the server?
 
Old 09-25-2012, 11:47 PM   #5
joed@ucsf.edu
LQ Newbie
 
Registered: May 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
The firewall folk said things are ok on their end. I've turned off the firewall on the client machine and even turned off SELinux. I'm still having a problem.
 
Old 09-26-2012, 02:32 AM   #6
sackboy
LQ Newbie
 
Registered: Sep 2012
Posts: 20

Rep: Reputation: Disabled
From the nagios machine run:
telnet one_of_the_vm_ips 5666

What's the result?
 
Old 09-26-2012, 08:51 AM   #7
roreilly
Member
 
Registered: Aug 2006
Location: Canada
Distribution: Debian, Slackware
Posts: 106

Rep: Reputation: 28
Can you verify that /etc/xinetd.d/nrpe has not been changed?
Verify that the only_from IP is correct.

It should look like this:


# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
    flags           = REUSE
    socket_type     = stream
    port            = 5666
    wait            = no
    user            = nagios
    group           = nagios
    server          = /usr/local/nagios/bin/nrpe
    server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    log_on_failure  += USERID
    disable         = no
    only_from       = 10.1.2.2
}

Then if you can, restart xinetd.
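
To double-check the only_from line without eyeballing each file, something like the following Python sketch could be used. It is a simplification: it only understands plain IP addresses and CIDR ranges, and silently skips hostnames and the other address forms xinetd accepts. The function names only_from_entries and is_allowed are invented for this example.

```python
import ipaddress


def only_from_entries(config_text):
    """Collect the values of every only_from line in an xinetd service file."""
    entries = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Accept both "only_from = ..." and "only_from += ...".
        if key.strip().rstrip("+").strip() == "only_from":
            entries.extend(value.split())
    return entries


def is_allowed(config_text, source_ip):
    """True if source_ip matches a plain-IP or CIDR only_from entry."""
    addr = ipaddress.ip_address(source_ip)
    for entry in only_from_entries(config_text):
        try:
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # hostnames and other forms are not handled in this sketch
    return False


# Example use (path taken from the post above, IP is whatever address the
# Nagios server actually connects from):
# with open("/etc/xinetd.d/nrpe") as fh:
#     print(is_allowed(fh.read(), "10.1.2.2"))
```

The address to test is the one the client actually sees the Nagios server connecting from, which may differ from the server's own interface address if anything in between rewrites it.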
 
Old 09-26-2012, 01:34 PM   #8
joed@ucsf.edu
LQ Newbie
 
Registered: May 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Hi, Sackboy,
Telnet'ing to port 5666 gives me a "Connected to" the server message, but then breaks the connection after about 3 to 5 seconds. Telnet'ing to one of the good machines stays connected until I issue a quit command or an interrupt.
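
To put a number on that difference between the good and bad machines, here is a rough Python sketch that connects, sends nothing, and times how long the remote end keeps the socket open. The function name seconds_until_close and its default timeouts are made up for this example; 5666 is the port from this thread.

```python
import socket
import time


def seconds_until_close(host, port=5666, connect_timeout=5.0, max_wait=15.0):
    """Connect, send nothing, and time how long the peer keeps the socket open.

    Returns the elapsed seconds when the peer closes or resets the
    connection, or None if it is still open after max_wait seconds.
    """
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        start = time.monotonic()
        deadline = start + max_wait
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                if not sock.recv(1024):  # empty read means the peer closed
                    return time.monotonic() - start
            except socket.timeout:
                return None
            except ConnectionResetError:
                return time.monotonic() - start


# Example use (placeholder address):
# print(seconds_until_close("192.0.2.10"))
```

A result of a few seconds on the broken VMs and None on a working one would match what telnet shows, and gives a repeatable figure to hand to the network and firewall people.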
 
Old 09-28-2012, 03:19 PM   #9
sackboy
LQ Newbie
 
Registered: Sep 2012
Posts: 20

Rep: Reputation: Disabled
Hi,

On the machine that says "Connected to" and drops out, tail /var/log/secure. What does it show?
 
  

