scottrych 02-05-2010 09:45 AM

nrpe ldap and ssl

I'm sorry if this is in the wrong forum, I don't think it is, but we'll see.

Here's my scenario that I'm trying to address...

I have a RHEL Enterprise 5.4 server used as our Nagios server which monitors a CentOS 5.3 box without any problems under normal daily operations.

However, this CentOS box is set up to use LDAP to authenticate against one of our Windows DCs for Active Directory authentication. Again, this works great normally, until I have to reboot the DC; then all hell breaks loose.

Originally, our LDAP configuration pointed to a single LDAP server URI. I thought the easiest way to deal with this would be to add another LDAP server and call it a day, but that didn't seem to work for me.

Here's what my CentOS logs look like:

Feb 2 11:52:07 wd-54 httpd: nss_ldap: failed to bind to LDAP server
ldap:// Can't contact LDAP server

This continues on for a bit sleeping along the way...

Then Nagios decides that it's going to start checks up again...

Feb 2 11:54:43 wd-54 xinetd[2235]: START: nrpe pid=20092 from=x.x.x.x (Nagios Server IP Address)

It performs 9 additional checks and then gets to:

Feb 2 11:54:54 wd-54 xinetd[2235]: FAIL: nrpe per_source_limit from=x.x.x.x (again Nagios server IP)

Feb 2 11:55:43 wd-54 xinetd[2235]: FAIL: nrpe per_source_limit from=x.x.x.x

Feb 2 11:55:47 wd-54 httpd: nss_ldap: could not search LDAP server - Server is unavailable

Finally, the LDAP server comes back online...

Feb 2 11:56:45 wd-54 httpd: nss_ldap: reconnected to LDAP server ldap:// after 2 attempts

Feb 2 11:56:58 wd-54 nrpe[20092]: Error: Could not complete SSL handshake. 5
(I don't know why the 5 is in the log.)

The SSL handshake line repeats until NRPE realizes that LDAP is back up and then goes back to normal.

I just can't understand why losing the LDAP server has such an impact on Nagios. The only thing that LDAP is configured for is logins, so this isn't making sense to me.

Thanks in advance, if there's anything else that I haven't included from my logs that might help, please let me know.



btmiller 02-07-2010 12:37 PM

By any chance is the user that nrpe runs as authenticated via LDAP? If not, you may need to add that user to the nss_initgroups_ignoreusers list in your /etc/ldap.conf file. If that user is authenticated via LDAP, this may well be the source of the problem. I've noticed several bizarre problems of this nature when an LDAP server goes away. You might also want to look at what the maximum number of connections for a given service is in your xinetd config.
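
For illustration, the relevant part of /etc/ldap.conf might look roughly like the fragment below. This is only a sketch: the host names are placeholders, and "nagios" is assumed to be the local account nrpe runs as (the account name that comes up later in this thread). Check both against your own setup.

```
# /etc/ldap.conf (nss_ldap): illustrative fragment, not a full config

# Placeholder host names (assumption). Listing more than one server
# lets lookups fail over when the first one is unreachable.
uri ldap://dc1.example.com/ ldap://dc2.example.com/

# Skip LDAP supplementary-group lookups (initgroups) for local service
# accounts. "nagios" is assumed to be the account nrpe runs as.
nss_initgroups_ignoreusers root,nagios
```

With the account in that list, nss_ldap should not go to LDAP for that user's supplementary groups, so an unreachable directory server is less likely to stall the daemon when it starts.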

scottrych 02-08-2010 03:14 PM

Thanks btmiller,

I don't believe that nrpe runs through LDAP; there is a local user on the box, so as far as I know it shouldn't. I added the nagios user to the /etc/ldap.conf file and also set up a secondary LDAP server that we could fall back to if needed. Actually, I didn't realize it until well after the fact, but the primary LDAP server was rebooted and the secondary picked up the slack without any problems reported by Nagios.

Last week, I reset my connections to unlimited thinking that would help me (it didn't appear to).

Thanks for your help.


ursusca 02-10-2010 12:33 AM


Try updating the service’s xinetd config, in this case /etc/xinetd.d/nrpe:

service nrpe
{
        # leave the other attributes in this block as they are
        per_source = UNLIMITED
        instances = UNLIMITED
}

Then restart xinetd.
