Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
I'm sorry if this is in the wrong forum. I don't think it is, but we'll see.
Here's my scenario that I'm trying to address...
I have a RHEL 5.4 server used as our Nagios server, which monitors a CentOS 5.3 box without any problems under normal daily operations.
However, this CentOS box is set up to use LDAP to authenticate against one of our Windows DCs for Active Directory authentication. Again, this works great normally, until I have to reboot the DC, and then all hell breaks loose.
Originally, our LDAP configuration pointed to a single LDAP server URI. I thought the easiest way to deal with this would be to add another LDAP server and call it a day, but this didn't seem to work for me.
Here's what my CentOS logs look like:
Feb 2 11:52:07 wd-54 httpd: nss_ldap: failed to bind to LDAP server ldap://blah.blah.blah.com: Can't contact LDAP server
This continues for a bit, sleeping along the way...
Then Nagios decides that it's going to start its checks up again...
Feb 2 11:54:43 wd-54 xinetd[2235]: START: nrpe pid=20092 from=x.x.x.x (Nagios Server IP Address)
It performs 9 additional checks and then gets to:
Feb 2 11:54:54 wd-54 xinetd[2235]: FAIL: nrpe per_source_limit from=x.x.x.x (again Nagios server IP)
Feb 2 11:55:43 wd-54 xinetd[2235]: FAIL: nrpe per_source_limit from=x.x.x.x
Feb 2 11:55:47 wd-54 httpd: nss_ldap: could not search LDAP server - Server is unavailable
Finally, the LDAP server comes back online...
Feb 2 11:56:45 wd-54 httpd: nss_ldap: reconnected to LDAP server ldap://blah.blah.blah.com after 2 attempts
Feb 2 11:56:58 wd-54 nrpe[20092]: Error: Could not complete SSL handshake. 5
(I don't know why the 5 is in the log.)
The SSL handshake line repeats until NRPE realizes that LDAP is back up and then goes back to normal.
I just can't seem to understand why losing the LDAP server is having such an impact on Nagios. The only thing that LDAP is configured for is logins, so this isn't making sense to me.
Thanks in advance. If there's anything else from my logs that I haven't included and that might help, please let me know.
By any chance, is the user that nrpe runs as authenticated via LDAP? If not, you may need to add that user to the nss_initgroups_ignoreusers list in your /etc/ldap.conf file. If that user is authenticated via LDAP, this may well be the source of the problem. I've noticed several bizarre problems of this nature when an LDAP server goes away. You might also want to look at the maximum number of connections allowed for a given service in your xinetd config.
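For reference, a minimal sketch of the nss_initgroups_ignoreusers suggestion, assuming NRPE runs as a local account named nagios. The user list is an assumption; adjust it to the local service accounts on your box. For the listed users, nss_ldap skips the LDAP supplementary-group lookup instead of waiting on a server that can't be reached.

```
# /etc/ldap.conf (nss_ldap)
# Comma-separated local accounts for which initgroups() should not consult LDAP.
# "root" and "nagios" are assumed names; list whatever local service users apply.
nss_initgroups_ignoreusers root,nagios
```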
I don't believe that nrpe runs through LDAP. There is a local user on the box, so as far as I know it shouldn't. I added the nagios user to the /etc/ldap.conf file and also set up a secondary LDAP server that we could fall back to if needed. Actually, I didn't realize it until well after the fact, but the primary LDAP server was rebooted and the secondary picked up the slack without any problems reported by Nagios.
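A sketch of what a fallback setup like this might look like in /etc/ldap.conf. The hostnames are placeholders, not the real servers. nss_ldap accepts several space-separated URIs on the uri line and moves on to the next one when a server can't be contacted.

```
# /etc/ldap.conf (nss_ldap)
# Placeholder hostnames; the second server is used if the first is unreachable.
uri ldap://dc1.example.com/ ldap://dc2.example.com/
```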
Last week, I reset my connections to unlimited, thinking that would help me (it didn't appear to).
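For context, the per_source_limit failures in the log correspond to xinetd's per_source attribute, while instances caps the total number of simultaneous servers for a service. Here is a sketch of where those limits would go in the NRPE service definition. The file path, binary path, and config path are assumptions based on a typical NRPE-under-xinetd install, so compare against your own file rather than copying this.

```
# /etc/xinetd.d/nrpe (assumed path)
service nrpe
{
    socket_type = stream
    port        = 5666
    wait        = no
    # Assumed local account that NRPE runs as
    user        = nagios
    # Assumed install and config paths
    server      = /usr/sbin/nrpe
    server_args = -c /etc/nagios/nrpe.cfg --inetd
    # Connections allowed per source IP, and total concurrent instances
    per_source  = UNLIMITED
    instances   = UNLIMITED
    disable     = no
}
```

If each nrpe instance is stuck waiting on an LDAP lookup, lifting these limits would likely only hide the symptom, which may be why the change didn't appear to help.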