I am having trouble tracking down a problem where LDAP is killing our network connection. I cannot figure out which application is querying the LDAP server.
I added ldap to the nsswitch.conf file for passwd, shadow, and group, and tcpdump shows the network connection is suddenly overloaded with LDAP requests, but what application could be generating them? I have other boxes with the same configuration that do not have the issue. How can I determine which application is putting out a flood of LDAP requests, literally hundreds per second?
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 1 0 burke.some.net:56611 chicago.some.net:ldap CLOSE_WAIT 4913/ntpd
tcp 1 0 burke.some.net:56613 chicago.some.net:ldap CLOSE_WAIT 5072/xfs
tcp 0 0 burke.some.net:34664 linux06.some.net:ssh ESTABLISHED 7846/ssh
tcp 0 0 burke.some.net:ssh linux01.some.net:40227 ESTABLISHED 6222/2
tcp 0 0 burke.some.net:ssh linux01.some.net:40159 ESTABLISHED 6099/1
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ] DGRAM 1430 560/udevd @/org/kernel/udev/udevd
unix 2 [ ] DGRAM 8736 5115/hald @/org/freedesktop/hal/udev_event
unix 17 [ ] DGRAM 7410 4639/syslogd /dev/log
unix 2 [ ] DGRAM 13773 6222/2
unix 2 [ ] DGRAM 13493 6099/1
unix 3 [ ] STREAM CONNECTED 11992 5766/gam_server @/tmp/fam-root-
unix 3 [ ] STREAM CONNECTED 11991 5764/python
unix 3 [ ] STREAM CONNECTED 11982 4762/dbus-daemon /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 11981 5764/python
unix 2 [ ] DGRAM 11823 5695/hidd
unix 2 [ ] DGRAM 11769 5666/pcscd
unix 3 [ ] STREAM CONNECTED 11708 4762/dbus-daemon /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 11707 5605/hcid
unix 2 [ ] DGRAM 11687 5611/sdpd
unix 2 [ ] DGRAM 11677 5605/hcid
unix 2 [ ] STREAM CONNECTED 10441 4853/acpid /var/run/acpid.socket
unix 3 [ ] STREAM CONNECTED 10006 4762/dbus-daemon /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 10005 5115/hald
unix 3 [ ] STREAM CONNECTED 9983 5115/hald @/var/run/hald/dbus-PjmpnLiR3E
unix 3 [ ] STREAM CONNECTED 9982 5138/hdb
unix 3 [ ] STREAM CONNECTED 9842 5115/hald @/var/run/hald/dbus-PjmpnLiR3E
unix 3 [ ] STREAM CONNECTED 9840 5130/event0
unix 3 [ ] STREAM CONNECTED 9820 5115/hald @/var/run/hald/dbus-PjmpnLiR3E
unix 3 [ ] STREAM CONNECTED 9819 5126/event1
unix 3 [ ] STREAM CONNECTED 9767 4853/acpid /var/run/acpid.socket
unix 3 [ ] STREAM CONNECTED 9766 5123/acpid.socket
unix 3 [ ] STREAM CONNECTED 9758 5115/hald @/var/run/hald/dbus-PjmpnLiR3E
unix 3 [ ] STREAM CONNECTED 9757 5123/acpid.socket
unix 3 [ ] STREAM CONNECTED 8731 5115/hald @/var/run/hald/dbus-vfJAlB8GNU
unix 3 [ ] STREAM CONNECTED 8730 5116/hald-runner
unix 2 [ ] DGRAM 8556 5041/crond
unix 2 [ ] DGRAM 8531 5030/gpm
unix 2 [ ] DGRAM 8508 5018/clientmqueue
unix 2 [ ] DGRAM 8479 5010/sendmail: acce
unix 2 [ ] DGRAM 8097 4913/ntpd
unix 2 [ ] DGRAM 7960 4864/hpiod
unix 2 [ ] DGRAM 7897 4832/automount
unix 3 [ ] STREAM CONNECTED 7727 4762/dbus-daemon
unix 3 [ ] STREAM CONNECTED 7726 4762/dbus-daemon
unix 3 [ ] STREAM CONNECTED 7681 4743/rpc.idmapd
unix 3 [ ] STREAM CONNECTED 7680 4743/rpc.idmapd
unix 2 [ ] DGRAM 7543 4704/rpc.statd
unix 2 [ ] DGRAM 7418 4642/klogd
unix 3 [ ] STREAM CONNECTED 7361 4613/auditd
unix 3 [ ] STREAM CONNECTED 7360 4615/audispd
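One way to narrow this down is to tally LDAP connections by owning process. The sketch below assumes the `netstat -tp` column layout shown in the listing above; the function name `ldap_connections_by_program` and the set of port labels are illustrative choices, not part of any tool. In this listing the LDAP sockets belong to ntpd and xfs, which is what you would expect if ordinary daemons reach the LDAP server through the name service switch rather than one dedicated client doing so.

```python
from collections import Counter

# Port labels treated as LDAP; netstat prints names unless run with -n.
LDAP_PORTS = {"ldap", "ldaps", "389", "636"}

def ldap_connections_by_program(netstat_text):
    """Count TCP connections to an LDAP port per 'PID/Program name' entry.

    Expects whitespace-separated lines in the order:
    proto, recv-q, send-q, local address, foreign address, state, pid/program.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # skip headers, UNIX sockets, and short lines
        port = fields[4].rsplit(":", 1)[-1]
        if port in LDAP_PORTS:
            # Program names may contain spaces, so rejoin the tail.
            counts[" ".join(fields[6:])] += 1
    return counts

# Three TCP lines copied from the listing above.
sample = """\
tcp 1 0 burke.some.net:56611 chicago.some.net:ldap CLOSE_WAIT 4913/ntpd
tcp 1 0 burke.some.net:56613 chicago.some.net:ldap CLOSE_WAIT 5072/xfs
tcp 0 0 burke.some.net:34664 linux06.some.net:ssh ESTABLISHED 7846/ssh
"""

for program, n in ldap_connections_by_program(sample).most_common():
    print(n, program)
```

Running this repeatedly against fresh netstat output while the flood is happening should show which PIDs keep reappearing.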
Another symptom is that when ldap is in the nsswitch.conf file it takes tcpdump 20 seconds to start. Without ldap in the nsswitch.conf file it starts instantaneously.
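A startup delay like this would be consistent with slow name service lookups, since any program that resolves a user or group goes through nsswitch.conf. The sketch below times two such lookups using Python's standard `pwd` and `grp` modules; the helper name `time_call` is made up for illustration. Running it with and without ldap in nsswitch.conf should show whether the lookups themselves are slow.

```python
import grp
import os
import pwd
import time

def time_call(fn, *args):
    """Return (elapsed_seconds, result); result is None if the entry is missing."""
    start = time.perf_counter()
    try:
        result = fn(*args)
    except KeyError:
        result = None  # no such user or group in any configured source
    return time.perf_counter() - start, result

# Look up the current user and group; both go through the name service switch.
checks = [
    ("passwd lookup by uid", pwd.getpwuid, os.getuid()),
    ("group lookup by gid", grp.getgrgid, os.getgid()),
]

for label, fn, arg in checks:
    elapsed, _ = time_call(fn, arg)
    print(f"{label}: {elapsed:.3f}s")
```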
LDAP slowing network down
CS on Friday, March 16, 2007 at 12:47 am
Fixed it! The culprit was “LDAP referrals”. Having just spent several hours figuring it out, I now remember that the same problem, with the same cause, stopped me from implementing this the last time (which, in my defense, was 18 to 24 months ago, now that I go back and search my emails, not 12 months).
We have a slightly complicated domain structure in that we have multiple child domains (one for each branch office). With referrals turned on, nss_ldap recurses down from domain.com to child1.domain.com, child2.domain.com, etc. before returning results. Something in this process must sometimes trigger the errors I was seeing (they appear at random) and also cause the slowdowns.
If you put:
into /etc/ldap.conf, the problems all go away (or at least they haven’t happened since; with this sort of random error you can never be 100% sure ;) ). Response time is also substantially faster.
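The exact line is not reproduced in the post. As an assumption, the setting meant is nss_ldap's `referrals` option, which controls whether referrals are chased automatically. The sketch below reads an option out of ldap.conf-style text so that a box with the problem can be compared against one without it. The function name `get_option` and the sample excerpt, including its hostname, are hypothetical.

```python
def get_option(conf_text, name):
    """Return the last value set for `name` in ldap.conf-style text, or None.

    Lines are 'keyword value'; blank lines and lines starting with '#'
    are ignored. Keyword matching is case-insensitive.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if parts[0].lower() == name.lower():
            value = parts[1].strip() if len(parts) > 1 else ""
    return value

# Hypothetical excerpt; the 'referrals no' line is an assumption about
# what the post intended, not a quote from it.
sample_conf = """\
# /etc/ldap.conf (excerpt)
uri ldap://dc.domain.com/
base dc=domain,dc=com
referrals no
"""

print("referrals =", get_option(sample_conf, "referrals"))
```

To check a real machine, pass the contents of /etc/ldap.conf instead of the sample text.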
I can get away with this _now_ because (for unrelated reasons) we are flattening our domain structure and nixing the child domains. Previously this was not an option so any authentication system had to be able to include users in the child domains.
I also seem to vaguely recall that you can get around much of the slowness related to multi-domain configurations by a) pointing at a DC that is a Global Catalogue and b) pointing at the GC port instead of the regular LDAP port. This didn’t help the random crashes with things like ‘getent group’ though, IIRC.
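For reference, an Active Directory Global Catalog answers LDAP queries on port 3268, while regular LDAP uses port 389. A minimal sketch of building the two URIs follows; the helper name `ldap_uri` and the hostname are placeholders, not values from the post.

```python
LDAP_PORT = 389              # regular LDAP
GLOBAL_CATALOG_PORT = 3268   # Active Directory Global Catalog (LDAP)

def ldap_uri(host, global_catalog=False):
    """Build an ldap:// URI for the regular LDAP port or the GC port."""
    port = GLOBAL_CATALOG_PORT if global_catalog else LDAP_PORT
    return f"ldap://{host}:{port}/"

# 'dc1.domain.com' is a placeholder for a DC that holds the Global Catalog.
print(ldap_uri("dc1.domain.com"))
print(ldap_uri("dc1.domain.com", global_catalog=True))
```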
Well, now all I need to do is get the automounter going with NFS’d home directories, and we’ll be golden. I might have to bend your ear about that shortly as well ;) .