I wanted to post this here to help anyone who might have noticed performance issues with a RHEL 6 box. I'm sure this issue will become a lot more prevalent when CentOS 6 comes out. It took me days of troubleshooting to figure this out, and hopefully this will save others a headache. These issues are also present on Fedora 10 and Fedora 11. I've seen quite a few forum posts on it already (e.g. http://www.linuxquestions.org/questi...a-11-a-778069/).
Skip to the bottom for the solution.
Here are the symptoms:
-ssh to the machine takes a long time before you finally get in. This usually points to DNS issues.
-dig and host succeed and resolve names very fast (a few ms).
-telnet to a port takes longer than usual.
-Firefox is slow
-Yum is slow
On a hunch I disabled IPv6 and performance improved, but certain applications were still a lot slower than usual.
After a bunch of troubleshooting here's what I discovered:
1. RHEL 5 works perfectly (ssh and telnet are fast).
2. This only happens when you make a DNS query through a firewall. If you have a DNS server on the same network segment, it's super fast.
3. Changing the timeout in /etc/resolv.conf helps, but it's still not as fast as on RHEL 5 or Debian machines.
A tcpdump showed me that even with IPv6 disabled, AAAA queries were still being sent for ssh, yum and whois. So something must have changed between RHEL 5 and RHEL 6.
Red Hat's knowledge base was of no help whatsoever.
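If you want to see the delay without tcpdump, here is a rough Python 3 sketch (my assumption: Python 3.6+ is available, which is not the stock interpreter on RHEL 6). It times a lookup through the system resolver, once asking for any address family (which makes glibc send both A and AAAA) and once asking for IPv4 only. The function names and the hostname are placeholders, not anything from the bug reports.

```python
import socket
import time


def time_lookup(host, family=socket.AF_UNSPEC, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, result_count) for one lookup via the system resolver."""
    start = time.monotonic()
    try:
        results = resolver(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        results = []  # a failed lookup still tells us how long it took
    return time.monotonic() - start, len(results)


def compare(host):
    """Print timings for an A + AAAA lookup versus an A-only lookup."""
    cases = (
        ("A + AAAA (AF_UNSPEC)", socket.AF_UNSPEC),
        ("A only (AF_INET)", socket.AF_INET),
    )
    for label, family in cases:
        seconds, count = time_lookup(host, family)
        print(f"{label}: {seconds:.3f}s, {count} result(s)")
```

Calling compare("www.example.com") on an affected box should show the AF_UNSPEC lookup taking noticeably longer than the AF_INET one. A local cache such as nscd can hide the difference, so try a name that has not been looked up recently.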
After reading through hundreds of comments in multiple bug reports, I finally discovered the root cause:
Somewhere down the line, the glibc maintainer decided to change how the DNS resolver works. Now, instead of opening a separate socket for each request, it sends the A and AAAA requests over the same socket. A lot of hardware (firewalls, etc.) gets confused and only sends back one reply. As a result, your machine sits there and waits for the second reply until it times out. (https://bugzilla.redhat.com/show_bug.cgi?id=505105)
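To check whether your own firewall has this problem, something like the following Python 3 sketch might help. It only approximates what glibc does: it hand-builds an A and an AAAA query, sends both from one UDP socket, and reports which query IDs got a reply before the timeout. The function names, the fixed query IDs and the 2 second timeout are my own choices for illustration, not anything taken from glibc.

```python
import socket
import struct
import time

QTYPE_A, QTYPE_AAAA = 1, 28


def build_query(name, qtype, query_id):
    """Build a minimal DNS query packet: one question, recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def probe_same_socket(server, name, port=53, timeout=2.0):
    """Send A and AAAA queries from one UDP socket; return the query IDs answered."""
    pending = {0x1001: QTYPE_A, 0x1002: QTYPE_AAAA}
    answered = set()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for query_id, qtype in pending.items():
            sock.sendto(build_query(name, qtype, query_id), (server, port))
        deadline = time.monotonic() + timeout
        while len(answered) < len(pending):
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                break
            if len(data) >= 2:
                reply_id = struct.unpack("!H", data[:2])[0]
                if reply_id in pending:
                    answered.add(reply_id)
    return answered
```

Run probe_same_socket against the IP of a DNS server behind the firewall. If only one of the two IDs comes back, while the same call against a DNS server on the local segment returns both, the firewall is most likely dropping the second reply.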
The glibc maintainer's solution was to "fix the broken hardware". In an enterprise environment, I don't really have the option to upgrade my firewall today, even if there's a patch for it. I'm running a Juniper SSG-550 firewall, so I'm sure many other people are having the same issues. And I have a bunch of enterprise RHEL 6 servers I need to deploy. The suggested workaround was to install a local DNS caching server on the machine itself, which is ludicrous. They're telling me that I have to install a local caching server on each and every RHEL 6 server I have? WTF? (https://bugzilla.redhat.com/show_bug.cgi?id=459756)
Finally he decided to implement a fallback option that's not really documented anywhere except here:
http://sourceware.org/ml/libc-alpha/.../msg00063.html
If two requests from the same port are not handled correctly close the socket and open a new one before sending the second request. The 'single-request-reopen' option in /etc/resolv.conf can be used to select this mode right away, instead of rediscovering the necessity is every process again.
So in short, here's what I did, and it resolved my issue:
Place this in your /etc/resolv.conf:
options single-request-reopen
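If you need to roll this out to a lot of machines, a small helper along these lines could do it safely. This is only a sketch in Python 3: add_resolver_option is a made-up name, and it works on the file contents as a string rather than writing anything itself, so you can check the result first. It leaves the text alone if the option is already there, appends it to an existing options line if there is one, and otherwise adds a new line.

```python
OPTION = "single-request-reopen"


def add_resolver_option(text, option=OPTION):
    """Return resolv.conf text with `option` present on an options line."""
    lines = text.splitlines()
    # Indexes of every line whose first field is the keyword "options".
    option_rows = [
        i for i, line in enumerate(lines) if line.split()[:1] == ["options"]
    ]
    if any(option in lines[i].split()[1:] for i in option_rows):
        return text  # already set, nothing to do
    if option_rows:
        first = option_rows[0]
        lines[first] = lines[first].rstrip() + " " + option
    else:
        lines.append("options " + option)
    return "\n".join(lines) + "\n"
```

Read /etc/resolv.conf, pass the contents through the function, and write the result back as root. Keep in mind that whatever manages the file on your system (a DHCP client, for example) may regenerate it and drop the option, so check after a network restart.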