Slow DNS on RHEL 6 with IPv6 going through a firewall (yum, ssh, Firefox)

iambrucelee · 06-15-2011, 01:52 PM

I wanted to post this here to help anyone who has noticed performance issues with a RHEL 6 box. I'm sure this issue will become a lot more prevalent when CentOS 6 comes out. It took me days of troubleshooting to figure this out, and hopefully this will save others a headache. These issues are also present on Fedora 10 and Fedora 11. I've seen quite a few forum posts on it already (e.g. http://www.linuxquestions.org/questi...a-11-a-778069/).

Skip to the bottom for the solution.

Here are the symptoms:

-ssh to the machine takes a long time before you finally get in. This usually points to DNS issues.
-dig and host succeed and resolve names very fast (a few ms).
-telnet to a port takes longer than usual.
-Firefox is slow
-Yum is slow
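
A rough way to see the gap between dig and what applications experience is to time getaddrinfo(), the libc call that ssh and most other programs use to resolve names. This is only a minimal sketch, not part of the original troubleshooting: the names time_getaddrinfo and compare are made up for illustration, the host you pass in is up to you, and it assumes that an AF_UNSPEC lookup sends both the A and AAAA queries while an AF_INET lookup sends only the A query.

```python
import socket
import time


def time_getaddrinfo(host, family=socket.AF_UNSPEC, port=22):
    """Return (elapsed_seconds, result_count) for one getaddrinfo() call."""
    start = time.monotonic()
    try:
        results = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # Resolution failed; report zero results but still return the time spent.
        results = []
    return time.monotonic() - start, len(results)


def compare(host):
    """Print timings for an A+AAAA lookup and an A-only lookup of `host`."""
    cases = (
        ("AF_UNSPEC (A + AAAA)", socket.AF_UNSPEC),
        ("AF_INET (A only)", socket.AF_INET),
    )
    for label, family in cases:
        elapsed, count = time_getaddrinfo(host, family)
        print("%-22s %.3f s, %d result(s)" % (label, elapsed, count))
```

Calling compare() with a hostname that resolves through the firewall should, on an affected box, show the AF_UNSPEC lookup taking seconds while the AF_INET lookup stays in the millisecond range. If both are fast, the problem is probably somewhere else.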

Just on a hunch I disabled IPv6 and performance improved, but it was still a lot slower than usual for certain applications.

After a bunch of troubleshooting here's what I discovered:

1. RHEL5 works perfectly. (ssh, telnet are fast)
2. This only happens when you make a DNS query through a firewall. If you have a DNS server on the same network segment, it's super fast.
3. Changing the timeout in /etc/resolv.conf helps, but it's still not as fast as RHEL5 or Debian machines.

tcpdump captures showed me that even with IPv6 disabled, AAAA queries were still happening for ssh, yum and whois. So something must have changed from RHEL5 to RHEL6.

Red Hat's knowledge base was of no help whatsoever.

After reading through hundreds of comments in multiple bug reports, I finally discovered the root cause:

Somewhere down the line, the maintainer of glibc decided to change how the DNS resolver works. Now, instead of opening a socket for each request, it uses the same socket for the A and AAAA requests. A lot of hardware (firewalls, etc.) gets confused and only sends back one reply. As a result, your machine sits there and waits for the second reply until it times out. (https://bugzilla.redhat.com/show_bug.cgi?id=505105)
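
To check whether a given path drops one of the two replies, you can imitate the pattern described above: send an A and an AAAA query from one UDP socket and see how many answers come back. The following is a minimal sketch under stated assumptions, not glibc's actual code. The helper names build_query and count_replies are made up, the transaction IDs are arbitrary, and the DNS server address and hostname are whatever you pass in.

```python
import socket
import struct

QTYPE_A = 1
QTYPE_AAAA = 28


def build_query(name, qtype, txid):
    """Build a minimal DNS query packet (one question, recursion desired)."""
    # Header: id, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Name is encoded as length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # Question trailer: query type, class IN (1).
    return header + qname + struct.pack("!HH", qtype, 1)


def count_replies(server, name, timeout=2.0):
    """Send A and AAAA queries from ONE socket; return which ones were answered."""
    pending = {0x1111: "A", 0x2222: "AAAA"}  # arbitrary transaction IDs
    seen = []
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        # Both datagrams leave from the same source port, as described above.
        sock.sendto(build_query(name, QTYPE_A, 0x1111), (server, 53))
        sock.sendto(build_query(name, QTYPE_AAAA, 0x2222), (server, 53))
        while len(seen) < 2:
            try:
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                break  # stop waiting; report whatever arrived
            if len(data) < 2:
                continue
            txid = struct.unpack("!H", data[:2])[0]
            if txid in pending and pending[txid] not in seen:
                seen.append(pending[txid])
        return seen
    finally:
        sock.close()
```

If count_replies() against your usual DNS server returns only one of "A" and "AAAA" through the firewall, but both from a machine on the same segment as the DNS server, that points at the device in between.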

The glibc maintainer's solution was to "fix the broken hardware". In an enterprise environment, I don't really have the option of upgrading my firewall today, even if there's a patch for it. I'm running a Juniper SSG-550 firewall, so I'm sure many other people are having the same issues. And I have a bunch of enterprise RHEL 6 servers I need to deploy. The workaround for it was to install a local DNS caching server on the machine itself, which is ludicrous. They're telling me that I have to install a local caching server on each and every RHEL 6 server I have? WTF?
(https://bugzilla.redhat.com/show_bug.cgi?id=459756)

Finally, he decided to implement a fallback option that's not really documented anywhere except here: http://sourceware.org/ml/libc-alpha/.../msg00063.html

If two requests from the same port are not handled correctly close the socket and open a new one before sending the second request. The 'single-request-reopen' option in /etc/resolv.conf can be used to select this mode right away, instead of rediscovering the necessity in every process again.

So in short, here's what I did, and it resolved my issue:

Place this in your /etc/resolv.conf:

options single-request-reopen
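
If you need to roll this out to a bunch of servers, something along these lines could add the option without duplicating it or clobbering an existing options line. This is just a sketch, not an official tool: add_resolver_option is a made-up helper name, and it only transforms the file contents as a string, so reading and writing /etc/resolv.conf (and backing it up first) is left to you.

```python
def add_resolver_option(text, option="single-request-reopen"):
    """Return resolv.conf contents with `option` present on an options line."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] == "options":
            # Append to the first existing options line unless already listed.
            if option not in fields[1:]:
                lines[i] = line.rstrip() + " " + option
            return "\n".join(lines) + "\n"
    # No options line found, so add a new one at the end.
    lines.append("options " + option)
    return "\n".join(lines) + "\n"
```

Running it twice on the same contents gives the same result as running it once, so it should be safe to call from a provisioning script.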

MensaWater · 06-15-2011, 03:23 PM

Thanks for posting this. I hadn't run into the issue yet but suspect I would eventually.

Please go to Thread Tools at top and mark this solved. It helps others when doing web searches to find resolved issues more quickly since it will show up in the thread title.

rryder · 06-29-2011, 12:48 PM

Hi! Thanks for pointing out this wasn't in the Red Hat kbase! I think customers will benefit from this, so please check out https://access.redhat.com/kb/docs/DOC-58626

erinn · 06-29-2011, 03:16 PM

I filed a bug in the kernel.org Bugzilla requesting documentation for the option in resolv.conf here: https://bugzilla.kernel.org/show_bug.cgi?id=38542

As well, I opened a request with Red Hat to include said documentation here: https://bugzilla.redhat.com/show_bug.cgi?id=717770

Thanks a lot for posting this up, I had been struggling with this issue as well.

-Erinn

salasi · 06-29-2011, 04:46 PM

Quote:

Originally Posted by iambrucelee

Here are the symptoms:

-ssh to the machine takes a long time before you finally get in. This usually points to DNS issues.
-dig and host succeed and resolve names very fast (a few ms).
-telnet to a port takes longer than usual.
-Firefox is slow
-Yum is slow

Thank you very much for posting that. I've noticed a certain unfortunate interaction between some browsers and squid, which seems to have some elements in common with what you have - not the same, but similar effects - so your work gives me somewhere to start. I hadn't even thought of glibc, to be honest.

Quote:

The workaround for it was to install a local DNS caching server on the machine itself, which is ludicrous. They're telling me that I have to install a local caching server on each and every RHEL 6 server I have?

Presumably, they are saying that you need one DNS cache on the 'local' side of the firewall, rather than one on each machine? Still a work-around for a broken situation, though.

crnkyadm · 02-24-2012, 08:31 AM

Hey, I just wanted to let you know that this happens even when there is no firewall between client and server. I'm building an Oracle Linux 6.2 (RHEL knock-off like CentOS) virtual machine running on ESXi 4.1 with Cisco Nexus network infrastructure and had this issue. About the only thing I can think of is that our DNS servers are behind Cisco load balancers, so the LB may be tripping up on the traffic like the firewalls seem to. I changed resolv.conf to point directly to the DNS servers, and the result seems to support the idea that the LB is getting tripped up.

crackptb · 09-10-2013, 10:38 AM

For a change, I found the issue slightly elsewhere: not on the server I was trying to access, but on the Linux box I use every day.
I found that the delay was caused by the GSSAPI authentication method in the local SSH client. To resolve the issue, I edited /etc/ssh/ssh_config and set the line -> GSSAPIAuthentication no

This mod solved the speed issue for me as I use only ssh key exchange or manually typed passwords.