LinuxQuestions.org
Old 05-08-2009, 10:56 AM   #1
Earthworm Jim
LQ Newbie
 
Registered: May 2009
Posts: 3

Rep: Reputation: 0
DNS resolution timeouts / retries


I'm trying to understand DNS resolution and retries. The manpage for resolv.conf tells me:

Code:
nameserver Name server IP address
... Up to MAXNS (currently 3,  see  <resolv.h>)  name servers may be
listed ...

timeout:n
sets the amount of time the resolver will wait for a response from a 
remote name server before retrying the query via a different name 
server.  Measured in seconds, the default is RES_TIMEOUT (currently 5, 
see <resolv.h>).

attempts:n
sets the number of times the resolver will send a query to its name 
servers before giving up and  returning  an  error  to  the 
calling application.  The default is RES_DFLRETRY (currently 2, 
see <resolv.h>).
Assuming the defaults are set, is the worst-case scenario a wait of 30 seconds before success or failure (RES_TIMEOUT * MAXNS * RES_DFLRETRY = 5 * 3 * 2)?
Or is there an exponential backoff?

Also, which happens first, timeout or attempts? That is, does the resolver make two attempts to resolve via server 1, and then two attempts via server 2? Or does it make one attempt at server 1, then server 2, then back to the top of the list?

Thanks very much!
 
Old 05-08-2009, 12:45 PM   #2
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,814
Blog Entries: 15

Rep: Reputation: 1661
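Timeout first, then attempts. Per the nameserver entry in resolv.conf(5), the resolver tries the first name server and, if the query times out, tries the next one until it runs out of name servers; it then repeats the whole list until the maximum number of attempts has been made.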

"Worst case" would be timeout after the 3rd one was attempted the second time.
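
The worst case asked about above can be worked through as plain arithmetic. This is only a sketch: it assumes every query runs to the full, flat timeout with no backoff, which is the assumption made in the question; a real resolver implementation may scale the timeout between attempts. The function name `worst_case_wait` is made up for illustration, and the constants are the defaults quoted from the man page excerpt.

```python
# Defaults as quoted in the man page excerpt (see <resolv.h>).
RES_TIMEOUT = 5    # seconds to wait for each query
RES_DFLRETRY = 2   # attempts
MAXNS = 3          # maximum number of name servers listed

def worst_case_wait(timeout=RES_TIMEOUT, attempts=RES_DFLRETRY, nameservers=MAXNS):
    """Upper bound in seconds, assuming every query hits a flat timeout."""
    return timeout * attempts * nameservers

print(worst_case_wait())         # defaults: 5 * 2 * 3
print(worst_case_wait(1, 1, 3))  # timeout:1 attempts:1 with three servers
```

With the defaults this gives the 30 seconds from the original question; with timeout and attempts both set to 1 it drops to 3 seconds for three servers.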

However, some applications might not like the initial timeout and fail before they ever get to the second one. That's why these parameters are adjustable - 5 seconds isn't much in human time but in CPU cycles it can be forever.
 
Old 05-08-2009, 01:24 PM   #3
Earthworm Jim
LQ Newbie
 
Registered: May 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Does the "rotate" option affect that behavior, or does it direct each individual lookup to the next server in the list?

If I can pick your brain for a minute, what other factors can contribute to lag or delay? For example, if I've done a sethostent(1) and my first server is not available, am I going to have to wait ((TCP timeout + 5) * 2) seconds (~370 seconds, IIRC) before it attempts the next server?
 
Old 05-08-2009, 01:39 PM   #4
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,814
Blog Entries: 15

Rep: Reputation: 1661
With rotate, lookups no longer all start at the first server; instead they start at each server in turn, round-robin fashion (i.e. the first lookup by anyone goes to the first server, the second lookup goes to the second server, the third lookup goes to the third server, the fourth lookup goes to the first again, etc.).

I've not used rotate, but my assumption is that for the same query it would go on to another server if it couldn't reach the one it first attempted.

That is to say, the intent of rotate is not to reduce how many servers are attempted or the timeout on each, but to spread the load more evenly among all 3 servers. If you have 1000 machines all doing lookups on the primary server while 2 other servers sit waiting for requests, the first system might become heavily loaded and respond slowly, forcing timeouts that fail over to the second. Round robin avoids that by ensuring approximately 1/3 of the requests from each machine go to the first server, 1/3 to the second and the final 1/3 to the third. Note that this round robin is internal to the machine that has the resolv.conf. The first query from each of the 1000 machines would still go to the first server, since none of them is aware of the lookups the others are doing. That's not really a problem, since it is very unlikely you'd have all 1000 machines hitting the same server at the same time; I mention it only for concept.

Anyway to that end the rotation would help with performance if your nameservers were all up and heavily loaded because it spreads the load. The main reason most people have 2-3 nameservers isn't load but rather the fact that one of the nameservers might be down for some reason.
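
To make the round robin concrete, here is a toy model of the rotation described above. It is not the resolver's actual code: the function `lookup_orders` and the labels `ns1`, `ns2`, `ns3` are invented for illustration, and it simply assumes that each new lookup on one machine starts one server further down the list and falls through to the remaining servers in order.

```python
from collections import deque

def lookup_orders(servers, lookups):
    """Yield the order in which each successive lookup would try the servers."""
    ring = deque(servers)
    for _ in range(lookups):
        yield list(ring)
        ring.rotate(-1)  # the next lookup starts one server further down

servers = ["ns1", "ns2", "ns3"]  # hypothetical name server labels
for n, order in enumerate(lookup_orders(servers, 4), start=1):
    print(n, order)
```

In this model the fourth lookup starts at the first server again, so over many lookups each server is the starting point for roughly a third of them.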

You don't have to wait the 2 x 5 to fail over - both of those are configurable in resolv.conf. You can set it to 1 x 1.
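
As a rough illustration of the 1 x 1 setting, the sketch below embeds a sample resolv.conf in a string and reads back the values that matter here. The `options timeout:1 attempts:1` line uses the option names from the man page excerpt; the addresses are placeholders from the documentation range, and `parse_resolv_conf` is a hypothetical helper that only handles the keywords discussed in this thread, not the full file format.

```python
EXAMPLE_RESOLV_CONF = """\
# placeholder addresses, not real name servers
nameserver 192.0.2.1
nameserver 192.0.2.2
options timeout:1 attempts:1
"""

def parse_resolv_conf(text):
    """Return (nameservers, timeout, attempts) from resolv.conf-style text."""
    nameservers, timeout, attempts = [], 5, 2  # defaults quoted in the man page
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # skip blank lines and comments
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name == "timeout" and value.isdigit():
                    timeout = int(value)
                elif name == "attempts" and value.isdigit():
                    attempts = int(value)
    return nameservers, timeout, attempts

servers, timeout, attempts = parse_resolv_conf(EXAMPLE_RESOLV_CONF)
print(servers, timeout, attempts)
print("flat worst case:", timeout * attempts * len(servers), "seconds")
```

With two servers and both values set to 1, the flat worst case in this model is 2 seconds, at the cost of giving a slow but working server very little time to answer.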
 
Old 05-14-2009, 11:54 AM   #5
Earthworm Jim
LQ Newbie
 
Registered: May 2009
Posts: 3

Original Poster
Rep: Reputation: 0
So, a few more questions on this:

1. The various gethost* traffic is always UDP unless the client has explicitly done a sethostent(1), correct? Does the resolver ever take it upon itself to switch from UDP to TCP?

2. Assume, for whatever reason, I have 2 DNS servers configured and the first is simply 'slow'. If my request to server1 times out and I've moved on to server 2, but then a response from server 1 comes across, what happens?

thanks much!!
 
Old 05-14-2009, 12:00 PM   #6
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,814
Blog Entries: 15

Rep: Reputation: 1661
1) Truthfully I'm not sure on this. On the BIND mailing list I often see talk about TCP/UDP. In a recent thread there was a post that suggested TCP is really only needed (if ever) for DNSSEC and isn't required by RFC. Most traffic will go through UDP, which is required. To me there doesn't seem to be much point in blocking TCP on port 53 if there's a chance it will be needed.

2) Once it "times out" it has abandoned the attempt and moved on. The connection won't be there for it to send any data back and even if it does the process is no longer looking for it so would ignore it.
 
  

