[SOLVED] 3rd try.... windows 7 clients randomly failing to resolve address of nameserver
I've posted about this twice and got zero responses, so... third time's the charm, right?
I have a BIND server set up for my home network. It's named "pLAN9-Server1", address 172.16.16.2, with a local domain name of "pLAN9.site". This machine is also the dhcpd server. After what seems like a random amount of time, and with no apparent cause, Windows 7 clients stop being able to resolve the address of the nameserver itself. Any other address gets resolved fine; it's just the nameserver itself. Even weirder, doing an "nslookup" from a Win7 box that's having the problem always returns the proper result, but pinging the server by name or trying to access it via Windows Explorer fails. That doesn't make ANY sense to me: if nslookup resolves the name, why is every other means of accessing the server failing? Of course, the Windows boxes don't log anything remotely helpful. Doing an "ipconfig /renew" on the Windows machines always fixes the problem temporarily, but it eventually comes back, once again for no comprehensible reason. It seems to be a DNS-only problem, as accessing the server by IP always works.
Here's the output on the Windows command line from a machine that currently has no access (nslookup -d2):
Code:
> plan9-server1
Server: pLAN9-Server1.pLAN9.site
Address: 172.16.16.2
------------
SendRequest(), len 42
HEADER:
opcode = QUERY, id = 2, rcode = NOERROR
header flags: query, want recursion
questions = 1, answers = 0, authority records = 0, additional = 0
QUESTIONS:
plan9-server1.pLAN9.site, type = A, class = IN
------------
------------
Got answer (72 bytes):
HEADER:
opcode = QUERY, id = 2, rcode = NOERROR
header flags: response, auth. answer, want recursion, recursion avail.
questions = 1, answers = 1, authority records = 1, additional = 0
QUESTIONS:
plan9-server1.pLAN9.site, type = A, class = IN
ANSWERS:
-> plan9-server1.pLAN9.site
type = A, class = IN, dlen = 4
internet address = 172.16.16.2
ttl = 259200 (3 days)
AUTHORITY RECORDS:
-> pLAN9.site
type = NS, class = IN, dlen = 2
nameserver = plan9-server1.pLAN9.site
ttl = 259200 (3 days)
------------
------------
SendRequest(), len 42
HEADER:
opcode = QUERY, id = 3, rcode = NOERROR
header flags: query, want recursion
questions = 1, answers = 0, authority records = 0, additional = 0
QUESTIONS:
plan9-server1.pLAN9.site, type = AAAA, class = IN
------------
------------
Got answer (92 bytes):
HEADER:
opcode = QUERY, id = 3, rcode = NOERROR
header flags: response, auth. answer, want recursion, recursion avail.
questions = 1, answers = 0, authority records = 1, additional = 0
QUESTIONS:
plan9-server1.pLAN9.site, type = AAAA, class = IN
AUTHORITY RECORDS:
-> pLAN9.site
type = SOA, class = IN, dlen = 38
ttl = 86400 (1 day)
primary name server = plan9-server1.pLAN9.site
responsible mail addr = admin.plan9.co
serial = 2013062701
refresh = 28800 (8 hours)
retry = 7200 (2 hours)
expire = 2419200 (28 days)
default TTL = 86400 (1 day)
------------
Name: plan9-server1.pLAN9.site
Address: 172.16.16.2
> exit
C:\Users\Wil>ping plan9-server1
Ping request could not find host plan9-server1. Please check the name and try again.
C:\Users\Wil>ping plan9-server1.plan9.site
Ping request could not find host plan9-server1.plan9.site. Please check the name and try again.
Why the frick can't Windows ping it by name, even when I try the FQDN?
server config files
/etc/named.conf:
Code:
//
// /etc/named.conf
//
options {
    directory "/var/named";
    pid-file "/var/run/named/named.pid";
    auth-nxdomain yes;
    datasize default;
    // Uncomment these to enable IPv6 connections support
    // IPv4 will still work:
    // listen-on-v6 { any; };
    // Add this for no IPv4:
    // listen-on { none; };
    // Default security settings.
    allow-recursion { localnets; };
    allow-transfer { localhost; };
    allow-update { localhost; };
    allow-query { localnets; };
    forwarders { 208.67.222.222; 208.67.222.222; };
    listen-on { 127.0.0.1; 172.16.16.2; };
    max-cache-ttl 1209600;
    # edns-udp-size 512;
    edns-udp-size 4096 ;
    version none;
    hostname none;
    server-id none;
};
zone "localhost" IN {
    type master;
    file "localhost.zone";
    allow-transfer { localhost; };
};
zone "0.0.127.in-addr.arpa" IN {
    type master;
    file "127.0.0.zone";
    allow-transfer { localhost; };
};
zone "." IN {
    type hint;
    file "root.hint";
};
zone "pLAN9.site" IN {
    type master;
    notify no;
    file "pLAN9.zone";
};
zone "16.16.172.in-addr.arpa" IN {
    type master;
    notify no;
    file "pLAN9.rev";
};
logging {
    channel xfer-log {
        file "/var/log/named.log";
        print-category yes;
        print-severity yes;
        print-time yes;
        severity debug 10;
    };
    category xfer-in { xfer-log; };
    category xfer-out { xfer-log; };
    category notify { xfer-log; };
};
/var/named/pLAN9.zone:
Code:
$TTL 3D
@               IN SOA  pLAN9-Server1.pLAN9.site. admin.plan9.co. (
                        2013062701  ; serial
                        8H          ; refresh
                        2H          ; retry
                        4W          ; expire
                        1D )        ; default TTL
                NS      pLAN9-Server1
localhost       A       127.0.0.1
pLAN9-Server1   A       172.16.16.2
pLAN9-Gateway   A       172.16.16.1
pLAN9-Server2   A       172.16.16.3
pLAN9-Wil       A       172.16.16.10
pLAN9-HTPC      A       172.16.16.11
pLAN9-Laptop    A       172.16.16.12
Galt-PC         A       172.16.16.15
Switch          A       172.16.16.19
pLAN9-WAP       A       172.16.16.254
Windows hosts file on the clients:
Code:
# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
# 102.54.94.97 rhino.acme.com # source server
# 38.25.63.10 x.acme.com # x client host
# localhost name resolution is handled within DNS itself.
# 127.0.0.1 localhost
# ::1 localhost
I haven't touched any of them. (BTW, gotta love Windows keeping the hosts file in "System32\drivers\etc\".)
Actually no, if you're referring to the "option wpad" thing in dhcpd.conf. I put that there because all of the Win7 clients were semi-flooding the DHCP server with a bunch of spurious "DHCPINFORM" messages, and adding those lines stopped that.
Just adding that when the systems lose access, turning on query logging with "rndc querylog" on the DNS server shows absolutely nothing when the clients try to ping or access the server by name. Nothing at all. So it seems like the Windows clients aren't even trying to send a DNS query to my server... but only for the nameserver's own name... this makes no sense.
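For anyone wanting to check the two lookup paths separately, here is a rough Python sketch. It assumes the difference seen above comes from nslookup sending its query straight to one server while ping goes through the operating system's resolver, which picks the server itself. The function names are made up for illustration, and the hand-built packet is only a minimal A-record query.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 2) -> bytes:
    """Build a minimal DNS query packet for `name` (qtype 1 = A record)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, then a zero byte ends it.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # Question type, then class 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def ask_server_directly(name: str, server: str, timeout: float = 2.0):
    """Roughly what nslookup does: query one named server and return the rcode.

    Returns None if the server does not answer in time.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and "port unreachable" errors
            return None
    if len(reply) < 4:
        return None
    return reply[3] & 0x0F  # low 4 bits of the flags field: 0 = NOERROR, 3 = NXDOMAIN


def ask_os_resolver(name: str):
    """Roughly what ping does: let the OS decide which DNS server to ask."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None
```

On an affected client, one would expect ask_server_directly("plan9-server1.pLAN9.site", "172.16.16.2") to return 0 while ask_os_resolver("plan9-server1.pLAN9.site") returns None. As a sanity check, build_query for that name produces 42 bytes, the same length nslookup reports in "SendRequest(), len 42".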
I have since removed the second DNS server from dhcpd.conf, the OpenDNS address that I had as the secondary DNS. It's been almost 3 days without a problem. If it doesn't happen for a few more days I'll mark this solved.
You've found the culprit.
The reason is the way Windows handles multiple DNS entries. It works like this:
The primary entry is always used. None of the other entries are ever tried as long as the primary DNS server responds, even if the response is NXDOMAIN.
Should the primary server fail to respond within the timeout period (2 sec), Windows switches to the next server in the list. Permanently. It does not ever switch back unless the secondary server fails to respond.
If the secondary server fails to respond, Windows switches to the tertiary server (or back to the primary server if no tertiary server exists).
Perhaps the server running BIND gets overloaded at times, or the DNS packets are dropped because of temporary bandwidth issues in your LAN. Or there could be noise disrupting your wireless network, if that's what the Windows PC is using.
Such events would trigger a switch of DNS server on the Windows client, and from then on it would be using the OpenDNS server exclusively. Of course, the OpenDNS server knows nothing of the contents of your internal DNS zone, and will return NXDOMAIN for any queries related to the "pLAN9.site" domain.
The only way to make Windows switch back to the primary server is to deactivate and reactivate the network connection, reset the IP configuration (as you've discovered), or reboot the OS. Oh, and sometimes Windows may decide that the primary DNS server is not the one you've put first in your DHCP zone. In other words, never mix internal and external DNS servers.
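The failover rules described in this post can be modelled with a few lines of Python. This is only a toy sketch of the behaviour as stated here, not actual Windows code: the class name StickyResolver, the answer_from callback and the bind state dictionary are all invented for the illustration.

```python
class StickyResolver:
    """Toy model of the 'switch and stay switched' DNS failover described above."""

    def __init__(self, servers):
        self.servers = list(servers)  # in DHCP order, primary first
        self.current = 0              # index of the server currently in use

    def resolve(self, name, answer_from):
        """answer_from(server, name) returns an address, "NXDOMAIN", or None on timeout."""
        for _ in range(len(self.servers)):
            reply = answer_from(self.servers[self.current], name)
            if reply is not None:
                # Any response, even NXDOMAIN, keeps the current server selected.
                return reply
            # Timeout: move on to the next server and stay there.
            self.current = (self.current + 1) % len(self.servers)
        return None

    def renew(self):
        """Model of "ipconfig /renew": start again from the primary server."""
        self.current = 0


ZONE = {"plan9-server1.plan9.site": "172.16.16.2"}
bind = {"up": True}  # flip to False to model the BIND box rebooting


def answer_from(server, name):
    if server == "172.16.16.2":
        if not bind["up"]:
            return None  # no response within the timeout
        return ZONE.get(name.lower(), "NXDOMAIN")
    return "NXDOMAIN"    # the external server knows nothing about pLAN9.site


client = StickyResolver(["172.16.16.2", "208.67.222.222"])
name = "plan9-server1.pLAN9.site"

print(client.resolve(name, answer_from))  # 172.16.16.2
bind["up"] = False
print(client.resolve(name, answer_from))  # NXDOMAIN (primary timed out)
bind["up"] = True
print(client.resolve(name, answer_from))  # NXDOMAIN (still stuck on the secondary)
client.renew()
print(client.resolve(name, answer_from))  # 172.16.16.2
```

The third lookup is the interesting one: the internal server is back up, but the client keeps asking the external one because it still gets a response from it.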
Actually, it isn't dropped queries or packets or anything; I typically reboot this and all of my servers fairly frequently, as I update them nearly constantly. During the reboot, if any client tried to query DNS for any reason, it would fail, and from then on would never try my internal server again.
Am I the only one who thinks this is fairly illogical behavior on the part of Windows? I would think that, at least every 50 queries or so, it would try the first one again to see if it's back up? After all, Windows itself calls the first DNS server "primary"...
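For what it's worth, the "try the first one again every 50 queries" idea could look something like the Python sketch below. This is purely a hypothetical policy, not something Windows is known to offer; the class name RetryingResolver, the retry_every parameter and the answer_from callback are all made up for the example.

```python
class RetryingResolver:
    """Hypothetical failover policy that periodically re-probes the primary server."""

    def __init__(self, servers, retry_every=50):
        self.servers = list(servers)   # primary first
        self.current = 0               # index of the server currently in use
        self.retry_every = retry_every
        self.since_failover = 0        # queries answered while off the primary

    def resolve(self, name, answer_from):
        """answer_from(server, name) returns an address, "NXDOMAIN", or None on timeout."""
        if self.current != 0:
            self.since_failover += 1
            if self.since_failover >= self.retry_every:
                # Time to check whether the primary has come back.
                self.since_failover = 0
                reply = answer_from(self.servers[0], name)
                if reply is not None:
                    self.current = 0
                    return reply
        for _ in range(len(self.servers)):
            reply = answer_from(self.servers[self.current], name)
            if reply is not None:
                return reply
            # Timeout: fail over to the next server and restart the counter.
            self.current = (self.current + 1) % len(self.servers)
            self.since_failover = 0
        return None
```

With a policy like this, a client that failed over during a server reboot would find its way back to the internal server after at most retry_every lookups, at the cost of one extra timeout per probe while the primary is still down.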
In any case, I guess it wasn't exactly a "Linux" question, but thanks for all the help, folks.