I administer a 12-computer network running Fedora Core 3, with the following kernel:
Code:
[root@uma ~]# uname -a
Linux uma 2.6.9-1.667 #1 Tue Nov 2 14:41:25 EST 2004 i686 i686 i386 GNU/Linux
I perform regular updates from the freshrpms apt repository (via a UK mirror). Recently, one of the computers on our network refused to connect to the WAN after a restart. I later replicated the problem on another computer which *was* connected to the WAN, simply by reloading the network configuration with "/etc/init.d/network reload", after which the WAN connection was gone. In that state I can ping 127.0.0.1, the local machine's own IP address, and other machines on the LAN, but nothing further, not even the gateway on our network segment.
I suspect the script dhclient-script is responsible, but I have been unable to find a solution on the web. Please forgive me if this problem has already been resolved.
Specifically, dhclient-script appears to be failing to set the default gateway, since the routing table on the affected machines has no default route:
Code:
[root@uma ~]# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.10.41.0 * 255.255.255.0 U 0 0 0 eth0
169.254.0.0 * 255.255.0.0 U 0 0 0 eth0
despite the fact that eth0 is configured by DHCP:
Code:
[root@uma ~]# grep -v ^# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=00:A0:C9:E6:C3:A8
ONBOOT=yes
TYPE=Ethernet
and the lease received from the server includes the "routers" option:
Code:
[root@uma ~]# tail -n 15 /var/lib/dhcp/dhclient-eth0.leases
lease {
interface "eth0";
fixed-address 10.10.41.8;
option subnet-mask 255.255.255.0;
option routers 10.10.41.254;
option dhcp-lease-time 1800;
option dhcp-message-type 5;
option domain-name-servers 10.10.1.253;
option dhcp-server-identifier 10.1.1.1;
option broadcast-address 10.10.41.255;
option domain-name "physics.nat";
renew 3 2005/3/9 15:17:30;
rebind 3 2005/3/9 15:29:56;
expire 3 2005/3/9 15:33:41;
}
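For anyone who wants to run the same check on their own machine, here is a rough sketch that compares the router named in the lease file with the default route in the routing table. The function names lease_router and default_gateway are my own invention, and the second one assumes the usual net-tools "route -n" column layout (Destination, Gateway, Genmask, Flags, ...); treat it as an illustration rather than a tested tool.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of dhclient.

# Print the first router listed in the most recent lease of a dhclient
# leases file. $1: path to the leases file.
lease_router() {
    awk '$1 == "option" && $2 == "routers" {
             split($3, a, /[,;]/)   # drop the trailing ";" and any extra routers
             r = a[1]
         }
         END { if (r != "") print r }' "$1"
}

# Read "route -n" style output on stdin and print the gateway of the
# default route, or nothing if there is no default route.
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2; exit }'
}

# Example use on a live machine:
#   want=$(lease_router /var/lib/dhcp/dhclient-eth0.leases)
#   have=$(route -n | default_gateway)
#   [ "$want" = "$have" ] || echo "lease says $want, routing table has '$have'"
```

If the lease names a router but the routing table has no default route, adding one by hand as root with "route add default gw 10.10.41.254 eth0" should restore connectivity until the next lease renewal or network restart. That is only a stopgap, not a fix.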
I was further led to believe that the fault is in dhclient when I looked at the timestamps on the dhclient files:
Code:
[root@uma ~]# ls -l /sbin/dhclient*
-rwxr-xr-x 1 root root 356004 Feb 25 02:05 /sbin/dhclient
-rwxr-xr-x 1 root root 12384 Feb 25 02:05 /sbin/dhclient-script
So these files were changed on or after Feb 25, and I certainly didn't do it, since I don't tamper with system scripts. I then looked through my upgrade logs, and found the following updates since 24/2/2005:
24/2/2005 - postgresql-libs
27/2/2005 - at bind bind-libs bind-utils dhclient firefox gaim pvm tcsh vixie-cron
4/3/2005 - gamin gamin-devel
5/3/2005 - tzdata
6/3/2005 - firefox libtool libtool-libs
9/3/2005 - gaim
Circumstantial evidence is building against dhclient, which was updated on 27/2/2005: we now know it had "opportunity" and "means". I then copied dhclient-script to uma from a machine running Fedora Core 1, where the file dates from well *before* 25/2/2005:
Code:
[robert@kingsley: robert] ls -l /sbin/dhclient*
-rwxr-xr-x 1 root root 350000 Oct 8 2003 /sbin/dhclient
-rwxr-xr-x 1 root root 8181 Oct 8 2003 /sbin/dhclient-script
and everything worked fine! The default gateway was included in the routing table, and the WAN was reachable!
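To narrow down what changed between the two versions, one option is to compare only the lines that mention routing. The sketch below does that; routing_diff is a made-up name, the search pattern "route|gateway" is just a guess at what is relevant, and the path /root/dhclient-script.fc1 in the usage comment is a hypothetical location for the copied Fedora Core 1 script.

```shell
#!/bin/sh
# Hypothetical helper: diff only the routing-related lines of two scripts.
# $1: older script, $2: newer script.
# Returns 0 if those lines are identical, 1 if they differ.
routing_diff() {
    old_tmp=$(mktemp) || return 2
    new_tmp=$(mktemp) || { rm -f "$old_tmp"; return 2; }

    # Keep only lines mentioning routes or gateways; no match is not an error.
    grep -i -E 'route|gateway' "$1" > "$old_tmp" || true
    grep -i -E 'route|gateway' "$2" > "$new_tmp" || true

    status=0
    diff "$old_tmp" "$new_tmp" || status=$?

    rm -f "$old_tmp" "$new_tmp"
    return "$status"
}

# Example use (the first path is an assumed location for the old copy):
#   routing_diff /root/dhclient-script.fc1 /sbin/dhclient-script
```

This throws away context, so once a suspicious line turns up it is worth reading the surrounding code in the full script.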
I have only glanced at what has changed in dhclient-script, and do not yet have an explanation (what we might fancifully call the "motive", to continue the detective metaphor). I suspect the new behaviour may even be intentional, to match the behaviour of other systems. But it does not seem particularly helpful to me. And I believe other people have recently been having the same problem:
thread 293615
thread 298569
thread 296513
thread 299378
Any thoughts? Is anyone having the same problem? It looks suspiciously like a bug to me.
Thanks,
Robert.