LinuxQuestions.org
Fedora This forum is for the discussion of the Fedora Project.

Old 07-02-2006, 04:32 PM   #1
bartonski
Member
 
Registered: Jul 2006
Location: Louisville, KY
Distribution: Fedora 12, Slackware, Debian, Ubuntu Karmic, FreeBSD 7.1
Posts: 443
Blog Entries: 1

Rep: Reputation: 47
Question FC5, 2.6.16 kernel, drops ethernet after 10+ minutes


Fedora Core 5 network problems

Problem:
Network interface fails after about 5-10 minutes. ifdown/ifup does not bring the
interface back up.

Hardware: E-machines T6528
Processor: Athlon 64 2.2 GHz
Motherboard: MSI MS-7207
Built in Ethernet: nVidia Corporation MCP51 Ethernet Controller, using forcedeth driver.
PCI Ethernet Card: Linksys NC100 Network Everywhere Fast Ethernet 10/100, using tulip driver

OS: Fedora Core 5 x86

History:
After the initial install, the built-in ethernet did not work. I cannibalized the Linksys card from an old dead Linux box, which caused an IRQ conflict. I re-installed the OS (switching away from the 64-bit version in the process, because I was having other issues), this time installing only the Linksys (the other interface still shows up in the GNOME GUI network manager, but it's disabled). The Linksys card is eth0. It would stay up for two to three minutes at a time, then fail.

Some googling yielded a link which I can't post yet, which suggested that this was a bug in the stock 2.6.15 kernel, affecting the tulip driver. I upgraded to 2.6.16-1.2122_FC5. At first, I thought that I had entirely fixed the problem, but the mean time between failures only went from around 2 minutes to over 10 minutes.

Here's what happens during a constant ping, when the network goes down:

....
64 bytes from 192.168.1.1: icmp_seq=235 ttl=64 time=0.739 ms
ping: sendmsg: No buffer space available
....

Here's what happens when I stop and start the network while the interface is up:

[root@baz ~]# service network stop
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]
[root@baz ~]# service network start
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: [ OK ]

Here are the specifics:

/etc/modprobe.conf:

alias eth1 forcedeth
alias scsi_hostadapter sata_nv
alias snd-card-0 snd-hda-intel
options snd-card-0 index=0
options snd-hda-intel index=0
remove snd-hda-intel { /usr/sbin/alsactl store 0 >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-hda-intel
alias eth0 tulip

Here's the output from

grep eth0 /var/log/messages

Jul 2 04:05:20 baz kernel: NETDEV WATCHDOG: eth0: transmit timed out

... snip many lines of the same ...

Jul 2 08:34:45 baz kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 2 08:47:22 baz avahi-daemon[1932]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.98.
Jul 2 08:47:25 baz kernel: NETDEV WATCHDOG: eth0: transmit timed out
{ reboot here }
Jul 2 14:34:05 baz avahi-daemon[1979]: New relevant interface eth0.IPv4 for mDNS.
Jul 2 14:34:05 baz avahi-daemon[1979]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.98.
Jul 2 14:34:05 baz avahi-daemon[1979]: Registering new address record for 192.168.1.98 on eth0.
Jul 2 14:34:07 baz kernel: eth0: ADMtek Comet rev 17 at d88a0c00, 00:04:5A:6E:66:37, IRQ 5.
Jul 2 14:34:08 baz kernel: eth0: Setting full-duplex based on MII#1 link partner capability of 45e1.
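
For anyone trying to see how often the watchdog fires, here is a minimal sketch that tallies the "transmit timed out" lines per hour. It assumes the syslog line layout shown above (month, day, HH:MM:SS as the first three fields); the function name count_timeouts_per_hour is made up for illustration.

```shell
# Tally "NETDEV WATCHDOG: <iface>: transmit timed out" lines per hour.
# Reads syslog-style lines on stdin; prints "<month> <day> <HH>:00 <count>"
# in the order the hours first appear.
count_timeouts_per_hour() {
    iface="$1"
    awk -v pat="NETDEV WATCHDOG: ${iface}: transmit timed out" '
        index($0, pat) {
            split($3, t, ":")                  # $3 is HH:MM:SS
            key = $1 " " $2 " " t[1] ":00"
            if (!(key in n)) order[++k] = key  # remember first-seen order
            n[key]++
        }
        END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }
    '
}

# Example (as root): count_timeouts_per_hour eth0 < /var/log/messages
```

A steady count per hour would suggest the transmitter stays stuck once it fails, rather than failing in separate bursts.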

Here's the configuration for eth0:

[tiger@baz modprobe.d]$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Linksys NC100 Network Everywhere Fast Ethernet 10/100
DEVICE=eth0
BOOTPROTO=none
HWADDR=00:04:5a:6e:66:37
ONBOOT=yes
DHCP_HOSTNAME=baz.localdomain
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes
IPADDR=192.168.1.98
NETMASK=255.255.255.0
GATEWAY=192.168.1.1

Here's the output from ifconfig while the network is up. I don't think that it looks any different after the network goes down:

[tiger@baz ~]$ /sbin/ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:04:5A:6E:66:37
inet addr:192.168.1.98 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::204:5aff:fe6e:6637/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:985 errors:0 dropped:0 overruns:0 frame:0
TX packets:758 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:358771 (350.3 KiB) TX bytes:234303 (228.8 KiB)
Interrupt:5 Base address:0xc00

...

I'm a little concerned that eth0 has inet6 addr: fe80::204:5aff:fe6e:6637/64,
even though the configuration file shows IPV6INIT=no.

===

The fact that eth0 is failing is the main issue. There are a couple of other problems which may or may not be related:

1) dhclient writes 192.168.1.100 as the primary DNS server in /etc/resolv.conf. Because this server does not exist, all DNS lookups are slow. I actually wanted a static IP address on this box anyway, so I disabled DHCP and edited /etc/resolv.conf by hand. Nonetheless, this seems broken.

2) After eth0 fubars, when I reboot the box, the kernel hangs while trying to turn off iptables.

===

I'm inclined to think that this is a kernel or driver issue, but I don't really know which it is likely to be, or what to install next.

Oh... I forgot to mention... I rebooted into Windows XP, and ran a constant ping for 2 hours with no packet loss, therefore I don't think it's a network card issue.

--Barton
 
Old 07-02-2006, 06:35 PM   #2
bartonski
Ok... more info.

I had eth0 up for most of the afternoon. I was SSH'd into the FC5 box and I decided to do another constant ping. Here's what I grabbed from PuTTY after the network went down:

64 bytes from 192.168.1.1: icmp_seq=3424 ttl=64 time=0.715 ms
64 bytes from 192.168.1.1: icmp_seq=3425 ttl=64 time=0.724 ms
64 bytes from 192.168.1.1: icmp_seq=3426 ttl=64 time=0.720 ms
64 bytes from 192.168.1.1: icmp_seq=3427 ttl=64 time=0.724 ms
64 bytes from 192.168.1.1: icmp_seq=3428 ttl=64 time=0.724 ms
64 bytes from 192.168.1.1: icmp_seq=3429 ttl=64 time=0.709 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available

This means that the network was still up and sending messages after ping stopped. I did have a feeling that ping might have triggered the problem...
 
Old 07-04-2006, 12:28 PM   #3
bartonski
I rebooted last night, got the network running again, and left the network up and running without a constant ping. eth0 stayed up for about 2 hours, then failed again. Once again, /var/log/messages shows

baz kernel: NETDEV WATCHDOG: eth0: transmit timed out

Due to the error message I was getting with ping earlier
(ping: sendmsg: No buffer space available), I'm guessing that something is keeping sendmsg's buffer from clearing, and it seems that this is *not* simply
the network failing, because I'm seeing those messages across the network.

I'm assuming that sendmsg() is being called by ping, and several other programs, which is why the network continues to go down even though I'm not pinging anything. I'm also assuming that sshd is *not* using sendmsg(), which is why I continue to see messages across the network even after the sendmsg buffer is full.

Now... what that actually means for my network, or how to fix it, I have no idea.
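
One plausible reading (an assumption, not a diagnosis) is that "No buffer space available" shows up because the card's transmit queue has stopped draining, which would also fit the watchdog's "transmit timed out" message. A rough way to check is to watch whether the TX packet counter for eth0 still moves. The sketch below assumes the usual /proc/net/dev column layout (eight receive columns, then transmit bytes and transmit packets); tx_packets is a made-up helper name.

```shell
# Print the transmit-packet counter for one interface.
# Reads /proc/net/dev-style text on stdin.
tx_packets() {
    iface="$1"
    # Replace the colon after the interface name so "eth0:358771" also
    # splits cleanly; after that, field 11 is the TX packet count.
    sed 's/:/ /' | awk -v ifc="$iface" '$1 == ifc { print $11 }'
}

# Example: sample the counter twice, ten seconds apart.
# tx_packets eth0 < /proc/net/dev; sleep 10; tx_packets eth0 < /proc/net/dev
```

If the two samples are identical while ping is still trying to send, the transmit side has probably stalled, even if receive still works.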
 
Old 07-04-2006, 03:50 PM   #4
bartonski
dmesg | grep Tulip
Linux Tulip driver version 1.1.13 (May 11, 2002)
 
Old 07-04-2006, 10:56 PM   #5
steve-alexander
LQ Newbie
 
Registered: Mar 2005
Location: Ohio
Distribution: FC6->F7
Posts: 23

Rep: Reputation: 16
The tulip is a well-supported chip; if that's your eth0, that's not the issue. The IPv6 address is OK, no worries. You aren't giving us much to work with.

Are you running NetworkManager and NetworkManagerDispatcher? These are relatively new services and seem to be a little 'twitchy'. They can take down your connection. Disable them (System->Administration->ServerSettings->Services): unclick the services, save & reboot.

You also have an eth1 alias. Is it up? Does it stay up?

There is some sort of unresolved glitch such that after a suspend/resume the ethernet handles (eth1, eth0) may be scrambled. Check that the HWaddr for eth0 remains constant before/after the failure (use ifconfig -a).

The only other thought is that you should boot to single user mode and test the problem. Use the grub interface to add the word "single" to the kernel command line. (I think you type any character at the grub splash-screen, then type an 'e' for edit, then add the characters " single" and hit return). You'll get a tty console as root (no X11). Then try the periodic ping with a command like
# while true; do ping -c 1 192.168.1.1 ; sleep 30; done

You may have to manually bring up the interface with 'ifup eth0'. DON'T start the network services - too many possible implications.
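
The HWaddr check can be scripted rather than eyeballed. This is only a sketch: it assumes sysfs is mounted at /sys, so that the live address can be read from /sys/class/net/eth0/address, and the helper names normalize_mac and same_mac are made up here.

```shell
# Lower-case a MAC address so ifconfig's upper-case form and the
# lower-case form in ifcfg-eth0 compare equal.
normalize_mac() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f'
}

# Succeed if the two MAC addresses match, ignoring case.
same_mac() {
    [ "$(normalize_mac "$1")" = "$(normalize_mac "$2")" ]
}

# Example: compare the HWADDR from ifcfg-eth0 with the live address.
# same_mac "00:04:5a:6e:66:37" "$(cat /sys/class/net/eth0/address)" \
#     || echo "eth0 HWaddr changed"
```

Run it once while the link is up and again after it fails; a mismatch would point at the interface names being swapped.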
 
Old 07-05-2006, 06:54 AM   #6
bartonski
Quote:
Originally Posted by steve-alexander
The tulip is a well-supported chip; if that's your eth0, that's not the issue. The IPv6 address is OK, no worries. You aren't giving us much to work with.

Are you running NetworkManager and NetworkManagerDispatcher? These are relatively new services and seem to be a little 'twitchy'. They can take down your connection. Disable them (System->Administration->ServerSettings->Services): unclick the services, save & reboot.

You also have an eth1 alias. Is it up? Does it stay up?
I looked at the services; NetworkManager and NetworkManagerDispatcher are both disabled.

eth1 has been disabled in the BIOS, and is not enabled on boot.

Quote:
Originally Posted by steve-alexander
There is some sort of unresolved glitch such that after a suspend/resume the ethernet handles (eth1, eth0) may be scrambled. Check that the HWaddr for eth0 remains constant before/after the failure (use ifconfig -a).
I saw this change *once* a few days ago. I've kept an eye on it ever since, and in all the times that the network has failed since then, the HWaddr has stayed constant.

I tried 'init 1', and I tried to enter single user mode through grub... I finally had to edit /etc/inittab to get into single user mode. I'm currently running a constant ping to my router. I'll let that run for a couple of hours; this should be enough to re-create the problem if it exists in single user mode.
 
Old 07-05-2006, 10:21 AM   #7
bartonski
Under single user mode, I brought up eth0 using

ifup eth0

I ran a constant ping for 7200 seconds, with no packet loss.

I did an 'ifdown eth0', then as a control, brought the network up using

service network start

I've run a constant ping for over an hour, and the network is currently still up.

I'm considering booting back into multi-user mode and running something like

time (while ping -c 1 192.168.1.1; do sleep 20; done)

to give me some sense of mean-time between failure, because I don't know just how much confidence I have that the trouble will occur in 2 hours... my gut feeling is that two hours should be enough, but I would hate to rule something out as a problem, only to find out that if I had run the ping for 15 minutes longer, it would have failed...

My next step is to boot back into multi-user mode, test the time between failures, and do some research on how runlevel 5 brings up the network, and how that differs from 'service network start' when run in single user mode.
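
To get a number for time between failures instead of reading it off 'time', the ping loop above could be wrapped roughly as follows. This is a sketch under assumptions: time_to_failure is a made-up name, and it relies on 'date +%s' printing seconds since the epoch (true for GNU date).

```shell
# Run a probe command every <interval> seconds until it fails, then print
# how many whole seconds elapsed before that first failure.
time_to_failure() {
    interval="$1"; shift
    start=$(date +%s)
    # "$@" is the probe command, e.g. ping -c 1 192.168.1.1
    while "$@" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo $(( $(date +%s) - start ))
}

# Example: time_to_failure 20 ping -c 1 192.168.1.1 >> /root/ttf.log
```

Appending each result to a file over several reboots gives a handful of samples, which says more about how long a test has to run than a single two-hour ping does.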
 
Old 07-05-2006, 12:46 PM   #8
bartonski
The first time I ran the timing command, it ran for 7 minutes and 50 seconds.
The second time, I ran a flood ping from another Linux box to this one... the network went down. I rebooted and did this again, and the same thing happened. I had done a ping flood to the box earlier while I had it in single user mode;
there was 0% packet loss at that time, and eth0 stayed up.
 
Old 07-08-2006, 02:33 PM   #9
bartonski
Well... I've decided to throw in the towel, wipe the hard drive and install FC4.
 
Old 07-09-2006, 01:30 AM   #10
bartonski
FC4 is up and running, eth0 is running smoothly.
 
Old 07-19-2006, 08:33 PM   #11
kaz2100
Senior Member
 
Registered: Apr 2005
Location: Penguin land, with apple, no gates
Distribution: Debian testing woody(32) sarge etch lenny squeeze(+64) wheezy jessie
Posts: 1,445

Rep: Reputation: 83
Hi,

I also had a similar problem (dead network after a while) with my Penguin.
Debian etch
kernel 2.6.17.4
Toshiba Satellite A100 ST2311 -> please refer to the HCL entry I submitted.

I just turned off the watchdog in the kernel config; so far my penguin is healthy.
 
Old 07-20-2006, 12:00 AM   #12
MoonlitSky
Member
 
Registered: Jan 2006
Distribution: Fedora 4, 5
Posts: 30

Rep: Reputation: 15
Just wondering, do you have SELinux enabled? Or any other kind of firewall? FC5 is still a
bit ragged around the edges. A hang or glitch there could conceivably cause problems like yours.
 
Old 07-20-2006, 04:27 PM   #13
bartonski
Well I'll be... I figured the watchdog was just reporting the error... it never occurred to me that it might be *causing* the error.

Which kind of begs the question... what does watchdog do, anyway?

Moonlit: SELinux was enabled, but it was in logging mode only.
 
Old 07-21-2006, 07:56 PM   #14
MoonlitSky
If SELinux was only in logging mode, then I doubt it's causing the problem. I have a similar problem though. Or had, rather. I have 2 comps, one with Win2k installed, and one with FC5. I have them connected to a router, which is in turn connected to my cable modem. I had a problem with my internet connection dropping out on me on one or both systems. I finally reprogrammed the router to auto-discover its connection settings. So far, no more problem.
 
Old 07-24-2006, 03:23 PM   #15
kaz2100
I have to say, "watchdog is only reporting" seems to be correct. But my penguin has been totally healthy ever since the watchdog was turned off. When it was on, the network went dead every once in a short while.
My guess is that it's something like an IRQ conflict, or something ACPI- or APM-related?
 
  



Tags
fc5, networking

