LinuxQuestions.org


Woodsman 05-22-2009 02:55 PM

NIC Stops Working With Current Huge Kernel
 
Recently I started playing with Current (13.0). Seems that every time I boot with the 2.6.29.2 huge kernel, I lose my NIC. The LEDs stop working too. I have to power down the box to reinitialize the NIC.

The NIC is on-board on an Asus M2NPV-VM motherboard. The 1 Gbps NIC is part of the Nvidia MCP51 chipset (Nvidia nForce 430 MCP51 controller, Marvell PHY).

Any ideas what might be causing this?

Woodsman 05-23-2009 10:52 PM

Okay, a slightly modified generic kernel (ext2/3/4 support built directly into the kernel, boot logo disabled) causes this too.

This is weird. I'm not paranoid --- yet.

Any ideas?

forum1793 05-24-2009 12:14 AM

There have been a few times in the last month or two that I've noticed the NIC being disconnected. Kind of surprising but I just restarted it via rc.inet1. I almost wondered if the NIC was sleeping but then thought it was related to one of the upgrades. Maybe wicd or ntp. I don't use wicd on this machine but notice on "top" that it pops up every once in a while. Suppose it could also be related to mysql and mythtv.

vdemuth 05-24-2009 12:38 AM

What happens if you just unload and reload the NIC module? I only ask because I had similar problems with the same kernel, albeit with the wireless interface, so I decided to go back to the kernel provided by 12.2, though still running everything else from Current.

Woodsman 05-24-2009 09:23 PM

I again fiddled with this today. When I use the 2.6.27.7 kernel with Current I have no problems. When I use the 2.6.29.2 kernel from Current then the NIC stops working.

I can watch the NIC LEDs and as soon as the udev script starts running the LEDs extinguish. They do not do that with 12.2 when the udev script runs.

Removing the NIC kernel module (rmmod forcedeth) and reloading (modprobe forcedeth) does nothing.
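For anyone repeating this test, the unload/reload cycle can be wrapped in a small function. This is only a sketch: the names `run`, `reload_nic_module` and `DRY_RUN` are made up for illustration, the interface is assumed to be `eth0`, and it needs root to do anything real.

```shell
#!/bin/sh
# Sketch only: run, reload_nic_module and DRY_RUN are illustrative names,
# not part of Slackware. Set DRY_RUN=1 to print the commands instead of
# running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reload_nic_module() {
    module="${1:-forcedeth}"   # driver for the nForce on-board NIC
    iface="${2:-eth0}"         # assumed interface name

    run ifconfig "$iface" down || return 1
    run rmmod "$module" || return 1
    run modprobe "$module" || return 1
    # Reconfigure the interfaces with the stock Slackware network script
    run /etc/rc.d/rc.inet1 restart
}
```

Running `DRY_RUN=1; reload_nic_module forcedeth eth0` prints the four commands without touching the hardware. As reported above, on the affected kernel this cycle does not bring the NIC back.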

I compiled a new kernel using the generic config as a basis. I built in ext3 support, removed the logos, added 64GB support.

With the newly compiled kernel the NIC LEDs again extinguished when the udev script ran, but reappeared when the inet1 script ran.

Everything seemed okay.

I then noticed that something happened during shutdown/rebooting that disabled the NIC. During shutdown/reboot I noticed when the inet1 script is run (stopped), the NIC LEDs extinguished and upon reboot, the NIC failed to initialize under Current. Therefore something in the way the NIC is started/stopped is triggering this behavior. That would seem to imply the ifconfig command.
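Besides watching the LEDs, the link state can be read from sysfs before and after the inet1 script runs. A minimal sketch, assuming the interface is `eth0` and the kernel exposes the `carrier` and `operstate` files; `link_state` is an illustrative helper name, and its second argument exists only so it can be pointed at a test directory.

```shell
#!/bin/sh
# Sketch only: link_state is an illustrative helper, not a standard tool.

link_state() {
    iface="${1:-eth0}"              # assumed interface name
    root="${2:-/sys/class/net}"     # override only for testing

    if [ ! -d "$root/$iface" ]; then
        echo "$iface: no such interface (driver not loaded?)"
        return 1
    fi

    # carrier is 1 when a link is detected and 0 when it is not;
    # reading it can fail while the interface is down.
    carrier=$(cat "$root/$iface/carrier" 2>/dev/null) || carrier="unknown"
    operstate=$(cat "$root/$iface/operstate" 2>/dev/null) || operstate="unknown"

    echo "$iface: carrier=$carrier operstate=$operstate"
}
```

Calling `link_state eth0` once before and once after `/etc/rc.d/rc.inet1 stop` gives a record to compare against what the LEDs show.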

Stranger, I have to shut down my Linksys WRT54GL router too. Seems that whatever causes the NIC to hang latches the router port too. :scratch:

I'll keep testing, but this is weird. :scratch: And unsettling. :(

mRgOBLIN 05-24-2009 10:05 PM

Like I said earlier, I'm sure it's a kernel bug.

http://www.nvnews.net/vbulletin/showthread.php?t=130438


I wonder if wicd would be a work-around for this.

Woodsman 05-24-2009 11:16 PM

Quote:

Like I said earlier, I'm sure it's a kernel bug.
Ah, sorry, I must have missed that report somewhere. :) Thanks for the link and info. I won't bother troubleshooting further. Looks like a patch was submitted. I hope the kernel gets updated officially before the next official Slackware release.

As one person in the linked thread stated:

Quote:

I have not found a reliable way to re-enable the NIC once the PHY is shut down.
Pretty much sums up what I reported.

disturbed1 05-24-2009 11:17 PM

Quote:

Originally Posted by Woodsman (Post 3551539)
I again fiddled with this today. When I use the 2.6.27.7 kernel with Current I have no problems. When I use the 2.6.29.2 kernel from Current then the NIC stops working.

I can watch the NIC LEDs and as soon as the udev script starts running the LEDs extinguish. They do not do that with 12.2 when the udev script runs.

Removing the NIC kernel module (rmmod forcedeth) and reloading (modprobe forcedeth) does nothing.:(

Kernel bug. Fixed in 2.6.29.3

Quote:

commit 217b4400b6d789dbbd55d854d5f4db9d3a5817d1
Author: Ed Swierk <eswierk@aristanetworks.com>
Date: Mon Apr 6 17:49:12 2009 -0700

forcedeth: Fix resume from hibernation regression.

upstream commit: 35a7433c789ba6df6d96b70fa745ae9e6cac0038

Reset phy state on resume, fixing a regression caused by powering down
the phy on hibernate.

Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

Woodsman 05-25-2009 04:21 PM

Quote:

Kernel bug. Fixed in 2.6.29.3
Hmm. I can't confirm. I downloaded the 2.6.29.3 sources and compiled the kernel using my 2.6.29.2 config file. The NIC functions fine until I warm reboot (Ctrl-Alt-Del). Then the same thing happens when the inet1 service script is run (stopped). The NIC LEDs extinguish and the NIC fails to initialize thereafter. :(

I ran out of time to test the generic or huge config files or 2.6.29.4.
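For reference, building a newer kernel from an existing config file usually looks something like the sketch below. The paths in the usage note are hypothetical, `run`, `build_with_old_config` and `DRY_RUN` are illustrative names, and installing the kernel image afterwards is left out.

```shell
#!/bin/sh
# Sketch only: run, build_with_old_config and DRY_RUN are illustrative names.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_with_old_config() {
    src="$1"      # kernel source tree, e.g. the unpacked 2.6.29.3 sources
    config="$2"   # config file saved from the earlier build

    run cp "$config" "$src/.config" || return 1
    # oldconfig keeps the existing answers and asks only about new options
    run make -C "$src" oldconfig || return 1
    run make -C "$src" bzImage modules || return 1
    run make -C "$src" modules_install
}
```

A dry run such as `DRY_RUN=1; build_with_old_config /usr/src/linux-2.6.29.3 /root/config-2.6.29.2` (both paths made up here) shows the steps in order before committing to a long compile.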

disturbed1 05-25-2009 09:22 PM

I didn't see anything in the .4 changelog for forcedeth. I have an older NIC that uses this module, and the fix worked for that one; perhaps a different issue is directly affecting your model. That fix was tested (on a DFI board with an nVidia MCP55) and signed off, though.

Reading through LKML, the blame goes from the patch (too aggressive) to user error to bad BIOS tables ...

2.6.27.23 has this http://kernel.org/pub/linux/kernel/v...eLog-2.6.27.23
Quote:

commit 1271c912ea7d12fe6bd3034ba5b0c03f828d69c9
Author: Ed Swierk <eswierk@aristanetworks.com>
Date: Mon Apr 6 17:49:12 2009 -0700

forcedeth: Fix resume from hibernation regression.

upstream commit: 35a7433c789ba6df6d96b70fa745ae9e6cac0038

Reset phy state on resume, fixing a regression caused by powering down
the phy on hibernate.

Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Then there's this
Quote:

commit 34edaa88324004baf4884fb0388f86059d9c4878
Author: Tobias Diedrich <ranma+kernel@tdiedrich.de>
Date: Mon Feb 16 00:13:20 2009 -0800

net: forcedeth: Fix wake-on-lan regression

Commit f55c21fd9a92a444e55ad1ca4e4732d56661bf2e ("forcedeth: call
restore mac addr in nv_shutdown path"), which was introduced to fix
the regression tracked at
http://bugzilla.kernel.org/show_bug.cgi?id=11358 causes the
wake-on-lan mac to be reversed in the shutdown path. Apparently the
forcedeth situation is rather messy in that the mac we need to
writeback for a subsequent modprobe to work is exactly the reverse of
what is needed for proper wake-on-lan.

The following patch explains the situation in the comments and
makes the call to nv_restore_mac_addr() conditional (only called if
we are not really going for poweroff).

Tobias Diedrich wrote:
> Hmm, I had not tried WOL for some time.
> With 2.6.29-rc3 is see the following behaviour:
>
> State              WOL Behaviour
> ------------------------------
> shutdown           reversed MAC
> disk/shutdown      reversed MAC
> disk/platform      OK
>
> Apparently nv_restore_mac_addr() restores the MAC in the wrong order
> for WOL (at least for my PCI_DEVICE_ID_NVIDIA_NVENET_15). platform
> works, because the MAC is not touched in the nv_suspend() path.
>
> A possible fix might be to only call nv_restore_mac_addr() if
> system_state != SYSTEM_POWER_OFF.

With the following patch:
shutdown           OK
disk/shutdown      OK
disk/platform      OK
kexec              OK

Signed-off-by: Tobias Diedrich <ranma+kernel@tdiedrich.de>
Tested-by: Philipp Matthias Hahn <pmhahn@titan.lahn.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Might have to stick with the 12.2 kernel, or get another NIC. I went through a similar ordeal with one of my onboard Realtek NICs. It's finally fixed in the kernel now, but my $8 dedicated NIC works much better.

Daedra 05-26-2009 12:32 AM

I remember I used to have the most trouble with the "forcedeth" driver for my NICs back in the day. I eventually threw in an old 3Com I had to fix the problem; not the way I like to fix things, but it worked ;)

Woodsman 05-29-2009 10:53 AM

Well I wish I knew what to do. Without hard numbers, the 2.6.29.x kernel seems faster than 2.6.27.7 and the KDE 3.5.10 desktop seems a tad snappier too. Could be my imagination. Regardless, I'd like to find a remedy for this regression. :(

Woodsman 05-29-2009 05:17 PM

I just compiled a 2.6.29.4 kernel and tested Current. Same results. The NIC locks upon a reboot, which is unacceptable. Sure, under typical usage I don't reboot all day, but when I'm testing, such as now with testing Current, I reboot often. Annoying. :(

I'm going to try to copy the 2.6.27.7 forcedeth source files to 2.6.29.2 and compile again. I have no idea whether that will work.

I'm open to ideas how to resolve the problem without buying a new NIC.

disturbed1 05-29-2009 07:30 PM

Quote:

Originally Posted by Woodsman (Post 3557032)
I'm going to try to copy the 2.6.27.7 forcedeth source files to 2.6.29.2 and compile again. I have no idea whether that will work.

Will you post back to let us know how the above works?

I hope it does solve your issue. Nvidia does offer drivers. These have not been updated since 2007. MCP51 is defined in the source code.
http://www.nvidia.com/object/linux_nforce_1.23.html

Woodsman 05-29-2009 08:06 PM

Quote:

Will you post back to let us know how the above works?
The kernel failed to compile properly. All of the network-related modules, that is. I'm no kernel expert and won't pretend to be. I only copied the forcedeth.c file. I might try again but be more selective and copy only sections of code. I diffed the files. Seems only a few spots might relate to my problem. As I'm no kernel guru I could be way off target with all of this.

I'm trying to surf to learn more about this problem. At this moment I'm compiling the 2.6.28.10 kernel and will report on that effort. If that kernel succeeds then the problem would seem to have occurred in the 2.6.29 series. There are many discussions online about forcedeth regressions but I haven't yet figured out which regression is the one affecting me.
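Keeping a simple list of which kernels worked makes it easier to see where the regression must have appeared. A minimal sketch, assuming GNU sort (for `-V`) and awk are available; `regression_window` is an illustrative helper name, and the sample input holds only the results reported so far in this thread.

```shell
#!/bin/sh
# Sketch only: regression_window is an illustrative helper.
# Input on stdin: one "<kernel-version> <good|bad>" pair per line.
# Output: the newest version that worked and the oldest that failed.

regression_window() {
    sorted=$(sort -V)   # GNU sort: version-aware ordering
    last_good=$(printf '%s\n' "$sorted" | awk '$2 == "good" { v = $1 } END { print v }')
    first_bad=$(printf '%s\n' "$sorted" | awk '$2 == "bad" { print $1; exit }')
    echo "last good: ${last_good:-none}"
    echo "first bad: ${first_bad:-none}"
}

# Results reported in this thread so far
regression_window <<'EOF'
2.6.29.4 bad
2.6.27.7 good
2.6.29.2 bad
2.6.29.3 bad
EOF
# prints:
#   last good: 2.6.27.7
#   first bad: 2.6.29.2
```

Adding a line for 2.6.28.10 once that build has been tested narrows the window from either side.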

I see Pat today updated Current to 2.6.29.4.


All times are GMT -5.