
zaphraud 01-15-2003 08:57 PM

Ethernet locks box at ifup! (Kernel 2.4.X, RH7.2)
My VA503+ motherboard (K6-2 550, 512MB SDRAM) freezes the instant "ifup" is run on eth0 under any 2.4.x kernel on RedHat 7.2. The odd thing is that it works flawlessly under other Linux setups, including RH 6.2 and Slackware 3.? (with upgrades, long ago, etc.). I recently experienced a hack/crash related to my leaving an insecure FTP server up (oopsie, got the Linux Hacktop installed... sigh) and had no luck locating ISOs of RH6.2, so I figured why not go to 7.2.

I have tried the following ethernet cards:
Linksys LNE100TX (the first one) - (Tulip driver)
Siemens Speedstream SS1020 - (RealTek I think)
CompUSA generic - (Tulip again)

All produce the exact same symptom: a total lock of the hardware IMMEDIATELY upon running "ifup eth0". Nothing seems to be configured wrong, and I've been using Linux since late 1995, so I'm really puzzled as to why this is, or why I am apparently the only one having this exact problem. I first thought it was driver dependent, but it's not related to driver or hardware. Even curiouser, the box is rock solid doing anything else, including X; it's been running MP3s nonstop for days as background music in my apartment (but without network control this feature isn't as fun as it once was).

Here are some other goodies installed in that box:
*two standard-type hard disks, one on each IDE controller
*Sony 52x IDE CD-ROM, slave on one IDE controller
*100MB Iomega Zip Drive, slave on the other
*Virge DGX4 4MB PCI video card
*Generic MAD16 type sound card on ISA bus
*Mouse on COM1. The other serial port is disabled in BIOS and is not using an IRQ.
*Parallel is disabled in BIOS
*USB is disabled in BIOS
(yes, there appears to be plenty of IRQs)

I've tried the RH7.2 default kernel (freezes on module bootup), a recompiled 2.4.10 kernel (same), a 2.4.10 kernel with ethernet not as a module (freezes on ifup), and a 2.4.20 kernel (freezes on ifup). I'm close to wit's end with this problem, and am close to figuring that either I'm screwed with 2.4 on this mobo, or I'm missing something absurdly obvious. The second likelihood has dimmed over the weeks as I have failed to locate any mention of a similar problem. In the past, the only way I've ever locked hardware that badly in Linux was by overclocking the video card in XFree (overclock the CPU and the damn thing doesn't boot).

I'm going to slide kernel 2.2.x under RH7.2 (I'll see what this breaks). If it still doesn't work, then I've screwed something else up or the hardware is Used Up (it has been the magic three years, after all). It just sucks having to burn a CD every time I want to put another kernel in the thing, since it lacks Ethernet!!

finegan 01-15-2003 09:08 PM

Tear apart ifup: it does about 8 things, so do them one at a time and see what happens. Can the card be given a straight ifconfig?

ifconfig eth0 up

what does /var/log/messages say about the choke?



zaphraud 01-15-2003 10:31 PM

Already tried that actually (unfortunately)

I even have a special command "prepare to die" that shuts down processes, unmounts some partitions (/mp3 is one that takes a good long while to fsck, as all of my CDs are stored there), then stops so I can test an ifup command without having to spend forever on the reboot...

it fails on the line:
if ! ip link set ${DEVICE} up; then
echo $"Failed to bring up device ${DEVICE}."

By failing, I mean I never get that error message at all. Just before the above code, I inserted:
echo "Debugging 4"
and just after it:
echo "Debugging 5"
(yeah, there were other numbers :) Debugging 4 makes it through; 5 never does.
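
For anyone repeating this kind of test: markers like the "Debugging N" lines above can also be kept in a file, which helps when the console output is not visible after a hard lock. A minimal sketch, assuming a hypothetical helper named checkpoint and an illustrative log path (neither is part of the RH7.2 scripts), that appends each marker and calls sync so the line has a chance to reach disk before the risky command runs:

```shell
#!/bin/sh
# Hypothetical helper; the name and default log path are illustrative only.
CHECKPOINT_LOG="${CHECKPOINT_LOG:-/root/ifup-debug.log}"

checkpoint() {
    # Append a marker line, then ask the kernel to flush dirty buffers
    # so the marker is more likely to survive a hard lock.
    printf '%s\n' "checkpoint: $*" >> "$CHECKPOINT_LOG"
    sync
}

# Usage inside a copy of the ifup script (not executed here):
#   checkpoint "before ip link set"
#   ip link set "${DEVICE}" up
#   checkpoint "after ip link set"
```

After the reboot, the last marker in the log shows which step was reached. This is only a sketch; sync does not guarantee the data hits the platter on every filesystem and drive.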

ifconfig -a, prior to doing anything, shows (omitted HW addr fingerprint):
Link encap:Ethernet HWaddr xx:xx:xx:xx:xx:xx
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:11

(other interrupts, from /proc/interrupts: 0 timer,1 keyboard,2 cascade,4 serial,7 mad16 wss,8 rtc,14 ide0,15 ide1)
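
A quick way to double-check a list like the one above for shared lines is to look for entries in /proc/interrupts that name more than one device. A minimal sketch, assuming a hypothetical function name shared_irqs and the usual layout where devices sharing an IRQ are separated by commas:

```shell
#!/bin/sh
# Hypothetical helper: print IRQ lines that list more than one device.
# Takes an optional file argument so it can be run against a saved copy.
shared_irqs() {
    # Match numbered IRQ lines (e.g. " 11:") that contain a comma,
    # which is how multiple handlers on one line are separated.
    awk '/^ *[0-9]+:/ && /,/ { print }' "${1:-/proc/interrupts}"
}

# Usage (not executed here):
#   shared_irqs                   # live system
#   shared_irqs /root/irq-2.2.txt # saved copy from another kernel
```

Empty output means no line is shared at that moment. Note that a network driver typically claims its IRQ only when the interface is brought up, so eth0 may not appear in the list while the interface is down.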

/var/log/messages shows me shutting down some processes with that process killer script so I can unmount volumes, then the next line right after that is the first line of the entries from the reboot (the restart line) so no help there.

"ifconfig eth0 up" locks the hardware in an identical manner. The 2.2 kernel just downloaded, I'm writing it to a CDRW now...

zaphraud 01-16-2003 12:01 AM

I just compiled kernel 2.2.23 and the ethernet again works just fine!

This actually saddens me, because it means that something is fundamentally wrong with the 2.4.x kernel series.

On the other hand, I can get back to using this windows box, with its overpowered CPU* for games and massively-parallel protein folding and psychedelic winamp screens, and the linux box for email and such, so the girlfriend can't snoop!

*does anyone else think that over 60 watts is a little high (the previous version of the Athlon XP2100 used nearly 80!)?

finegan 01-16-2003 06:26 PM

It's at least an improvement. The most power-sucking monster that AMD ever produced was the Thunderchicken 1.4GHz. I've got a friend that has one, with a 400-watt PSU and a 1kg(!) HSF, and he still gets heat-stroke outs. I convinced him to start running open case and issues have declined, but it's near the ceiling as he has cats...

I'm getting so sick of CPU heat that I'm thinking of just buying a Transmeta and getting over this.

Back to the 2.4 issue... did you see any compile specific chipset tunings for your board's chipset in the kernel options? This one is just a little too weird.



zaphraud 01-16-2003 07:19 PM

The only chipset-specific options I've used are for the IDE controller: the optimizations (read: bug fixes) for the VIA82CXXX chipset.

I tried compiling with and without that, and also tried compiling the kernel for a 486, 586/K5/6x86, and K6 class machine. I'm sure I didn't hit every permutation, but... (they all booted just fine, though.)

Were it not for the persistent pull of my own curiosity, I would just leave well enough alone with a 2.2 kernel, as it's quite functional.

My gut feeling is that this problem is rooted in "plug-and-pray" and possibly an IRQ conflict of some bizarre nature. Mysteriously, everything else functions wonderfully in 2.4.x, and it's not a problem specific to any given Ethernet card either (the most bizarre part of all). I was sure that after buying two more cards, one would work. Initially I was positive the newish tulip driver was at fault, but it has been exonerated by the duplicate failure under the RealTek 8139 driver.

My next attempts at getting the proper kernel running on this box are going to be pushing the IRQs around with the BIOS (there doesn't seem to be a conflict, but maybe there is anyway?) and running the box without the ISA sound card. (Oh yeah, the onboard sound IS properly disabled. I don't even have the riser board anymore, actually.)

Should those fail, I'll post a copy of the .config file for perusal and maybe someone can shed some light on what's going on... after that it looks like adding debugging messages to the ethernet driver itself is the next course of action (ICK!)

finegan 01-16-2003 07:39 PM

You think it might be a PCI bridge issue? Nah, the rest works... that is just funky. I was just curious as you might have something lkml worthy.



mcleodnine 01-16-2003 07:52 PM

This isn't one of those PCI conflicts between enabled on-board hardware and one or two of the PCI slots, is it?

How is the 2.2.x kernel accessing the PCI bus? (The options are Any, BIOS, or Direct, if I recall correctly.)

Just throwing anything at the wall to see if it sticks...

zaphraud 01-16-2003 08:10 PM

It's set to "ANY". Haven't tried changing that one yet. The motherboard also has a jumper for the SoundPRO chipset, which is also disabled, so it's not that. There is no on-board ethernet.

The kernel is still compiling... it's relatively huge.

zaphraud 01-16-2003 09:01 PM

With this 2.4.20 build, I was able to modprobe 8139too successfully this time, but the kernel died in the "standard" (for me) fashion after running /sbin/ifup eth0. When I was playing with the tulip driver, it would crash on modprobe if built as a module, or on ifup if built into the kernel. So there is a difference in how it's acting there.

Pushing around the interrupts in BIOS failed (it was a distant shot anyway). Just took the sound card out entirely and I'm trying again as soon as fsck gets done with /usr ...

Removing the sound card didn't change anything either. The only two cards in the system right now are the video and the network card. The video card has an IRQ assigned in BIOS, so it shouldn't be generating a conflict. This time eth0 wanted IRQ 10, which is free, and there's no sound card in there to bugger it up regardless. Not only that, I hadn't started X, so it's probably not the video either. So maybe it's really not an IRQ conflict.

Just for the record I've also tried this with two different CPUs now ...

Let's assume for the moment that I have a buggered motherboard, and have finally revealed its flaw with a kernel change. I guess what I should really be asking at this point is what all changed between 2.2 and 2.4 in the networking code? Not featurewise, but specifically. Any ideas as to where I should start debugging this problem? The RH7.2 tools (ifup, etc...) work fine with 2.2.x and RH7.2 shipped with a 2.4 kernel, so the tools probably aren't at fault 'cause they shouldn't really be working at a component level to cause this sort of crash, right? So is there anything vastly different about how all this is handled in the kernel that may be revealing this flaw?

The motherboard is (was...) common enough, and use of older linux boxen as routers and such is also common enough, that it seems unlikely to me that someone else wouldn't have run across this. Then again, maybe no one tried to run a "newish" distro on old hardware, maybe people just said screw it and ran 2.2.x and never said anything, or maybe they tossed the depreciated mobo and got another? I have no idea at this point :-P

mcleodnine 01-16-2003 09:40 PM

maybe it's the ifup script?

zaphraud 01-16-2003 10:05 PM

>maybe it's the ifup script?

Nah I took that apart (see above) and was able to verify that both:

ifconfig eth0 up
ip link set eth0 up

cause the hardware lockup. So it's not the script. It's probably not the ip and ifconfig binaries either, as they work fine in kernel 2.2.x and are the binaries as-they-shipped with RH7.2 (which had a 2.4.x kernel as-shipped; 2.4.10 I think, maybe 2.4.7). So if there was something fubar with how they interacted with the kernel, it would have munged up a bunch of people and we'd all know about it by now...

Literally the only thing I change to make it work is to hit ctrl-shift-space a bunch (I can never remember which one anymore, it's changed over the years, so I push 'em all) and have lilo load "old" to run the 2.2 kernel. So I get a different kernel out of /boot, and different modules out of /lib/modules/* as a result of the kernel version being different, but everything else on the disk is identical.

has the same mobo and CPU running under RH 7.1 ...
stock redhat 7.1 has kernel version 2.4.2-2 right?

If I can find a supported 10-megabit ISA card, I will compile for and test that; at least it will narrow it down a little bit. I think I have one floating around somewhere, but I just moved and I did chuck some stuff, so I dunno.

zaphraud 01-16-2003 10:07 PM

The possibility questioned in the above post being that the user in the web page quoted may in fact have such hardware in an old system... just realized I didn't say so outright.

finegan 01-16-2003 10:20 PM

If you want to keep banging on this one, I'm putting my money on it being a PCI issue and not networking. It might be some of the dynamic PCI chipset tuning stuff added in... dunno, sounds like a fun thing to fiddle with for a while though. My reasoning is that a PCI chipset issue would be a much rarer bug than one in the drivers for uber-common network cards like the tulip and rtl 8139s.
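
One way to chase the PCI theory above is to capture the hardware state under each kernel and compare the two. A minimal sketch, assuming a hypothetical helper named snapshot and an illustrative output directory; lspci comes from pciutils and may need to be installed:

```shell
#!/bin/sh
# Hypothetical helper: snapshot DIR NAME CMD [ARGS...]
# Saves the output of CMD as DIR/<kernel-release>.NAME.txt so that
# files captured under different kernels do not overwrite each other.
snapshot() {
    dir=$1
    name=$2
    shift 2
    "$@" > "$dir/$(uname -r).$name.txt" 2>&1
}

# Usage, run once under each kernel before touching eth0 (not executed here):
#   mkdir -p /root/hwstate
#   snapshot /root/hwstate pci   lspci -vv
#   snapshot /root/hwstate irq   cat /proc/interrupts
#   snapshot /root/hwstate dmesg dmesg
# Then compare, for example:
#   diff -u /root/hwstate/2.2.23.pci.txt /root/hwstate/2.4.20.pci.txt
```

Differences in latency timers, IRQ routing, or bridge settings between the two lspci dumps would be a place to start looking. The file names in the diff line assume the two kernel releases mentioned in this thread.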





zaphraud 01-16-2003 10:28 PM

I'll know in about half an hour. Just located and installed an old (does it still work? lol) NE2000 ISA card, reconfigured to add that obscure module in, and recompiling the whole ball o wax just to be sure.

All times are GMT -5.