NForce4 Network Drops
Greetings,
I'm fairly new to the Linux world (1 year exp). We have a large number of HP xw9300 workstations that have proven to be a pain. The latest joy and fun has been the integrated NIC randomly dropping its connection. I've tried turning off ACPI as recommended by nVidia. I've installed the latest nForce drivers and updated the resolv.conf file. HP was helpful enough to provide the manual config options for the NIC as well as multiple BIOS updates. It doesn't matter if the card is set to 100 Full no nego, full nego, gigabit, or 10 Full; the blasted thing still drops intermittently. I've tried DHCP, static, DNS and manually entering information into the hosts file. It doesn't seem to matter. When I go out on the switch, the NIC just doesn't respond. No errors, it just doesn't respond. If I blow the NIC away and reload it, the card comes back up, responds and runs. When it drops I get unexpected MAC address errors, unable to renew IP address errors, or sometimes the blasted thing just stops responding. I know what you're thinking, bad NIC, right? Well, we have seven of these machines running and they're all doing the same thing. They're on different switches, in different locations and running different operating systems:
RHE4_U2a
RHE4_U2
RHE3_U5
RHE4_U1 -- Man, I don't recommend this update with an NForce4 chipset; it's a PITA to get to work.
Windows XP Professional SP2
Ultimately I'm going to have to deal with 60 of these animals. I was wondering if anyone had heard of this problem. If so, were you using the NForce4 Extreme chipset? Did you fix the issue, or did you opt to swap in a NIC as I'm probably going to do?
Second, and more importantly, can anyone recommend a gigabit NIC that they know works? I would really appreciate the help. If you could please list your OS and its current update level when you post, that would really help. Currently I'm looking at the Intel PRO/1000 MT adapter as a replacement option. Does anyone know of any gotchas out there when using this controller with RHE4?
Here's what nVidia has to say on the subject.
Network and other devices randomly stop working when ACPI is enabled
This problem may be caused by an incorrect ACPI table entry that causes the timer interrupt to be incorrectly configured.
If the kernel console boot trace (viewable using dmesg) contains messages such as these:
..MP-BIOS bug: 8254 timer not connected to IOAPIC
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... works.
then the incorrect ACPI table entry is present. On 2.6 kernels, this can be worked around by specifying the 'acpi_skip_timer_override' boot line option. An alternative workaround is to disable ACPI in the BIOS or by using the 'acpi=off' boot line option.
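For anyone who wants to run the same check on their own box, here is a rough sketch of the test nVidia describes: look for the MP-BIOS message in the boot trace, then see whether either workaround is already on the kernel boot line. The function names are my own, and it assumes a recent Python 3 rather than the stock Python on these RHE releases. It also assumes the boot messages are still in the dmesg buffer, which may not hold on a machine that has been up for a while.

```python
import subprocess

# Message text and boot options taken from the nVidia note quoted above.
BUG_MARKER = "MP-BIOS bug: 8254 timer not connected to IOAPIC"
WORKAROUNDS = ("acpi_skip_timer_override", "acpi=off")


def has_timer_bug(dmesg_text):
    """Return True if the boot trace contains the MP-BIOS timer message."""
    return BUG_MARKER in dmesg_text


def active_workarounds(cmdline_text):
    """Return the workaround options that appear on the kernel boot line."""
    options = cmdline_text.split()
    return [w for w in WORKAROUNDS if w in options]


def report():
    """Check the running system (hypothetical helper; may need root for dmesg)."""
    dmesg_text = subprocess.run(
        ["dmesg"], capture_output=True, text=True
    ).stdout
    with open("/proc/cmdline") as f:
        cmdline_text = f.read()
    print("timer bug reported:", has_timer_bug(dmesg_text))
    print("workarounds on boot line:", active_workarounds(cmdline_text))
```

Calling `report()` on an affected machine prints both results; the two helper functions can also be fed saved dmesg output from another box.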
My modprobe.conf file is listed below:
alias eth0 nvnet
options etho hwmode=2 auto_negotiate=0 force_speed_duplex=4
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptscsih
alias scsi_hostadapter2 aacraid
alias scsi_hostadapter3 sata_nv
alias snd-card-0 snd-intel8x0
options snd-card-0 nvsound
#install snd-intel8x0 /sbin/modprobe --ignore-install snd-intel8x0 && /usr/sbin/alsactl restore >/dev/null 2>&1 || :
#remove snd-intel8x0 { /usr/sbin/alsactl store >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-intel8x0
alias usb-controller ehci-hcd
alias usb-controller1 ohci-hcd
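As a quick sanity check on a file like the one above, the sketch below lists any `options` line whose target is neither an alias name nor a module named by an alias line. This is only a heuristic of my own (the function name is made up): `options` can legitimately name a module that has no alias, so a hit means "look at this line", not "this line is wrong".

```python
def unmatched_options(conf_text):
    """Return targets of `options` lines that match no alias or aliased module."""
    aliases = set()   # names defined by `alias <name> <module>`
    modules = set()   # modules those aliases point at
    targets = []      # first argument of each `options` line
    for line in conf_text.splitlines():
        fields = line.split()
        # Skip blank lines and commented-out entries.
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "alias" and len(fields) >= 3:
            aliases.add(fields[1])
            modules.add(fields[2])
        elif fields[0] == "options" and len(fields) >= 2:
            targets.append(fields[1])
    return [t for t in targets if t not in aliases and t not in modules]
```

Fed the listing above, it flags `etho`, which matches neither the `eth0` alias nor the `nvnet` module, so that options line is worth a second look.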
I'll post the logs in the morning.
Last edited by jerickson; 12-20-2005 at 07:03 PM.