LinuxQuestions.org
05-04-2004, 04:12 PM   #1
greggiepoo
LQ Newbie
 
Registered: May 2004
Location: Baton Rouge, LA
Distribution: Fedora Core 1
Posts: 19

Rep: Reputation: 0
PCI IRQ issue w/ 3 NIC's


I'm running Fedora Core 1 as a firewall (Shorewall) with 3 attached networks (net, loc, dmz). Motherboard is an ASUS A7N266-VM/AA (Athlon XP 1800+). Server chassis is 2U Rackmount with a PCI riser card, allowing for 3 PCI cards, which are filled by 3 NIC's (onboard LAN is disabled in BIOS).

NIC's installed:
1x3C905B
1x3C905C-TX-M
1xRealtek RTL-8139

For a while, my firewall was crashing randomly with no relevant messages anywhere in the log files. These crashes would completely freeze the machine forcing a hard reset. After scratching my head for a while, I added "apm=off nohlt" to my kernel options, rebooted, and waited for it to crash again.

When I got to work this morning, the Internet was down, so while I was disturbed, I was excited at the same time to see if the kernel options had helped. Sure enough, the firewall was responsive to input from KB/Mouse, but routing was definitely not happening between the three interfaces. Upon searching the logs, I found TONS of these messages:

fw kernel: eth0: PCI bus error, bus status 80000020
fw kernel: eth0: Host error, FIFO diagnostic register 0000.
fw kernel: eth0: Too much work in interrupt, status e003.

I restarted the network services, and the interfaces went down and came back up, but I was still receiving the same messages in the logs and network traffic was still down. I did a soft reset, Linux rebooted, and everything has been fine ever since. It seems that the errors occur when there is a lot of traffic flowing through this box.
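
To check whether the errors really track heavy traffic, it can help to count them per interface and per message type. The following is a minimal Python sketch, not part of the original post; the sample lines are the three messages quoted above, and the function name `tally` is made up for illustration. Real syslog lines carry a timestamp in front, which the pattern tolerates.

```python
import re
from collections import Counter

# The three message types quoted in the post above.
LOG_LINES = """\
fw kernel: eth0: PCI bus error, bus status 80000020
fw kernel: eth0: Host error, FIFO diagnostic register 0000.
fw kernel: eth0: Too much work in interrupt, status e003.
""".splitlines()

# Matches "... kernel: <iface>: <message>" and captures interface and message.
PATTERN = re.compile(r"kernel: (eth\d+): (.+)$")


def tally(lines):
    """Count messages per (interface, message kind), ignoring status values."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a NIC driver message
        iface, message = match.groups()
        kind = message.split(",")[0]  # text before the first comma
        counts[(iface, kind)] += 1
    return counts


if __name__ == "__main__":
    for (iface, kind), count in sorted(tally(LOG_LINES).items()):
        print(f"{iface}  {kind}: {count}")
```

Feeding it the lines of the actual log file instead of `LOG_LINES` would show which interface produces the bulk of the errors.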

I happened upon this thread with Donald Becker (who wrote the 3c59x drivers from what I understand): well, since this is my first post I cannot post a URL, but you can find it at tux.org. The title of the thread is [vortex] 3c59x LK1.1.16 Linux-2.4 PCI bus error/Host error. I suppose you'll have to search that site to find the discussion, since I can't post URLs yet. Sorry.

Basically, from reading this thread, I understand more about the actual problem, but I still have no idea how to fix it.

Here is my output from lspci -vvx reporting my NIC's:
------------------------------------------------------------
01:06.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
Subsystem: 3Com Corporation 3C905B Fast Etherlink XL 10/100
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (2500ns min, 2500ns max), cache line size 08
Interrupt: pin A routed to IRQ 5
Region 0: I/O ports at d800 [size=128]
Region 1: Memory at e6000000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: b7 10 55 90 17 00 10 02 30 00 00 02 08 40 00 00
10: 01 d8 00 00 00 00 00 e6 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 55 90
30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 0a 0a

01:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
Subsystem: AOPEN Inc. ALN-325C
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR+
Latency: 64 (8000ns min, 16000ns max)
Interrupt: pin A routed to IRQ 5
Region 0: I/O ports at d400 [size=256]
Region 1: Memory at e5800000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: ec 10 39 81 07 00 90 82 10 00 00 02 00 40 00 00
10: 01 d4 00 00 00 00 80 e5 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 a0 a0 07 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 05 01 20 40

01:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (2500ns min, 2500ns max), cache line size 08
Interrupt: pin A routed to IRQ 6
Region 0: I/O ports at d000 [size=128]
Region 1: Memory at e5000000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00: b7 10 00 92 17 00 10 02 78 00 00 02 08 40 00 00
10: 01 d0 00 00 00 00 00 e5 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 00 10
30: 00 00 00 00 dc 00 00 00 00 00 00 00 06 01 0a 0a

As you can see, the Realtek and the 3c905B are sharing IRQ 5, and all 3 NIC's have the "BusMaster" flag set. Seems to me that the solution would be to make only one NIC be the BusMaster and/or force all 3 NIC's to listen on different IRQ's.
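
The IRQ sharing can also be picked out mechanically. This is a minimal Python sketch, not part of the original post; it parses only the device and "Interrupt:" lines copied from the lspci output above, and the names `irq_map` and `shared_irqs` are made up for illustration.

```python
import re
from collections import defaultdict

# Device and interrupt lines copied from the lspci output above.
LSPCI_TEXT = """\
01:06.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
Interrupt: pin A routed to IRQ 5
01:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
Interrupt: pin A routed to IRQ 5
01:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
Interrupt: pin A routed to IRQ 6
"""

# A device header starts with bus:slot.function, e.g. "01:06.0 ".
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
IRQ_RE = re.compile(r"routed to IRQ (\d+)")


def irq_map(text):
    """Return {irq_number: [device addresses]} from lspci -vv style text."""
    irqs = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        device = DEVICE_RE.match(line)
        if device:
            current = device.group(1)
            continue
        irq = IRQ_RE.search(line)
        if irq and current is not None:
            irqs[int(irq.group(1))].append(current)
    return dict(irqs)


def shared_irqs(text):
    """Keep only the IRQs that more than one device is routed to."""
    return {irq: devs for irq, devs in irq_map(text).items() if len(devs) > 1}


if __name__ == "__main__":
    for irq, devices in shared_irqs(LSPCI_TEXT).items():
        print(f"IRQ {irq} is shared by {', '.join(devices)}")
```

This only identifies the sharing. PCI allows devices to share an interrupt line, so a shared IRQ is not proof of the cause by itself.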

When this last crash happened, my kernel options were: "apm=off nohlt acpi=off" (I thought that by disabling ACPI in Linux that it would assign different IRQ's to the 3 NIC's, but apparently it didn't).

So... I guess my question is this: would disabling the PCI BusMaster on 2 of the NIC's solve the problem? If so, how do I accomplish this in Linux? And/or do I have to assign different IRQ's to the 3 NIC's? If so, how do I accomplish this as well?
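
For reference, the "BusMaster" flag that lspci prints is bit 2 of the PCI command register, the 16-bit little-endian word at offset 0x04 of each device's configuration space. The following minimal Python sketch, not part of the original post, decodes it from the first hex row of each dump above; the dictionary labels and function names are made up for illustration.

```python
# First config-space row ("00:") of each NIC, copied from the lspci -vvx dump.
CONFIG_ROW_00 = {
    "01:06.0 3c905B": "b7 10 55 90 17 00 10 02 30 00 00 02 08 40 00 00",
    "01:07.0 RTL-8139": "ec 10 39 81 07 00 90 82 10 00 00 02 00 40 00 00",
    "01:08.0 3c905C": "b7 10 00 92 17 00 10 02 78 00 00 02 08 40 00 00",
}

# Low bits of the PCI command register, named the way lspci prints them.
COMMAND_BITS = {
    0: "I/O",
    1: "Mem",
    2: "BusMaster",
    3: "SpecCycle",
    4: "MemWINV",
}


def command_register(row):
    """Extract the 16-bit command register (bytes 4 and 5, little-endian)."""
    data = bytes.fromhex(row)  # fromhex ignores the spaces between byte pairs
    return int.from_bytes(data[4:6], "little")


def decode_flags(value):
    """Map each known command bit to True (set) or False (clear)."""
    return {name: bool((value >> bit) & 1) for bit, name in COMMAND_BITS.items()}


if __name__ == "__main__":
    for device, row in CONFIG_ROW_00.items():
        value = command_register(row)
        flags = decode_flags(value)
        shown = " ".join(f"{n}{'+' if on else '-'}" for n, on in flags.items())
        print(f"{device}: command=0x{value:04x}  {shown}")
```

The decoded flags agree with the "Control:" lines in the dump. The sketch only reads the flag; whether clearing it on two of the cards would help is the open question here, and this code does not answer it.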

Any feedback would be appreciated VERY much. This is a production firewall, and I'm basically having to work myself to death until this problem gets solved.

Thanks in advance,

Greg
 
05-05-2004, 01:48 AM   #2
greggiepoo
LQ Newbie
 
Registered: May 2004
Location: Baton Rouge, LA
Distribution: Fedora Core 1
Posts: 19

Original Poster
Rep: Reputation: 0
Sorry about this...

--bump--
 
05-05-2004, 04:50 AM   #3
chort
Senior Member
 
Registered: Jul 2003
Location: Silicon Valley, USA
Distribution: OpenBSD 4.6, OS X 10.6.2, CentOS 4 & 5
Posts: 3,660

Rep: Reputation: 69
Well, using good NICs would probably solve this. I have 3 Intel 8255x cards in my firewall, they all share a single interrupt, and they work great (actually, with the same driver common to all cards it's *best* if they share a single IRQ).

Realtek is infamous for very poor quality cards that stress the CPU quite a bit, and some of the 3Com cards are quirky (some are also quite good; it depends on the particular chipset).

Any card built on these chipsets should be great:
Intel 8255[5|7|9]
DEC/Intel 21143

As a stop-gap measure, you could try shuffling the cards around between the PCI slots, since some devices just do not like certain IRQ configurations (who knows why, but this trick has worked for me a few times in the past). You can also try shutting off all unneeded features in the BIOS. For instance, if you're not using USB, disabling it may free up an IRQ which the BIOS may then assign. Remember to select the BIOS option to "clear hardware assignments" or something like that, so it will re-assign IRQs at the next power cycle.
 
05-05-2004, 12:51 PM   #4
greggiepoo
LQ Newbie
 
Registered: May 2004
Location: Baton Rouge, LA
Distribution: Fedora Core 1
Posts: 19

Original Poster
Rep: Reputation: 0
Thanks for the reply!

I disabled everything in BIOS already, but can't find the "Update ESCD" option anywhere in BIOS. It's AwardBIOS stripped of everything that seems to be useful in this scenario!!!

ARGH!
 
05-06-2004, 12:00 AM   #5
greggiepoo
LQ Newbie
 
Registered: May 2004
Location: Baton Rouge, LA
Distribution: Fedora Core 1
Posts: 19

Original Poster
Rep: Reputation: 0
Ok, well today I bought 10 new NIC's (5 Intel Pro/100S and 5 Intel Pro/1000 MT). I'll try them out when they come in. Hopefully I'll have better luck!

Thanks for posting, chort.
 
05-06-2004, 04:12 AM   #6
chort
Senior Member
 
Registered: Jul 2003
Location: Silicon Valley, USA
Distribution: OpenBSD 4.6, OS X 10.6.2, CentOS 4 & 5
Posts: 3,660

Rep: Reputation: 69
The Pro/100S should work fantastically, but I'm not sure about the MTs. I know there isn't a driver yet for OpenBSD, but perhaps there is one for Linux.

PS: in case you think I led you astray, you'll notice that the chipset for the MT card is not one of the four I listed. Just covering myself there.

Last edited by chort; 05-06-2004 at 04:13 AM.
 
05-06-2004, 01:51 PM   #7
greggiepoo
LQ Newbie
 
Registered: May 2004
Location: Baton Rouge, LA
Distribution: Fedora Core 1
Posts: 19

Original Poster
Rep: Reputation: 0
Just to be clear, I bought the Pro/1000 MTs for another use; I guess I shouldn't have mentioned them. I don't need Gigabit Ethernet in a firewall.

Let's hope I have better luck with the Pro/100S cards!
 
05-06-2004, 04:01 PM   #8
cl2imson
LQ Newbie
 
Registered: Feb 2004
Location: Norman, OK
Distribution: Fedora, Core 2
Posts: 23

Rep: Reputation: 15
I was having similar problems (I am relatively new to Linux) with cards from SMC.

I had 2 SMC1208BTAs (yes, I know, why AUI and BNC? We need them for the 10Base2 and 10Base5 LANs we are required to keep). Anyway, my point is that those two 1208BTA NICs coupled with a *cough* Compaq gave me all sorts of problems, no matter what the configuration. As a matter of fact, more than one flavor of Linux had problems with similar configurations :\

One symptom I noticed was that eth0, eth1 and eth2 *could* be configured, but with weird results, such as cross-reporting of which device controlled what. When I changed the IP addy on eth0, the change showed up in the config files and in ifconfig, but the address was actually assigned to the wrong card. It pinged when connected to the opposite network.

Spooky.

Anyway, after 2 straight days of forum scampering and pulling mah hairs out, I finally decided to try a *different* NIC manufacturer. (I replaced countless SMC1208BTAs just to be sure.) My configuration was eventually Compaq NIC, SMC NIC and Intel NIC, and everything worked rather nicely.

My conclusion was an IRQ problem, but I didn't want to try to solve this manually so I figured changing manufacturers might force an IRQ change.

Anyway, thought this might help.

Last edited by cl2imson; 05-06-2004 at 04:03 PM.
 
02-07-2005, 07:33 PM   #9
alke
LQ Newbie
 
Registered: Jul 2003
Location: Finland
Distribution: Debian
Posts: 8

Rep: Reputation: 0
Just thought I'd air my views on this subject as well, and maybe help someone further down the line.

I recently installed Debian 3.0r4, upgraded from the stock kernel to 2.6.8 and set up my router with 2 x SMC Ultra NICs I found at the bottom of my drawer - and I get the same workload problem. Kernel messages say:

Code:
eth0: Too much work at interrupt, status 0x01
While Googling away I found out that most of these problems are, in fact, directly related to a computer using either SMC or NE2K (or NE2K compatible) interface cards, and one solution would have been to patch the 8390.c network driver by hand, or replace the cards.

Luckily, this message isn't hanging my PC, nor is it causing much trouble, but collisions are aplenty:

Code:
[ 0 02:39:22 root@likwid ~ ]# ifconfig
eth0   Link encap:Ethernet
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2857794 errors:4 dropped:0 overruns:0 frame:745
          TX packets:846936 errors:0 dropped:0 overruns:0 carrier:0
          collisions:328663 txqueuelen:1000
          RX bytes:3469151683 (3.2 GiB)  TX bytes:59777585 (57.0 MiB)
          Interrupt:3 Base address:0x290 Memory:d0000-d4000
..so it sure looks like I'm going to have to replace the NICs with 3Com 3c509 cards, since I've never had much trouble with them on another Linux box, and recompiling the drivers with some weird switch just to see if it works or not isn't exactly what I call efficient.
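
To put a number on "aplenty", the collision count can be compared with the transmitted packet count. This is a minimal Python sketch, not part of the original post; it reads the counters from the ifconfig output quoted above, and the helper names `counter` and `collision_ratio` are made up for illustration.

```python
import re

# Counter lines copied from the ifconfig output above.
IFCONFIG_TEXT = """\
eth0   Link encap:Ethernet
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2857794 errors:4 dropped:0 overruns:0 frame:745
          TX packets:846936 errors:0 dropped:0 overruns:0 carrier:0
          collisions:328663 txqueuelen:1000
"""


def counter(text, pattern):
    """Return the integer captured by the first match of pattern in text."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"counter not found: {pattern}")
    return int(match.group(1))


def collision_ratio(text):
    """Collisions per transmitted packet (0.0 if nothing was transmitted)."""
    tx_packets = counter(text, r"TX packets:(\d+)")
    collisions = counter(text, r"collisions:(\d+)")
    return collisions / tx_packets if tx_packets else 0.0


if __name__ == "__main__":
    print(f"collisions per TX packet: {collision_ratio(IFCONFIG_TEXT):.3f}")
```

For the counters above this works out to roughly 0.39 collisions per transmitted packet.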

Last edited by alke; 02-07-2005 at 08:12 PM.
 
  


All times are GMT -5.
