LinuxQuestions.org > Forums > Linux Forums > Linux - Networking
02-09-2008, 09:46 AM #1
madbrad
Realtek RTL8111/8168B IRQ clash? Hardware errors with high activity


Hi. I've got a brand-new system with a Gigabyte P35-DS4 motherboard, which has an embedded Realtek RTL8111/8168B gigabit network controller. I'm running Linux 2.6.23.14, freshly fetched from kernel.org a couple of weeks ago.

The system was running perfectly ... until I decided to start using the network. With both the Linux kernel's r8169 module and the r8168 driver from realtek.com.tw (loaded separately, one at a time) I have the same problem. The driver loads properly, the eth0 interface configures properly and all the networking functions operate correctly ... but when I receive packets at the full 100Mbit/s rate from another machine (both my eth0 and the other machine auto-negotiated to 100Mbit/s full duplex), various errors suddenly pop up in the syslog:

sshd[4685]: error: channel 0: chan_read_failed for istate 1
sshd[4685]: error: channel 0: chan_read_failed for istate 3
last message repeated 20 times
kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
last message repeated 3 times
kernel: ide: failed opcode was: unknown
kernel: hda: drive not ready for command
kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }

And so forth.

When either of the r8169/r8168 modules is loaded, it reports as follows in the log (this example is from the regular kernel.org r8169 module):

kernel: 8169 Gigabit Ethernet driver 2.2LK loaded
kernel: ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17
kernel: eth0: RTL8168b/8111b at 0xf8d1c000, 00:1a:4d:58:a3:54, XID 38000000 IRQ 17
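The ACPI line above records a routing chain: interrupt pin A of PCI device 0000:04:00.0 is mapped to global system interrupt (GSI) 17, level-triggered and active-low, which Linux then numbers IRQ 17. A minimal Python sketch for pulling those fields out of a boot log and grouping devices by IRQ follows. It assumes the 2.6-era message format shown here and that every PCI driver involved logs a similar line; the function names are illustrative only.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional

# Assumed 2.6-era format, e.g.:
#   ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17
ACPI_ROUTE_RE = re.compile(
    r"ACPI: PCI Interrupt (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"\[(?P<pin>[A-D])\] -> GSI (?P<gsi>\d+) "
    r"\((?P<trigger>\w+), (?P<polarity>\w+)\) -> IRQ (?P<irq>\d+)"
)


class PciRoute(NamedTuple):
    device: str    # PCI domain:bus:slot.function
    pin: str       # interrupt pin the device uses (INTA..INTD)
    gsi: int       # global system interrupt chosen via ACPI
    trigger: str   # "level" or "edge"
    polarity: str  # "low" or "high"
    irq: int       # Linux IRQ number, as listed in /proc/interrupts


def parse_acpi_route(line: str) -> Optional[PciRoute]:
    """Return the routing fields from one log line, or None if it doesn't match."""
    m = ACPI_ROUTE_RE.search(line)
    if m is None:
        return None
    return PciRoute(m["dev"], m["pin"], int(m["gsi"]),
                    m["trigger"], m["polarity"], int(m["irq"]))


def routes_by_irq(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group PCI devices by the Linux IRQ they were routed to."""
    groups: Dict[int, List[str]] = {}
    for line in lines:
        route = parse_acpi_route(line)
        if route is not None:
            groups.setdefault(route.irq, []).append(route.device)
    return groups
```

Running `routes_by_irq` over the lines of `dmesg` output and looking for lists with more than one device points at the shared interrupt lines.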

A look at the IRQ 17 line in /proc/interrupts shows that the IDE driver and the Realtek driver are both sharing IRQ 17:

# fgrep eth /proc/interrupts
17: 58077 4 98999 129160 IO-APIC-fasteoi ide0, eth0
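The same check can be scripted. Below is a minimal Python parser for the /proc/interrupts layout shown in this thread: one count column per CPU, then the interrupt controller type (IO-APIC-fasteoi here, XT-PIC-XT with noapic), then a comma-separated list of handlers. Newer kernels add extra columns, so treat the layout as an assumption.

```python
from typing import Dict, List, NamedTuple


class IrqLine(NamedTuple):
    irq: str             # IRQ number as printed, e.g. "17"
    per_cpu: List[int]   # interrupt counts, one column per CPU
    chip: str            # controller type, e.g. "IO-APIC-fasteoi" or "XT-PIC-XT"
    devices: List[str]   # handlers registered on this IRQ


def parse_interrupts_line(line: str, ncpus: int) -> IrqLine:
    """Parse one numeric row of /proc/interrupts (2.6-era layout assumed)."""
    label, rest = line.split(":", 1)
    # ncpus count columns, then the chip name; the remainder is the device list
    fields = rest.split(None, ncpus + 1)
    counts = [int(f) for f in fields[:ncpus]]
    chip = fields[ncpus] if len(fields) > ncpus else ""
    devices = ([d.strip() for d in fields[ncpus + 1].split(",")]
               if len(fields) > ncpus + 1 else [])
    return IrqLine(label.strip(), counts, chip, devices)


def shared_irqs(text: str) -> Dict[str, List[str]]:
    """Return {irq: [devices]} for every IRQ with more than one handler."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared: Dict[str, List[str]] = {}
    for line in lines[1:]:
        label = line.split(":", 1)[0].strip()
        if not label.isdigit():  # skip NMI, LOC, ERR and similar summary rows
            continue
        entry = parse_interrupts_line(line, ncpus)
        if len(entry.devices) > 1:
            shared[entry.irq] = entry.devices
    return shared
```

On the machine described here, `shared_irqs(open("/proc/interrupts").read())` would be expected to include `'17': ['ide0', 'eth0']`.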

Given the kernel messages about 'hda' - which is my sole IDE disk device on the system, the DVD-ROM drive (all my hard disk drives are SATA/AHCI) - it seems to me that the realtek driver is losing interrupts, or the IDE driver is picking up the interrupts destined for the ethernet device. But it's been a loooong time since I had to play with PC hardware and interrupts ... I don't have a clue how IRQs are (automatically?) assigned on a PCI bus these days, nor how to change things.

Has anyone had this problem with the embedded Realtek RTL8168/8111 driver and hardware interrupt confusion with moderate to high network activity?

How can I 'move' the Realtek device to another interrupt? Is there a general 'what to do with messy interrupt conflicts on PCI busses' HOWTO out there for a hardware novice?

Many thanks for any help ... I'm rather desperate - I thought this new system was working fine until I started to use it for real over the network! :-(

Regards,


Brad
 
02-10-2008, 05:48 AM #2
tredegar

Looks like it might be an IRQ sharing problem.
You could try passing (one, two, or all - sorry: you'll have to experiment, only 7 combinations to try!) of these kernel options:
noapic
nolapic
acpi=off

to the kernel at boot time (just add them to the end of the "kernel" line in /boot/grub/menu.lst and reboot)

Then check your bootlogs to see what is happening.

To be on the safe side, I'd recommend creating a new boot entry in menu.lst to play with these options (just copy your current entry, but change the title to something like Testing), just in case one of these options prevents the kernel from booting at all - then you still have your original to fall back on.

Let us know how you get on.
 
02-10-2008, 07:05 AM #3
madbrad (Original Poster)

Hi tredegar!

Quote:
Originally Posted by tredegar
Looks like it might be an IRQ sharing problem.
You could try passing (one, two, or all - sorry: you'll have to experiment, only 7 combinations to try!) of these kernel options:
noapic
nolapic
acpi=off

Well, that wasn't so bad ... I was afraid you'd want me to do all the permutations of them in different order, too! :-)

I've been spending all day on this $$"!&^%$$!! problem and I've narrowed things down a little.

First of all, the sshd errors were a red herring; more searching the web showed that it was a problem with the version of sshd that I was running. I upgraded and *those* particular error messages went away.

The core problem, though, remains. Any decent network activity and I get heaps of these messages logged:

kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
last message repeated 3 times
kernel: ide: failed opcode was: unknown
kernel: hda: drive not ready for command
kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }

And in fact the KDE "someone has just put in a music CD; what do you want to do with it?" popup window springs up every time, it's that confused!

It seems to me that the IDE driver is using the same interrupt IRQ as the embedded Realtek r8169 driver, and with any significant network activity the IDE driver is reading and acting on some of the interrupts.

When my machine boots I can see that the "Native IDE controller" and the "Network Controller" on the PCI bus have the same IRQ, 15. And the problem is that, when Linux boots, it puts them both on a shared IRQ as well.

With a normal boot I have both 'ide0' and 'eth0' sharing IRQ 17 in /proc/interrupts:

17: 58077 4 98999 129160 IO-APIC-fasteoi ide0, eth0

With 'noapic' they both get shifted to IRQ 15; all the IRQs are moved 'down' to lower numbers, and instead of 'IO-APIC-fasteoi' and such the entries are all instead 'XT-PIC-XT' (I wish I knew what the difference was). But I still have the same problem with any network activity.

With 'acpi=off' I get the normal behaviour (shared on IRQ 17); with 'nolapic' the kernel hangs right after loading the ide driver, reporting 'hda: lost interrupt' messages. The other combinations of those three boot options all have the same results - either shared 'IO-APIC-fasteoi' IRQ 17 or shared 'XT-PIC-XT' IRQ 15. And the same problem.

When I use the BIOS to 'reserve' IRQ 15, the BIOS (and then Linux) uses a different IRQ ... but for BOTH devices, again having them share the same interrupt.

I compiled a new kernel with no IDE driver whatsoever - just as a test, I'd like to actually be able to use my DVD-ROM while surfing the net :-) - and Linux then shared the 'eth0' driver with the 'libata' driver, both of them using the same IRQ (17 again, I think). 'libata' is the SATA driver, is that correct? Anyway, the problem DISAPPEARED in that scenario.

Searching the internet for the message:

kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.

showed that people were scratching their heads over it back in 2005; there were a few messages in the linux-kernel mailing list about it, pointing at 'IRQ routing' as being the problem. But I couldn't find any solution.

So it seems to me that:

- the problem is due to the IDE and R8169 drivers sharing the same interrupt;

- maybe it's the IDE driver which is badly behaved, as the libata driver presumably was exposed to the same rush of shared interrupts from the network activity, but didn't have - or log - any problems;

- there seems to be no way I can get the embedded Realtek network controller to use a different IRQ. There's nothing in the BIOS that will let me do it (it only lets me block IRQs from being used, but Linux then just finds another IRQ for IDE and eth0 to share). I've downloaded several Realtek utilities but they will only *display* the IRQ, not change it.

It's really embarrassing how little I know of modern PC hardware these days. A decade ago ... well, maybe a few years more ... I was happily solving interrupt problems with conflicting ISA cards and the like. As a modern hardware naif it seems to me that Linux can quite happily re-route IRQs merrily as it boots ... so surely there's a way to tell it 'excuse me, please put the Realtek driver on its own interrupt'? Or the IDE driver?

It seems all I can do is either:

A. Find a way to get the Realtek hardware to change its IRQ; but I think I've exhausted that possibility. The BIOS won't let me do it, no utility I can find will, and the motherboard manufacturer doesn't mention it.

B. Try and find a way that the Linux kernel allows one to meddle with the 'IRQ routing', to stop those two drivers from sharing an interrupt.

C. See if there are options to toughen up the IDE driver? There was a kernel config option, IDEPCI_SHARE_IRQ, which seemed to be EXACTLY what I wanted, so I set it to 'N', but the problem remained.

Thanks sincerely for your advice, this is all very frustrating; I appreciate your time!


Brad
 
02-10-2008, 08:28 AM #4
tredegar

Thanks for your lucid post.
Quote:
It seems all I can do is either:

A: I can't find a way to do this either.
modinfo r8169 didn't help much, but there is a module option for Debug verbosity level


B: That would be a good idea, & I thought those kernel options might help.
There's more info on kernel options and interrupts here:
http://www.kernel.org/pub/linux/kern...n_pdf/ch09.pdf
You might make more sense of it than I do!
Maybe try acpi=noirq ?

C: I don't know

Searching shows me there seem to be a lot of problems with your chipset & linux.
The wimp's way out may be to try disabling your Realtek RTL8111/8168B in your BIOS and trying a different network card.

One other thought: Is there anything in your BIOS that you might be able to change that could alter the way interrupts are being handled? [Eg set PnP BIOS=NO ]
 
02-10-2008, 08:54 AM #5
jay73

Have you tried using the irqpoll boot argument? It has helped me before although the last time I was having issues like yours, it didn't do anything. I guess it's worth a try.
 
02-10-2008, 01:17 PM #6
Loosewheel

'ifconfig' gives an option: irq addr
Set the interrupt line used by this device. Not all devices can dynamically change their IRQ setting.
 
02-10-2008, 05:12 PM #7
madbrad (Original Poster)

Quote:
Originally Posted by tredegar
modinfo r8169 didn't help much, but there is a module option for Debug verbosity level

Yes, I tried that; it didn't tell me anything useful. It's *really* frustrating how I can't seem to find any way to tell the embedded Realtek controller to just use another jolly interrupt! An 'irq=XXX' module option would have been perfect :-(

Quote:
There's more info on kernel options and interrupts here:
http://www.kernel.org/pub/linux/kern...n_pdf/ch09.pdf
You might make more sense of it than I do!
Maybe try acpi=noirq?

I think I've tried most of the possibilities listed under 'Interrupt Options'! 'acpi=noirq' didn't make any difference.

Quote:
The wimp's way out may be to try disabling your Realtek RTL8111/8168B in your BIOS and trying a different network card.

Ugh. I know people have had problems with the chipset, but I can't find much about this specific one (lots of problems with the JMicron southbridge when it first came out, I think). And motherboards with these chipsets are so prevalent; how is everyone getting around this? Are they all running Windows? :-(

Quote:
One other thought: Is there anything in your BIOS that you might be able to change that could alter the way interrupts are being handled? [Eg set PnP BIOS=NO ]

I wish there was, but no. There's nothing that allows me to change the IRQ of the network controller (or any other device), and nothing at all about PnP other than the one BIOS page/menu, which only allows me to reserve or block various IRQs ... but when I do that Linux just moves BOTH the IDE and network drivers to share another interrupt, together. And the IDE driver just doesn't like that :-(

Quote:
Originally Posted by jay73
Have you tried using the irqpoll boot argument? It has helped me before although the last time I was having issues like yours, it didn't do anything. I guess it's worth a try.

I tried it; it seemed to make the IDE driver much less 'sensitive': it took a full minute for KDE to think that a ghost had inserted a music CD in the drive. But I got the same error messages, just a bit slower. From what the documentation says about 'irqpoll', I think it may just slow things down in general.

Quote:
Originally Posted by Loosewheel
'ifconfig' gives an option: irq addr
Set the interrupt line used by this device. Not all devices can dynamically change their IRQ setting.

Loosewheel, that option would have been BRILLIANT if it worked! I had no idea that ifconfig could do that, but there it is, sitting in the output of an 'ifconfig -a'. Plus I've noticed that the 'r8169' Realtek driver only seems to 'grab' its interrupt (the 'eth0' entry only appears in /proc/interrupts) after I've actually assigned an address to a plumbed eth0 device and brought it up. That should have told me that ifconfig itself was doing some sort of interrupt configuration/activation magic.

Anyway, I tried that - no luck:

irq: SIOCSIFMAP: Operation not supported

Would have been perfect if it had worked. :-(
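For the curious, ifconfig's irq option appears to work by reading the interface's struct ifmap with the SIOCGIFMAP ioctl, changing its irq field, and writing it back with SIOCSIFMAP. In the kernel, SIOCSIFMAP is handed to the driver's set_config hook (ndo_set_config in later kernels); a driver that provides none gets EOPNOTSUPP back, which matches the "Operation not supported" above. The Python sketch below mirrors that sequence with ctypes. The struct layouts are assumed to follow <linux/if.h>, the ioctl numbers come from <linux/sockios.h>, and actually calling set_irq needs root (CAP_NET_ADMIN) and a real interface.

```python
import ctypes
import fcntl
import socket

# ioctl numbers from <linux/sockios.h>
SIOCGIFMAP = 0x8970
SIOCSIFMAP = 0x8971


class IfMap(ctypes.Structure):
    """struct ifmap, as assumed from <linux/if.h>."""
    _fields_ = [
        ("mem_start", ctypes.c_ulong),
        ("mem_end", ctypes.c_ulong),
        ("base_addr", ctypes.c_ushort),
        ("irq", ctypes.c_ubyte),
        ("dma", ctypes.c_ubyte),
        ("port", ctypes.c_ubyte),
    ]


class IfReqMap(ctypes.Structure):
    """struct ifreq, modelling only the ifr_map member of its union."""
    _fields_ = [("ifr_name", ctypes.c_char * 16), ("ifr_map", IfMap)]


def make_request(ifname: str) -> IfReqMap:
    """Build a zeroed request naming the interface (at most 15 bytes)."""
    req = IfReqMap()
    req.ifr_name = ifname.encode()
    return req


def set_irq(ifname: str, irq: int) -> None:
    """Read the interface's ifmap, change only the IRQ, and write it back.

    Raises OSError on failure: EPERM without CAP_NET_ADMIN, EOPNOTSUPP when
    the driver does not implement the set_config hook.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        buf = bytearray(make_request(ifname))
        fcntl.ioctl(s.fileno(), SIOCGIFMAP, buf)      # kernel fills buf in place
        IfReqMap.from_buffer(buf).ifr_map.irq = irq   # edit the IRQ inside buf
        fcntl.ioctl(s.fileno(), SIOCSIFMAP, buf)
```

Run as root against the r8169 driver described in this thread, `set_irq("eth0", 15)` would be expected to raise `OSError` with `errno.EOPNOTSUPP`, the same failure ifconfig reported.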

Thanks for the help fellows. I don't get it; the Intel P35 chipset is pretty modern and popular, I thought, and lots of motherboards - I believe - have both it and the embedded Realtek network controllers. I wonder how they're getting around this?


Brad
 
02-10-2008, 06:28 PM #8
jay73

Quote:
I wonder how they're getting around this?

Essentially, not.
I have repeatedly been hit by that thing over the last year (similar motherboard), once on Fedora, once on FreeBSD and the other time on Ubuntu. I wasted lots of time looking for a solution and eventually ended up switching to a different distro until the issue was solved by a kernel update. Unless you write your own kernel patches, there isn't much you can do.
 
02-10-2008, 09:30 PM #9
madbrad (Original Poster)

Quote:
Originally Posted by jay73
I have repeatedly been hit by that thing over the last year (similar motherboard), once on Fedora, once on FreeBSD and the other time on Ubuntu. I wasted lots of time looking for a solution and eventually ended up switching to a different distro until the issue was solved by a kernel update. Unless you write your own kernel patches, there isn't much you can do.

Did you discover a distribution that got rid of the problem, then?

And - I'm probably reading your post wrong - was the problem finally solved by a kernel update, or are you still on the good distribution waiting?

I had a friend e-mail me just half an hour ago that Ubuntu would fix all my problems ... and you've just said here that Ubuntu didn't work for you. :-( I'm keen to know what distribution you found that worked!


Brad
 
02-10-2008, 11:18 PM #10
jay73

Well, Ubuntu Gutsy works fine. The previous one (what was it called again?) worked fine until my optical drives became useless after a kernel update. Same thing with Fedora 7, but now Fedora 8 is OK again. I guess this is nothing distro-specific; it's probably just the kernel devs solving a problem, then causing a regression with the next update, then solving it once more. Your best bet would be to try different distros. If you can afford the space, install two. If one goes down, you have a quick alternative while you're waiting for things to get straightened out.
 
02-23-2008, 11:35 PM #11
madbrad (Original Poster)

Just posting a summation of what I've found out to solve my problem, in case anyone else ever does a search.

First up, from a few recent posts in the linux-kernel mailing list, it looks like the interrupt handler for the ide-cd module has been (or is in the process of being) rewritten. The message I saw (dated 14/2/08) said that the release candidate 2.6.25-rc1 kernel should have the fix. However, I loaded up 2.6.25-rc2 today and the bug was still there. Still, the mention of the relevant change in the code (changing 'cdrom_pc_intr' to 'cdrom_newpc_intr') suggests that my problem with conflicting interrupts between the Realtek and the CD-ROM will hopefully be fixed soon.

In the meantime I've found a workaround, the same one I think Ubuntu uses, which works out of the box, as jay73 noted here. I've disabled IDE entirely in my kernel and enabled the sr_mod module (CONFIG_BLK_DEV_SR). The sr_mod module apparently sits above the 'cdrom' driver and presents a SCSI device (/dev/sr0 or /dev/scd0) to the system. I think this was the only way to use a CD-ROM a few years back, before the ide-cd driver came out.

Anyway, with IDE totally turned off in my kernel and sr_mod loaded I can use the DVD even though the 'libata' driver (rather than the 'ide0' driver) is still sharing the same IRQ as the Realtek device. Luckily the only IDE device in my system is the DVD drive so this workaround is sufficient until hopefully a new kernel fixes the bug.
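The workaround above boils down to two kernel configuration settings: the old IDE layer switched off and SCSI CD-ROM support (sr_mod, CONFIG_BLK_DEV_SR) switched on. A minimal Python sketch that checks a kernel .config for that combination follows. It assumes the 2.6-era symbol CONFIG_IDE for the top-level ATA/ATAPI option, and it does not check which libata driver ends up handling the drive.

```python
import re
from typing import Dict, List


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map CONFIG_ symbols to their value ('y', 'm', a string) or 'n' if unset."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"# (CONFIG_\w+) is not set$", line)
        if m:
            values[m.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, val = line.split("=", 1)
            values[key] = val
    return values


def check_sr_workaround(text: str) -> List[str]:
    """List the reasons a .config does not match the IDE-off, sr_mod-on setup."""
    cfg = parse_kconfig(text)
    problems = []
    if cfg.get("CONFIG_IDE", "n") != "n":
        problems.append("CONFIG_IDE is enabled: the old IDE driver may still claim hda")
    if cfg.get("CONFIG_BLK_DEV_SR", "n") not in ("y", "m"):
        problems.append("CONFIG_BLK_DEV_SR (sr_mod) is not enabled")
    return problems
```

An empty list from `check_sr_workaround(open(".config").read())` means the two settings match the workaround; symbols missing from the file are treated as unset.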

I've realised I have several questions about how the kernel works out of all this ... for example, what does the Ubuntu kernel do if there are IDE devices (other than the DVD/CD) in the system? Is there another workaround? I tried various kernel boot parameters to try and keep the IDE driver enabled while telling it to 'ignore' the DVD drive - 'hda=scsi', 'hda=ide-scsi' - but nothing worked. Why is it that, when IDE is compiled in, a listing of /proc/interrupts shows that 'ide0' is using IRQ 17 with the network driver ... but when IDE is disabled 'libata' appears in its place? How does the kernel juggle the IDE and libata drivers around?

And, finally, what with all the various boot parameters to turn ACPI off, on and sideways, or otherwise meddle with the boot-time 'IRQ balancing' and such ... surely there would be a way to tell the kernel to move things around so that the IDE driver had its own unique IRQ? I thought IDEPCI_SHARE_IRQ would do it, but no.

Thanks for the help, Brad.
 
  

