LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

Old 10-05-2017, 02:04 AM   #1
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 530

Rep: Reputation: 89
AMD-Vi: Event logged [IO_PAGE_FAULT (Ethernet controller)


This happens whenever the box is under medium network load. It is very consistent. When it occurs, there is no network connectivity until I run ifconfig eth0 down/up and dhclient eth0. Then it works until I put a load on the network card again by just pushing some files back and forth. Basic Internet usage doesn't seem to trigger it, but higher-speed traffic to/from a machine on the network and poof, gone. The network card still appears in lspci, and ifconfig insists it is working, but no traffic will move until I reset it. The dmesg line is:

[ 1140.466954] AMD-Vi: Event logged [IO_PAGE_FAULT device=28:00.0 domain=0x000b address=0x0000000001091000 flags=0x0050]

Device 28:00.0 is:

28:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
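A minimal sketch of pulling the fields out of that dmesg line, so the device= value can be matched against the bus address that lspci prints. The regular expression only assumes the line format quoted above; other kernel versions may word the message differently, and the function name is made up for illustration.

```python
import re

# Pattern for the AMD-Vi line format quoted in this post (assumption:
# other kernel versions may print the event differently).
EVENT_RE = re.compile(
    r"AMD-Vi: Event logged \[(?P<event>\w+) "
    r"device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_amd_vi_event(line):
    """Return the fields of an AMD-Vi event line as a dict, or None."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the hex fields to integers so they are easy to compare.
    for key in ("domain", "address", "flags"):
        fields[key] = int(fields[key], 16)
    return fields


line = ("[ 1140.466954] AMD-Vi: Event logged [IO_PAGE_FAULT device=28:00.0 "
        "domain=0x000b address=0x0000000001091000 flags=0x0050]")
event = parse_amd_vi_event(line)
print(event["event"], event["device"], hex(event["address"]))
```

The device field uses the same bus:device.function form as the first column of lspci, which is how the fault above was tied to the Realtek controller.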

Slackware 64 bit, current patch level, 4.4.88 kernel.

A bit of googling turned up some old complaints of this, having to do with the nic driver.

I'm not quite sure where to go from here, other than trying a different (newer or older) kernel. Advice welcome.

PS - this is not the same issue I reported yesterday about the nic vanishing. That was a different board, and replacing the mother board fixed that problem. This problem is occurring on a different motherboard, completely different symptoms.
 
Old 10-05-2017, 05:39 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 8,409

Rep: Reputation: 849
Page faults go back as far as the 8086 (1980s), which had 20 address lines but only 16-bit registers :-/. The extra 4 address lines got you from 64k to 1024k of address space, and 4-bit paging registers came in. The curse of backward compatibility followed. Pages can and do exist with no memory behind them. It's a memory-related driver fault.
I would sniff around with ifconfig, route, lspci, ping, and anything else relevant to see what exactly goes wrong.
I'm having a wifi issue right at the moment, and sniffing around advanced my understanding of my issues. See post #6 & below https://www.linuxquestions.org/quest...nd-4175614974/
 
Old 10-05-2017, 12:19 PM   #3
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 530

Original Poster
Rep: Reputation: 89
Well, the plot thickens. I can hammer the box with iperf from three different computers, and it is rock solid. I'm actually pulling about 950 Mbps.

Come to find out the failure was occurring when moving stuff from one specific computer to this one. The other computer is running Windows 7, and I was doing the file copy via the Total Commander from that end, samba share on this end. It would choke after about five seconds. Every time.

So I booted that computer to Linux and did the file copy from Linux via Midnight Commander. Rock solid. Tried it with rsync. Rock solid. Hammered it with iperf. Rock solid.

So now I'm going WTF? What is it about this particular file copy from this Windows computer using Total Commander that kills it in 3 seconds flat, when everything else works at max bandwidth just fine?

I have not as of yet found anything of interest in the logs that would shed light on this. I'm thinking of switching to a newer kernel, probably the 4.13.5, and see if this persists.
 
Old 10-05-2017, 01:28 PM   #4
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 530

Original Poster
Rep: Reputation: 89
Updating to the 4.13.5 kernel seems to have fixed the problem. My research to date indicates that similar problems were caused by the nic driver. Would a newer kernel have a newer driver, and is it safe to blame the driver that comes with the 4.4.88 kernel as the most likely cause?
 
Old 10-05-2017, 01:36 PM   #5
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 8,409

Rep: Reputation: 849
Quote:
So now I'm going WTF? What is it about this particular file copy from this Windows computer using Total Commander that kills it in 3 seconds flat, when everything else works at max bandwidth just fine?
The only thing I can think of is something in the network packets. There are fixed-length headers and an expandable data section (up to 1500 bytes?). Overstepping the length used to be a favourite hacker trick to compromise programs. Most things deal with it OK now, but the page fault could be related. That would be a Windows problem, not a Linux one, although I would still file a Linux bug on the driver.
 
Old 10-10-2017, 10:54 AM   #6
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 530

Original Poster
Rep: Reputation: 89
Annndd with the 4.13.5 kernel, the box randomly does a hard reset. No pattern that I can follow, just every hour or so BLAMMO hard reset.

After a day or so of this, I reverted to the 4.4.88 kernel, and the resets stopped. Now I have the nic failure to watch for but so far it is ONLY when transferring stuff from a Windows 7 machine. Since I have long since banned all Windows computers from this office, this is going to be a rare event, so maybe I'll just sit tight with the 4.4.88 kernel for a while....

<listens to twilight zone theme....>
 
Old 10-11-2017, 03:38 AM   #7
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 8,409

Rep: Reputation: 849
I know of no kernel fault that throws a hard reset. Even segmentation faults, which are BSOD faults in Windows, go down OK in the Linux kernel. Nothing logged? That points at a very difficult-to-diagnose hardware thing.
 
Old 10-16-2017, 12:06 PM   #8
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 530

Original Poster
Rep: Reputation: 89
Quote:
Originally Posted by business_kid View Post
I know of no kernel fault that throws a hard reset. Even segmentation faults, which are BSOD faults in Windows, go down OK in the Linux kernel. Nothing logged? That points at a very difficult-to-diagnose hardware thing.
This one is killing me. When it hard resets, there is nothing logged. I went back to the 4.4.88 kernel because of the hard resets I experienced with the 4.13.5 kernel, and I have not observed another hard reset since. However, I continue to get bit with the eth0 crash/disconnect problem, and when it does, someone has to walk over to the physical box and ifconfig eth0 down/up, or just reboot it.
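Until the root cause is found, the walk to the box could in principle be automated. A rough sketch, assuming the fault always shows up in dmesg as an IO_PAGE_FAULT for the NIC's bus address and that the manual sequence (ifconfig down/up, then dhclient) is what restores traffic. The function names are hypothetical, and the `run` parameter exists only so the logic can be tried without touching a real interface.

```python
import subprocess


def reset_commands(iface):
    """The manual recovery sequence from this thread, as argv lists."""
    return [
        ["ifconfig", iface, "down"],
        ["ifconfig", iface, "up"],
        ["dhclient", iface],
    ]


def count_page_faults(dmesg_text, bdf):
    """Count IO_PAGE_FAULT events logged for one PCI device, e.g. '28:00.0'."""
    needle = "IO_PAGE_FAULT device=" + bdf
    return sum(1 for line in dmesg_text.splitlines() if needle in line)


def reset_if_new_faults(dmesg_text, bdf, iface, seen,
                        run=subprocess.check_call):
    """Run the reset sequence if more faults appeared than `seen`.

    Returns the current fault count so the caller can pass it back in
    on the next check.
    """
    current = count_page_faults(dmesg_text, bdf)
    if current > seen:
        for argv in reset_commands(iface):
            run(argv)
    return current
```

Something like this would have to run as root on a timer, fed with the output of dmesg, and it would miss faults once the kernel ring buffer wraps. It only papers over the disconnects; it does not address whatever is causing them.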

Life was good before I updated to the 4.4.88 kernel. The 4.4.13 kernel never did this. I'm thinking of reverting back to the 4.4.13 kernel for now cause I gotta get this thing working. Can anyone think of any problems this might cause, or does anyone have any different kernel recommendations?

I'm still not sure what is causing this - if this is a driver problem, I would not be the only one experiencing this. I'm not convinced it's not a hardware problem of some sort. Which would really annoy me as this is a brand new motherboard (and new cpu and new power supply...)...
 
Old 10-16-2017, 12:18 PM   #9
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 530

Original Poster
Rep: Reputation: 89
It looks like Realtek has several years of history with this exact same problem <sigh> and after all these years, it continues to plague them. I'm going to play around with nic drivers a bit and see what happens. I may, just out of desperation, plop in a network card and see if that gets better results...

Realtek released a new driver 30 days ago. I'm going to install this one and see what happens...

Edit: I just picked up an Intel nic, one that does not use realtek chips. I'm betting that this makes the problem go away....

Last edited by Ook; 10-16-2017 at 06:48 PM.
 
Old 10-17-2017, 03:51 AM   #10
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 8,409

Rep: Reputation: 849
If 4.4.15 solves it, the one to try is 4.4.38, which is in a slackpkg, perhaps in 14.2. Otherwise stay with 4.4.15. There have been no kernel based security loopholes exposed that I know of recently.
The other thing I would try is slightly lowering the bus speed in the bios, as the realtek chip may not be up to the high speed of a modern motherboard. That's the hardware guy in me speaking, because I have put an oscilloscope on these running and seen the shapes a squarewave can end up in at high speed. Sometimes a pullup or pulldown would solve it, but it's simpler to slow it slightly.
 
Old 10-17-2017, 09:34 AM   #11
polocho
LQ Newbie
 
Registered: Jun 2017
Location: milky way
Distribution: Slackware-current
Posts: 1

Rep: Reputation: Disabled
On my AMD board with an A10 APU and built-in Ethernet, if I switch off the IOMMU the Ethernet works OK; otherwise I get the AMD-Vi event logged and no network.
 
Old 10-17-2017, 10:20 AM   #12
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 530

Original Poster
Rep: Reputation: 89
Quote:
Originally Posted by polocho View Post
On my AMD board with an A10 APU and built-in Ethernet, if I switch off the IOMMU the Ethernet works OK; otherwise I get the AMD-Vi event logged and no network.
Interesting - I've read other reports that with IOMMU enabled this happens and one solution is to turn it off. What you describe is exactly what I've been observing. I installed an Intel nic last night, so I'm going to run it a bit and see if the problem persists. So far, every report I've found is with a Realtek chipset, not Intel.
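The thread does not say whether the IOMMU was switched off in the BIOS or on the kernel command line. For the second case, here is a small sketch that shows which IOMMU-related options a kernel was booted with. The kernel documents parameters such as amd_iommu=off and iommu=soft; the example command line below is invented for illustration.

```python
def iommu_options(cmdline):
    """Return the iommu-related options found on a kernel command line."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in ("iommu", "amd_iommu"):
            options[key] = value
    return options


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line, not taken from any machine in this thread.
example = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro amd_iommu=off"
print(iommu_options(example))
```

On a live system, `iommu_options(read_cmdline())` gives the real answer. An empty result only means nothing was set on the command line; a BIOS-level switch would not show up here.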
 
Old 10-18-2017, 03:33 PM   #13
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 530

Original Poster
Rep: Reputation: 89
Now this is *really* interesting! I put an Intel nic in two days ago, and now I'm no longer getting the disconnects. BUT the kernel log is now flooding with these:

[22054.803006] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x000b address=0x0000000007eee000 flags=0x0000]

There is no device 27:00.0. IIRC that was the onboard nic, which I disabled in the bios. The Intel nic is:

27:01.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

I wonder if disabling it in the bios does not completely shut it down, and the kernel is still somehow detecting it and trying to do something with it?
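One way to check that question is to ask sysfs which functions the kernel actually enumerated on bus 27. A minimal sketch, assuming the usual /sys/bus/pci/devices layout and PCI domain 0000 (lspci omits the domain by default); the function names are made up for illustration.

```python
import os


def pci_functions(bus, sysfs_root="/sys/bus/pci/devices"):
    """List enumerated PCI functions on one bus, e.g. bus '27'."""
    prefix = "0000:" + bus + ":"
    return sorted(name for name in os.listdir(sysfs_root)
                  if name.startswith(prefix))


def is_enumerated(bdf, sysfs_root="/sys/bus/pci/devices"):
    """True if the kernel lists the function (short form like '27:00.0')."""
    return os.path.exists(os.path.join(sysfs_root, "0000:" + bdf))
```

If `is_enumerated("27:00.0")` is false while the faults keep arriving, the kernel is not driving that function, and the events are coming from something the OS does not list under that address.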
 
Old 10-18-2017, 03:40 PM   #14
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 530

Original Poster
Rep: Reputation: 89
Quote:
Originally Posted by business_kid View Post
If 4.4.15 solves it, the one to try is 4.4.38, which is in a slackpkg, perhaps in 14.2. Otherwise stay with 4.4.15. There have been no kernel based security loopholes exposed that I know of recently.
The other thing I would try is slightly lowering the bus speed in the bios, as the realtek chip may not be up to the high speed of a modern motherboard. That's the hardware guy in me speaking, because I have put an oscilloscope on these running and seen the shapes a squarewave can end up in at high speed. Sometimes a pullup or pulldown would solve it, but it's simpler to slow it slightly.
Square waves don't exist at high speeds LOL. They are more like sine waves with jaggies all over.

I did poke around in the bios but didn't see any way to adjust this. I figured the driver was being overrun - buffer too small or something like that, because it only happens under heavy load. Realtek updated the driver 30 days ago, but I haven't had time to play with it. I needed this fixed, now, and shoving in the Intel nic was a quick and easy fix - emphasis on "quick".
 
Old 10-18-2017, 03:52 PM   #15
kjhambrick
Member
 
Registered: Jul 2005
Location: Round Rock, TX
Distribution: Slackware64 14.2 + Multilib
Posts: 993

Rep: Reputation: 439
Ook --

Does this Ubuntu Thread sound familiar ?

What version of the Realtek Module is loaded ?

I found it on google like this ...

There seem to be a lot of exact hits for that Kernel Message.

Since you're not using the adapter maybe it could be blacklisted or ???

HTH.

-- kjh
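For the module version question, a small sketch that reads the answer out of sysfs, assuming the standard layout where /sys/class/net/IFACE/device/driver is a symlink to the bound driver and /sys/module/NAME/version exists when the module exports a version string. The in-tree Realtek driver is named r8169 and Realtek's own driver is named r8168, so the driver name alone shows which one is in use. The function names here are hypothetical.

```python
import os


def bound_driver(iface, sysfs_root="/sys"):
    """Name of the kernel driver bound to a network interface, or None."""
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def module_version(module, sysfs_root="/sys"):
    """Contents of /sys/module/<module>/version, or None if not exported."""
    path = os.path.join(sysfs_root, "module", module, "version")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None
```

Calling `bound_driver("eth0")` and then `module_version()` on the result covers both halves of the question. A None from the second call only means the module does not publish a version, not that it is missing.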
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration