LinuxQuestions.org
Old 02-04-2019, 07:45 AM   #1
Altoid
Member
 
Registered: Oct 2016
Location: Southern Hemisphere
Distribution: Devuan
Posts: 106

Rep: Reputation: Disabled
Shutdown problem - e1000 driver bug? - part II


Hello:

This post is to address a shutdown problem related to the Intel e1000e driver.

This problem is actually the second (unresolved) part of what was originally a two-part problem.

You can read all about it in the OP:
https://www.linuxquestions.org/quest...6/#post5933951

If you want to skip to how the first part of the problem was solved, see here:
https://wiki.archlinux.org/index.php/Wake-on-LAN

I finally opted for the ifup/ifdown script solution in /etc/network/interfaces.

Code:
auto lo
iface lo inet loopback

# added to disable WoL and prevent shutdown bug
# https://serverfault.com/questions/54704/how-to-get-ethtool-wake-on-lan-setting-to-stick
# sets post-up and post-down for eth0
# see https://unix.stackexchange.com/questions/164660/why-are-post-up-commands-in-etc-network-interfaces-ran-multiple-times-at-boot

auto eth0
iface eth0 inet dhcp

    post-up /sbin/ethtool -s eth0 wol d
    post-down /sbin/ethtool -s eth0 wol d
This took care of the WoL related shutdown issue described in the archlinux wiki link above:

Quote:
"Note that some motherboards are affected by a bug that can cause immediate or random #Wake-up after shutdown whenever the BIOS WoL feature is enabled."
The remaining part of the (original two-part) problem boils down to this:

---
On shutdown, the rig will do one of two things:

1. shut down properly
2. freeze during the shutdown at this point ...

Code:
e1000e: EEE Tx LPI Timer
Preparing to enter sleep state S5
Reboot: Power Down
... with the fans blowing at full speed.
---

This happens occasionally; sometimes it will not happen for a long string of shutdowns, and I have not been able to reproduce it reliably.
It happened when I only had wireless access, i.e. before I had a wired connection, and it still happens now.

The issue is obviously related to the Intel e1000e driver (maybe in combination with my dodgy Sun mobo/BIOS) and is distribution agnostic: it happens both with my Devuan ASCII installation and with a skeleton TinyCore 8.0 I keep on a 1 GB on-board SD card as a back-up installation, reachable through F8 at boot time.

Seeing that I have no need for any sleep state in my rig, the obvious solution was to disable the e1000e EEE Tx LPI Timer. Fortunately there's ethtool, which is part of the Devuan ASCII installation and which I updated to 4.19, just in case.

https://mirrors.edge.kernel.org/pub/...etwork/ethtool

I also updated the e1000e driver from the version originally installed by Devuan ASCII to the latest one available (3.4.2.1).

I first tried to find the e1000e EEE settings status:

Code:
[root@devuan]# ethtool --show-eee eth0
Cannot get EEE settings: Operation not supported
[root@devuan]#
That did not go too well but just in case, I tried to globally disable EEE:

Code:
[root@devuan groucho]# ethtool --set-eee eth0 eee off
Cannot get EEE settings: Operation not supported
[root@devuan groucho]#
That did not go well either so I tried to disable just the tx-lpi timer:

Code:
[root@devuan groucho]# ethtool --set-eee eth0 tx-lpi off
Cannot get EEE settings: Operation not supported
[root@devuan groucho]#
Seeing I was getting nowhere with the driver settings, I posted the question at Sourceforge ...

https://sourceforge.net/p/e1000/bugs/635/

... and also at the official Intel Ethernet forum ...

https://forums.intel.com/s/question/...language=en_US

The post at the Intel Ethernet forum was an absolute waste of time, in spite of it being about Intel-designed hardware working with a device driver written by Intel specifically for the Linux kernel.

After about a month, I got a reply at Sourceforge where, among other things, I was told that "Also, we don't support Debian."

This being a reply to a question about a device driver written specifically for the Linux kernel, and undoubtedly used by whichever distributions they do support, I found it ludicrous at best.

Further research got me some useful information from the maintainers of ethtool (a big thank you here):

Quote:
... looks like SmartPowerDownEnable is only implemented as a module option. It isn't available through ethtool.

You should be able to change the module option settings either through something in /etc/modprobe.d (or similar), or you should be able to
manipulate the settings through /sys/module/e1000e/parameters/.
This seems to be (?) in line with what Intel says of their driver:

https://www.intel.com/content/www/us/en/support/articles/000005480/network-and-i-o/ethernet-products.html

Quote:
The drivers are only supported as a loadable module. We don't supply patches against the kernel source to allow for static linking of the drivers.
After reading the instructions in these links ...

http://baruch.siach.name/blog/posts/...le_parameters/
https://access.redhat.com/documentat...ule_parameters
https://www.kernel.org/doc/html/v4.1...arameters.html

... I decided that the safest way was to go with the kernel command line option, lest I screw up with the unloading/reloading of the e1000e driver module, by inserting the stanza e1000e.EEE=0.

Like I mention here ...

https://www.linuxquestions.org/quest...ml#post5954899

... this has not prevented the EEE issue from coming up again.

At this point, seeing that ethtool cannot give me the status of the EEE setting in the e1000e driver configuration, I need a way to find out whether the stanza added to the kernel command line is actually working; I have found no indication in the log files that it is not.

It would seem that it's the only way I can discard that option and go on to finding out if there is a script somewhere undoing the setting or if this is some other type of problem.
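A minimal sketch of one way to check this without ethtool, following the /sys/module/e1000e/parameters/ hint quoted above. Assumptions: the function names are made up for illustration, and a driver only exposes a parameter in sysfs if it registers it with read permissions, so a missing or empty parameters directory does not by itself prove that the option was ignored.

```shell
#!/bin/sh
# show_module_params MODULE [SYSFS_ROOT]
# Prints name=value for every parameter the module exports through sysfs.
# SYSFS_ROOT defaults to /sys/module; it is a parameter only so the
# function can be tried against a scratch directory.
show_module_params() {
    mod=$1
    root=${2:-/sys/module}
    dir="$root/$mod/parameters"
    if [ ! -d "$root/$mod" ]; then
        echo "$mod: no entry under $root (module not loaded?)"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "$mod: no parameters exported via sysfs"
        return 2
    fi
    for f in "$dir"/*; do
        [ -r "$f" ] || continue          # skip unreadable or unmatched entries
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# cmdline_params MODULE [CMDLINE_FILE]
# Prints the MODULE.param=value words found on the kernel command line.
cmdline_params() {
    mod=$1
    file=${2:-/proc/cmdline}
    tr ' ' '\n' < "$file" | grep "^$mod\."
}
```

Running `cmdline_params e1000e` shows whether the stanza actually reached the kernel, and `show_module_params e1000e` shows whatever the loaded driver chooses to expose.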

So the first question at hand is:

How can I find which EEE settings the e1000e driver is using once it is up and running, seeing that ethtool will not work?

Thanks in advance,

A.

Last edited by Altoid; 02-04-2019 at 07:46 AM.
 
Old 02-07-2019, 06:42 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 9,917

Rep: Reputation: 1077
Ar Aighidh Linn (Let's continue), with a fresh mind I hope.

I actually think I found your solution back in post #1, where you linked to the bug report. To save time, this is it.
Code:
The situation is identical, suspend doesn't work unless the module is
removed. I would really appreciate it if you could take a whack at
this.
Have you considered a 'rmmod -f' on the appropriate module before suspending? With 25 years at this, I'm sure you'll sort that. That's not a solution, that's a workaround. I think there's provision for a 'rc.local shutdown' script in most distros. I imagine that's easier than debugging and patching the e1000 firmware, which I imagine is the real issue.
 
Old 02-07-2019, 11:00 AM   #3
Altoid
Member
 
Registered: Oct 2016
Location: Southern Hemisphere
Distribution: Devuan
Posts: 106

Original Poster
Rep: Reputation: Disabled
Hello:

Glad to see you again. =-)

Quote:
Originally Posted by business_kid View Post
... suspend doesn't work unless the module is removed.
Yes, I linked to that bug report because I guessed that what was causing it was related to what was happening to me.

This guy's machine would not suspend but it was not what I was wanting to do.

I just wanted to have a clean shutdown without rebooting after shutdown or freezing and then requiring a hard shut down.

Like I have mentioned before, the rebooting after shutdown part of the problem was solved by finally finding a way to turn off WoL, which is on by default in the e1000e configuration, something I was not able to do in my BIOS as most rigs can.

At the moment I am assuming that the freezing at shutdown can be solved by disabling EEE in the e1000e driver's configuration.

But I am not 100% sure that the stanza added to the kernel command line at boot is working properly because I have had at least one instance of a freeze at shutdown since I added it.

Like I have mentioned in my previous post:

To make sure the stanza I added to the kernel command line is working, I need to find a way to query the e1000e settings after boot up, something that I am not able to do with ethtool.

As a matter of fact, none of the EEE settings can be modified or queried using ethtool.

Once I find out if the stanza is working, I'll know if I have to either find another way to disable EEE or look for the script or application that is wanting to send my rig to S5, which does not make any sense as I have no settings in place to do anything but shut down.

Quote:
Originally Posted by business_kid View Post
Have you considered a 'rmmod -f' on the appropriate module before suspending?
I am not suspending, I am shutting down.

That said, forcing removal of the e1000e module every time I shut down would be a bit of a hassle as, like I have mentioned before, the freezing happens every so often and I have not been able to reproduce it.

In any case, the printout at freeze time clearly points to the e1000e module having a part in it.

Quote:
Originally Posted by business_kid View Post
With 25 years at this, I'm sure you'll sort that.
Eventually, I guess.

Quote:
Originally Posted by business_kid View Post
... provision for a 'rc.local shutdown' script in most distros.
Of course.

But it would necessarily involve a script using ethtool which, like I have mentioned, is unable to change the e1000e EEE settings.

While attempting to find a solution to this problem, I have come across a great many posts involving all sorts of problems with the Intel e1000e Linux driver. As Intel, whether through their Ethernet forum or Sourceforge, has not been able to provide a minimally suitable answer to my support requests, I have partly assumed that this will not get solved and I'll just have to live with it.

Thanks for your input.

Cheers,

A.
 
Old 02-07-2019, 11:17 AM   #4
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 9,917

Rep: Reputation: 1077
On a separate thread, I have uncovered the intricacies of my own network card and the multiplicity of manufacturer firmware all covered by the same pci id (10ec:8168 in my case).

Put 'rmmod -f <module>' in SOME shutdown script. Get it out first, and see if that nukes your problem. The bit I posted was reporting a suspend bug.
 
Old 02-07-2019, 11:34 AM   #5
Altoid
Member
 
Registered: Oct 2016
Location: Southern Hemisphere
Distribution: Devuan
Posts: 106

Original Poster
Rep: Reputation: Disabled
Hello:
Quote:
Originally Posted by business_kid View Post
Put 'rmmod -f <module>' in SOME shutdown script.
Get it out first, and see if that nukes your problem.
That will surely work: unloading the module will prevent it from invoking anything e1000e related.

eg:
Code:
e1000e: EEE Tx LPI Timer
Preparing to enter sleep state S5
Thanks for your input.

Cheers,

A.
 
Old 02-18-2019, 07:43 AM   #6
Altoid
Member
 
Registered: Oct 2016
Location: Southern Hemisphere
Distribution: Devuan
Posts: 106

Original Poster
Rep: Reputation: Disabled
Hello:

It's taken me some time but I have made a bit of headway, at least WoL-wise with respect to the Ultra 24.

Quote:
Originally Posted by business_kid View Post
... don't know your machine, but I'm surprised there's no way to disable wake on lan. How do others with your M/B handle this?
After many hours of searching on the web without finding a solution, I decided to sift through all I had on the Ultra 24, again.

One of the items I went through was the *.iso file with the (last available version) Tools_and_Drivers_1.6.0 DVD.

This time I went through the whole DVD, including the files I had *never* looked at before (Windows related files) as I thought they were not pertinent (still do) basically because I use Linux and not an MS OS.

Half way through this process I came across a *.txt file buried deep in the CD structure:

... /drivers/windows/nic/apps/bootagnt/ibautil.txt

It reads:

********************
"IBAUTIL is a utility program that changes the default settings of your
Intel WfM-compatible adapter.

IBAUTIL can be used to enable or disable the Wake-on-LAN and Boot Agent
capabilities, as well as enable or disable some settings used by the
Boot Agent."


and

"NOTE: Desktop adapters are normally shipped with both WOL and the Boot
Agent enabled."

---
"To enable or disable these features you MUST use IBAUTIL."
(caps are from the original text, not mine)

********************

Look at that ...

Curiously enough, although the bootable Tools_and_Drivers_1.6.0 DVD can be used to run hardware diagnostics, flash the system BIOS and even "erase the primary boot disk", it does *not* provide an option in the menu to run the ibautil utility to disable WoL or update and/or disable the NIC's Boot Agent.

You can drop to DOS from the DVD once booted but there's no access from there to any of the utilities in the DVD.

ibautil.exe can *only* be run under plain DOS with no memory managers loaded, so I just made a bootable FreeDOS 1.0 USB drive with the content of the /drivers/windows/nic/ folder and ran the utility to disable WoL.

While studying the help file to see how to use the utility, I also found out that the Ultra 24's Intel 82566DM-2 Gigabit on-board controller has no flash memory to hold a boot.rom image, so the Intel Boot Agent is controlled by the system BIOS (LAN Boot) which as I have mentioned in the previous thread *can* be disabled in BIOS.

I guess that this is probably the reason (?) why the ethernet controller itself cannot be disabled in BIOS but I fail to see why this is so.

Anyhow, after a while I remembered ethtool, how the Intel e1000e driver uses it and a thought came to me with respect to the "MUST use IBAUTIL" bit in the last sentence of the previously quoted text:

---
Did this mean that ibautil would actually disable WoL in the EPROM and that ethtool would now return an error if it tried to enable WoL from within the OS?
---

Short answer: no (!)

Even though the Ultra 24 now boots up with WoL disabled (instead of enabled), it can still be enabled via ethtool from within Linux or by *any* application with rights to do so, eg: the Intel e1000e driver which uses ethtool.

Which makes me wonder just what all this "MUST use IBAUTIL" emphasis from Intel in the ibautil.txt file is all about. Surely they know what ethtool is for and what it can do?

At first I thought it was because the only way to do it from a Windows OS was with ibautil.exe under DOS, but this is not so, as it can also be done from somewhere in Control Panel -> System or -> Network (can't recall exactly).

So for the time being I'll have to continue with the lines I added to /etc/network/interfaces with the caveat of it not having been 100% effective as I have had a couple of instances of the reboot at shutdown in spite of their addition.

This makes me think that either there may be a script (?) somewhere turning WoL on (maybe right before shutdown) or that the problem lies elsewhere.
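A minimal sketch of how the current WoL state could be read back at any moment, for instance from a late shutdown hook, to see whether something has switched it on again. It only parses the "Wake-on:" line that plain `ethtool eth0` prints; the function name is made up for illustration.

```shell
#!/bin/sh
# wol_state: reads the output of `ethtool IFACE` on stdin and prints the
# value of the "Wake-on:" line (d = disabled, g = magic packet, ...).
# The "Supports Wake-on:" line is deliberately not matched.
wol_state() {
    sed -n 's/^[[:space:]]*Wake-on:[[:space:]]*//p'
}
```

Usage would be along the lines of `/sbin/ethtool eth0 | wol_state`, appending the result to a log file so it survives the shutdown.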

So that's about it.

It seems that newer motherboards (this one is ca. 2007/08) do not need a WoL pin on the PCI bus to enable it so unless there's a motherboard jumper to lock WoL to a disabled state in hardware, this issue does not seem to have a *proper* solution.

See here: Debian Wiki-WakeOnLan

In my opinion, not being able to effectively control WoL status seems like a bit of a security risk. Of course, YMMV.

As for the rest of the problem (the EEE settings affecting the shutdown freeze), I think the best and most efficient route will be to see about a script to unload the e1000e module at shutdown as business_kid wisely suggested.

Cheers,

A.

Last edited by Altoid; 02-18-2019 at 07:45 AM.
 
Old 02-18-2019, 10:52 AM   #7
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 9,917

Rep: Reputation: 1077
That's highly irritating. I would be pretty sure that all that software does is change a "1" to a "0" (or vice versa) at a particular address of some eeprom buried on your m/b. That's the way those things are done; but that knowledge is useless without the particular knowledge held by the manufacturer.

I'm sure the 'MUST use…' is somebody never underestimating the stupidity of his customers, windows users in the main. The funny thing is that a little care would accommodate linux, bsd & MacOS, but nobody wants to go there.

Yep, remove the module. It's the way to go. If you're not using the nic, you can blacklist it.
 
Old 02-18-2019, 11:13 AM   #8
Altoid
Member
 
Registered: Oct 2016
Location: Southern Hemisphere
Distribution: Devuan
Posts: 106

Original Poster
Rep: Reputation: Disabled
Hello:
Quote:
Originally Posted by business_kid View Post
That's highly irritating.
Indeed it is. =-/

Quote:
Originally Posted by business_kid View Post
... sure that all that software does is change a "1" to a "0" ...
Makes sense.

But if I can disable Lan Boot in BIOS, why not disable WoL in BIOS by blocking the address where the signal goes?

Quote:
Originally Posted by business_kid View Post
... the 'MUST use…' is somebody never underestimating the stupidity of his customers, windows users in the main.
From what I can recall, every MS OS from W98 onwards till the last one I used (XP) had a way to disable WoL from within the OS.

So there is/was no need for using ibautil.exe under DOS which is why I don't get the emphasised "MUST use" bit.

Quote:
Originally Posted by business_kid View Post
... a little care would accommodate linux, bsd & MacOS, but nobody wants to go there.
Indeed ...
But the WinTel consortium is still very strong.
If you have a look at the thread I started at Intel Ethernet you'll see the lack of the 'little care' in how they deal with Linux tech support, same with Sourceforge.

Quote:
Originally Posted by business_kid View Post
... remove the module.
... If you're not using the nic ...
But I am.

Have to do some more research to put together a script in the right place and not screw up anything.
The e1000e module has a dependency.

Code:
~$ lsmod | grep -i e1000e
e1000e                253952  0
ptp                    20480  1 e1000e
~$
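A minimal sketch for reading that dependency out of lsmod output: a module that lists e1000e in its "Used by" column (ptp here) is one that e1000e itself depends on. The function name is made up for illustration, and only direct dependencies are shown.

```shell
#!/bin/sh
# deps_of MODULE: reads `lsmod` output on stdin and prints every module
# whose "Used by" column names MODULE, i.e. the modules MODULE depends on.
deps_of() {
    awk -v mod="$1" '
        $1 == "Module" { next }            # skip the lsmod header line
        {
            n = split($4, users, ",")      # 4th column: comma-separated users
            for (i = 1; i <= n; i++)
                if (users[i] == mod) print $1
        }'
}
```

With the output above, `lsmod | deps_of e1000e` would print ptp. `modprobe -r` takes care of removing such dependencies when nothing else uses them, whereas plain rmmod removes only the module it is given.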
Cheers,

A.

Last edited by Altoid; 02-18-2019 at 11:15 AM.
 
Old 02-18-2019, 02:52 PM   #9
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 9,917

Rep: Reputation: 1077
Quote:
Originally Posted by Altoid
But if I can disable Lan Boot in BIOS, why not disable WoL in BIOS by blocking the address where the signal goes?
Perhaps the eeprom holding the BIOS has space beyond any expected updates, some of which is assigned to configuration data above or below the BIOS. That's tricky territory; every hacker would love to do what you want to do. That's why you can't. That data is kept in house, coded into binaries, and resistant to analysis. Otherwise, a hacker grabs a bios update, cracks the system, and gets in. I would make triggering a 'write' on that eeprom (i.e. setting /RD high, /Chip Select & /Write Enable low) a very privileged process as unique as possible. And I'd fill every spare bit of that eeprom with bogus data.

They could even have eeprom packaged in the chipset, and clouded behind hardware. Good luck figuring that out!
 
Old 02-18-2019, 04:06 PM   #10
Altoid
Member
 
Registered: Oct 2016
Location: Southern Hemisphere
Distribution: Devuan
Posts: 106

Original Poster
Rep: Reputation: Disabled
Hello:
Quote:
Originally Posted by business_kid View Post
Perhaps the eeprom holding the BIOS ...
I understand.

But what I meant was to disable WoL in the same way other motherboards do it: in the BIOS (by whichever method), so that if disabled, ethtool would return an error when attempting to enable it.

Which is probably (?) what happens in your motherboard if you disable WoL at BIOS level and then try to enable it via Linux using ethtool or any other software.

Something like this, I suppose (made it up):

Code:
# ethtool -s eth0 wol g
Cannot get wol settings: Operation not supported    <--- because it is disabled at BIOS level.
#
Can't be that difficult to implement by the OEM.
If it has not been done, it was left out on purpose.

As things stand, any application with access to ethtool (the e1000e driver or xfce4-power-manager, for example) can set WoL to 'enabled' and unless I physically disconnect the NIC from the network when I shutdown, having set it to 'disabled' really has no practical effect.

Thanks for your input.

Cheers,

A.
 
Old 02-19-2019, 04:32 AM   #11
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 9,917

Rep: Reputation: 1077
Ok. Get that. It's not as simple as it seems

/Guessing
WOL is the default. Even the bios has a 'return to defaults' option, and apparently there's a 'return eth0 to defaults' option - probably copying one BIOS address to another. Setting individual bits is obviously different.

I would personally go about this another way.
* Continue removing the module before shutdown
* Try to load something into the Wol rom. No doubt something will occur to you, or you may find a download on somebody's site.
 
Old 02-19-2019, 10:19 AM   #12
Altoid
Member
 
Registered: Oct 2016
Location: Southern Hemisphere
Distribution: Devuan
Posts: 106

Original Poster
Rep: Reputation: Disabled
Hello:

Quote:
Originally Posted by business_kid View Post
WOL is the default.
Yes, same as EEE=1 by default.
That's how Intel sets their drivers up.

https://access.redhat.com/documentat...cient_Ethernet

Don't know if any other ethernet chipset designers/manufacturers do the same.

Quote:
Originally Posted by business_kid View Post
* Continue removing the module before shutdown
I have not done that just yet.
I'm waiting to see if after blacklisting a module it (ie: the shutdown problem) happens again.

Background to this:

With all my Linux installations, dmesg has consistently printed out these lines, always.

1.
[ 0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20160831/tbfadt-603)

2.
[ 0.190530] ACPI Error: [SUPP] Namespace lookup failure, AE_NOT_FOUND (20160831/psargs-359)
[ 0.190540] ACPI Error: Method parse/execution failed [\_SB._OSC] (Node ffff8d18b31b8000), AE_NOT_FOUND (20160831/psparse-543)

3.
[ 0.194534] acpi PNP0A08:00: ignoring host bridge window [mem 0x000d0000-0x000dffff window] (conflicts with Adapter ROM [mem 0x000ce000-0x000d3bff])

4.
[ 0.201443] pci 0000:00:1f.0: quirk: [io 0x0800-0x087f] claimed by ICH6 ACPI/GPIO/TCO
[ 0.201452] pci 0000:00:1f.0: quirk: [io 0x0480-0x04bf] claimed by ICH6 GPIO

5.
[ 0.205498] Expanded resource reserved due to conflict with PCI Bus 0000:00

6.
[22.381293] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[22.394673] lpc_ich: Resource conflict(s) found affecting gpio_ich

For a while I looked all over and found only advice to ignore them as benign warnings.
So after seeing the same answer countless times, I ignored it in spite of this, specific to #4:

https://bugs.launchpad.net/ubuntu/+s...x/+bug/1666650 Comment #3

Quote:
"This is a warning of a problem with your machine's firmware. Unless you are
seeing problems you believe are related to this message it is safe to ignore
it."
After all, it happened with every distribution I had tried and a problem with my Ultra 24's firmware wasn't news to me, but as I was not able to link it to anything specific, I let it be.

Earlier on, while trying to find a solution to this issue, I decided to have another look at that lpc_ich bit and came upon a post with a different answer.

It said that blacklisting the lpc_ich, gpio_ich and pcspkr modules was the solution to the problem.

https://bbs.archlinux.org/viewtopic....57602#p1357602

It turns out that lpc_ich is the interface to gpio_ich ...

https://github.com/torvalds/linux/bl.../mfd/lpc_ich.c

* lpc_ich.c - LPC interface for Intel ICH
* LPC bridge function of the Intel ICH contains many other
* functional units, such as Interrupt controllers, Timers,
* Power Management, System Management, GPIO, RTC, and LPC
* Configuration Registers.

Code:
$ sudo modinfo lpc_ich
filename:       /lib/modules/4.9.0-8-amd64/kernel/drivers/mfd/lpc_ich.ko
license:        GPL
description:    LPC interface for Intel ICH
$
... and it's related to Power Management.

On a whim, I first blacklisted the three modules and with some testing saw that blacklisting lpc_ich was enough to solve the resource conflict and now #6 is gone from dmesg.

This was around 10/12 days ago and I have not (yet) had another instance of the shutdown problem, which does not mean it will not surface again.

As you know, it is unpredictable and I've not been able to reproduce it, so it's just a matter of seeing if (maybe in a month or so) it does not come up again.
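For reference, a minimal sketch of what such a blacklist amounts to: one `blacklist <module>` line per module in a file under /etc/modprobe.d/. The function name and the file name in the usage note are made up for illustration; note that `blacklist` only stops automatic loading by alias, so an explicit modprobe of the module still works.

```shell
#!/bin/sh
# write_blacklist CONF_FILE MODULE...
# Writes one "blacklist MODULE" line per module to CONF_FILE,
# replacing any previous content of that file.
write_blacklist() {
    conf=$1
    shift
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod"
    done > "$conf"
}
```

Run as root, something like `write_blacklist /etc/modprobe.d/blacklist-lpc_ich.conf lpc_ich` would cover the single module that turned out to matter here.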

Quote:
Originally Posted by business_kid View Post
* Try to load something into the Wol rom.
Hmm ...
No, I don't think I want to go there.

I don't have the required knowledge or experience, save almost bricking a motherboard or a PCI card on more than one occasion and being saved by the experience of many people on the web with the same problem.

There's precious little data/information/documentation on the Ultra 24 out there and what is available is poorly documented or just basic stuff.

Oracle won't be of any help if I brick something and Intel will say that their controller is not supported anymore.

So I'll (try to) stay away from any experiments. =-)

Thanks for your input.

Cheers,

A.
 
Old 02-24-2019, 04:19 PM   #13
Altoid
Member
 
Registered: Oct 2016
Location: Southern Hemisphere
Distribution: Devuan
Posts: 106

Original Poster
Rep: Reputation: Disabled
Hello:

Update

Quote:
Originally Posted by Altoid View Post
... around 10/12 days ago and I have not (yet) had another instance ...
... not been able to reproduce it, so it's just a matter of seeing if (maybe in a month or so) ...
But it did.

I have been able to confirm that there are three ways to (try to, more later) disable EEE in the e1000e driver in spite of ethtool not being able to access any of the ethernet controller's EEE registers to read or change them.

1.
With modprobe, by first unloading the e1000e module, then reloading it and finally updating the initramfs:

Code:
# modprobe -v -r e1000e
rmmod e1000e
rmmod ptp
rmmod pps_core
#
Code:
# modprobe -v e1000e EEE=0
insmod /lib/modules/4.9.0-8-amd64/kernel/drivers/pps/pps_core.ko 
insmod /lib/modules/4.9.0-8-amd64/kernel/drivers/ptp/ptp.ko 
insmod /lib/modules/4.9.0-8-amd64/updates/drivers/net/ethernet/intel/e1000e/e1000e.ko EEE=0 
#
Code:
# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.9.0-8-amd64
live-boot: core filesystems devices utils udev wget blockdev dns.
#
2.
By inserting this stanza in the kernel command line: "e1000e.EEE=0".

3.
By editing the /etc/modprobe.d/e1000e.conf file and adding the line "options e1000e EEE=0".

Code:
~$ cat /etc/modprobe.d/e1000e.conf
options e1000e EEE=0
~$
I came across the confirmation mentioned above quite by chance while practising how to do the unload/load operation without mucking up anything else.

I had already inserted the stanza in the command line and edited the e1000e.conf file (this last I had forgotten about) so when I unloaded/reloaded the driver module I noticed that EEE=0 appeared (twice) when loaded, albeit without any added parameter ...

Code:
# modprobe -v e1000e
insmod /lib/modules/4.9.0-8-amd64/kernel/drivers/pps/pps_core.ko 
insmod /lib/modules/4.9.0-8-amd64/kernel/drivers/ptp/ptp.ko 
insmod /lib/modules/4.9.0-8-amd64/updates/drivers/net/ethernet/intel/e1000e/e1000e.ko EEE=0 EEE=0 
#
... which could only mean that EEE=0 was being set in the kernel command line and somewhere else.

I commented out the line in e1000e.conf, removed the stanza from the kernel command line and, upon unloading and reloading the module as I did above, the EEE=0 parameter was not there at all.

This meant that only the command line stanza and the e1000e.conf setting survive a module removal.
ie: if you do not reload the module with EEE=0 it will not be set unless one of them is present.

With no errors in /var/log/boot or /var/log/kernel.log, I guess that it is safe to assume that the parameter is correctly set.

If not, you get something like this in dmesg:

Code:
 
[ 1.102821] e1000e: unknown parameter 'tx-lpi' ignored
Of course that does not mean that disabling EEE with the EEE=0 parameter actually works, because it seems it does not.
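A minimal sketch of how the kernel log could be scanned for parameters a module refused, based on the "unknown parameter ... ignored" line shown above. The function name is made up for illustration; an empty result only means the parameter name was accepted, not that the setting had any effect on the hardware.

```shell
#!/bin/sh
# rejected_params MODULE: reads kernel log text on stdin and prints the
# name of each parameter that MODULE reported as unknown.
rejected_params() {
    sed -n "s/.*$1: unknown parameter '\([^']*\)' ignored.*/\1/p"
}
```

For example, `dmesg | rejected_params e1000e` would print tx-lpi for the line above and nothing when EEE=0 is the only option passed.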

After a day or two I had yet another instance of the shutdown freeze and to my surprise, the screen printout read the same as when EEE was not disabled.
ie:

Code:
[485.781519] e1000e: EEE TX LPI TIMER: 00000000
[485.785219] ACPI: Preparing to enter sleep state S5
[485.868007] reboot: Power down
Unless I am mistaken, it seems that the EEE=0 parameter does not disable EEE.

As you can see from the output of the ethtool help screen, EEE has other sub-parameters which can apparently be selectively set.

I'd say that it is logical to assume that disabling EEE disables the whole sub-set as all the other parameters are set through the "--set-eee eth0" instruction.

Code:
ethtool --set-eee DEVNAME	 Set EEE settings
		[ eee on|off ]
		[ advertise %x ]
		[ tx-lpi on|off ]
		[ tx-timer %d ]
But the tx-lpi timer was obviously still enabled.

So I went about writing a script to unload the module at shutdown, as business_kid suggested ...

Code:
#!/bin/sh
# Remove the e1000e driver
# Shutdown system without the use of shutdown-helper 

sudo modprobe -v -r e1000e && sudo shutdown -h now
... and made a *.desktop file for it.

Code:
[Desktop Entry]
Type=Application
Encoding=UTF-8
Name=shutdown
Comment=Shuts down system bypassing shutdown helper
Exec=xfce4-terminal -x /usr/bin/shutdown.sh
Icon=/usr/share/icons/gnome/32x32/actions/gnome-shutdown.png
Terminal=false
Path=
StartupNotify=false
GenericName=Shutdown
But to no use.
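A hedged variant of the shutdown script above, as a sketch only: it takes the interface down before unloading the driver, still powers off if the unload fails, and defaults to a dry run so the sequence can be checked first. The function names and the DRY_RUN variable are made up for illustration; ifdown is assumed to be available, as it is wherever /etc/network/interfaces is in use.

```shell
#!/bin/sh
# run CMD...: prints the command when DRY_RUN is 1 (the default here),
# executes it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# unload_and_halt IFACE MODULE
# Brings IFACE down, unloads MODULE, then powers off. A failure in the
# first two steps is reported but does not stop the shutdown.
unload_and_halt() {
    iface=$1
    mod=$2
    run ifdown "$iface"    || echo "warning: ifdown $iface failed" >&2
    run modprobe -r "$mod" || echo "warning: could not unload $mod" >&2
    run shutdown -h now
}
```

Called as root with `DRY_RUN=0`, `unload_and_halt eth0 e1000e` performs the real sequence; without it, the three commands are only printed.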

Much to my chagrin and sooner than I expected, I had another instance of the freeze.
But this time without any indication that the e1000e module had anything to do with it.

Code:
[174.608278] sd 8:0:3:0: [sde] Synchronizing SCSI cache
[174.608642] sd 8:0:2:0: [sdd] Synchronizing SCSI cache
[174.617504] ACPI: Preparing to enter sleep state S5
[174.680008] reboot: Power down
So unless I am much mistaken, this means that the e1000e driver module is not related to the shutdown problem.

I'll close this thread here and see about opening another one about just the shutdown issue without involving the e1000e driver.

Cheers and thanks,

A.
 
Old 02-25-2019, 08:11 AM   #14
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 9,917

Rep: Reputation: 1077
Instead of 'rmmod' use 'rmmod -f' which should disable sanity checks. If you start another thread on this, for sure I will not post on it. You have already given this a disproportionate amount of effort. It's doubtful if you're going to solve or mitigate this. I personally would not have invested such efforts to save a server. Blacklist the module and buy a nic, motherboard, or whatever, or upgrade.

We think of a network chip there, but it's some tiny fraction of a massive ASIC where weird and wonderful faults, crossovers, or leakages are possible, with effects that can only be observed, not predicted.

Let me give you an example from my hardware R&D days. In a 3 phase thyristor inverter, this circuit handled overload. It randomly blew fuses on overload, and I was tasked to isolate the faulty component. After 5 hours, with a twin channel 'scope I had isolated the particular stage, involving 3 triple input nand gates. I sought to replace the three, but in a piece of management idiocy only equalled by the Charge of the Light Brigade, I was told to isolate the one. I needed a 4 channel storage 'scope, and two more days of work overloading and examining traces to isolate which gate was in fact acting as a majority gate (These were days before chip testers).

Today, that entire circuit would be one tiny part on an FPGA/ASIC. What chance have you got? If the outside doesn't work the way it should, you're knackered. Unless every board has the fault, it's your hardware. If everyone does have it, it's their hardware.
 
Old 02-25-2019, 09:01 AM   #15
Altoid
Member
 
Registered: Oct 2016
Location: Southern Hemisphere
Distribution: Devuan
Posts: 106

Original Poster
Rep: Reputation: Disabled
Hello:

Quote:
Originally Posted by business_kid View Post
... use 'rmmod -f' which should disable sanity checks.
I had looked at rmmod but eventually went for modprobe as suggested by man rmmod:

Code:
DESCRIPTION
... a trivial program to remove a module ...
Most users will want to use modprobe(8) with the -r option instead.

OPTIONS
-f, --force
This option can be extremely dangerous: it has no effect unless CONFIG_MODULE_FORCE_UNLOAD was set ...
I think I fall into the 'most users' category and extremely dangerous is probably there to make a point.
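
Whether -f would do anything at all on a given kernel can be checked beforehand. A minimal sketch, assuming the distribution installs the kernel configuration as a plain-text file at /boot/config-$(uname -r) (not every distribution does). `force_unload_enabled` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether a kernel config enables forced module unloading.
# Assumption: the config is a plain-text file such as /boot/config-$(uname -r).

force_unload_enabled() {
    # $1 = path to a kernel config file
    grep -q '^CONFIG_MODULE_FORCE_UNLOAD=y$' "$1"
}

CONFIG_FILE="${CONFIG_FILE:-/boot/config-$(uname -r)}"

if [ -r "$CONFIG_FILE" ]; then
    if force_unload_enabled "$CONFIG_FILE"; then
        echo "CONFIG_MODULE_FORCE_UNLOAD is set: rmmod -f can take effect"
    else
        echo "CONFIG_MODULE_FORCE_UNLOAD is not set: rmmod -f has no effect"
    fi
fi
```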

Quote:
Originally Posted by business_kid View Post
... another thread on this, for sure I will not post on it.
You've already done a lot.
I had no expectations of you posting anything else with respect to the e1000e module or the shutdown freeze.

Quote:
Originally Posted by business_kid View Post
... given this a disproportionate amount of effort.
Well, I'm retired and I have time to do it.

And I think that looking for a solution is also a way to learn new things.
In this case quite a bit.

About ACPI, Intel's surprisingly deficient tech support, the Intel e1000e Ethernet drivers and how they work (or do not), and the fact that unloading the module does not seem to solve the shutdown issue, among other things.

Quote:
Originally Posted by business_kid View Post
It's doubtful if you're going to solve or mitigate this.
Could be ...

Quote:
Originally Posted by business_kid View Post
... buy a nic, motherboard, or whatever, or upgrade.
Not what I usually do.
If it was company/ministry hardware I would have already done that.

But it is not and I still think there may be a solution.
Like I said, I'm retired, I have time to do it and I find it is a challenge of sorts.

I may eventually try to learn how to compile a specific kernel to be able to load a custom DSDT and see if that is really the issue.

Quote:
Originally Posted by business_kid View Post
Unless every board has the fault, it's your hardware. If everyone does have it ...
I'm quite sure that every Ultra 24 board has this same problem if running Linux or an MS OS.

In spite of being x86, it came with Sun's Solaris OS pre-installed, which most (?) probably handled these BIOS/hardware issues in a different manner. It was expensive and oriented to a specific end-user segment, so it is quite likely that not many users switched to Linux or an MS OS.

That, and the (comparatively) small number sold, probably made for a small user base, leading to practically no posts about it on the Sun user blogs or the open web.

If that were not enough, all the existing Sun user blogs were gobbled up by Oracle and have been unavailable since, so information on Sun hardware issues is very scarce or simply nonexistent.

In any case, thanks a lot for your input.

Cheers,

A.
 
  

