Hi,
even though I've been using Slackware for quite some time (I first installed it at the beginning of 1994 and have used it ever since), this is my first post here. I must admit I was reluctant to post, since I know my problem is not Slackware specific (with Slackware there are probably no distribution-specific problems, just general Linux problems) but something hardware related. Still, there is something I cannot grasp and maybe some fellow Slackware user can point me in the right direction...
I recently bought 3 used HP ProBook 654 G1 laptops. Even though the hardware is almost identical, they come in 3 different flavors:
- machine A: AMD A6-4400M, bios version 1.47, Trinity 2 [Radeon HD 7520G]
- machine B: AMD A6-4400M, bios version 1.31, Trinity 2 [Radeon HD 7520G]
- machine C: AMD A6-5350M, bios version 1.31, Richland [Radeon HD 8450G]
I started with machine A and I installed slackware64 current (updated 25/12/2019, kernel 5.4.7), with alienBOB's ktown. Everything worked smoothly and the system was properly configured without any problem.
So I copied the root partition of machine A to machines B and C. Everything seemed to be fine until the first suspend-to-ram/resume cycle:
- at the first resume both machines B and C report a "no irq handler for vector" error:
Code:
[ 247.968483] smpboot: CPU 1 is now offline
[ 247.969211] ACPI: Low-level resume complete
[ 247.969294] ACPI: EC: EC started
[ 247.969294] PM: Restoring platform NVS memory
[ 247.969491] LVT offset 0 assigned for vector 0x400
[ 247.969806] Enabling non-boot CPUs ...
[ 247.969855] x86: Booting SMP configuration:
[ 247.969856] smpboot: Booting Node 0 Processor 1 APIC 0x11
[ 247.970059] microcode: CPU1: patch_level=0x0600111f
-> [ 247.972121] do_IRQ: 1.55 No irq handler for vector
[ 247.972568] CPU1 is up
- at the second resume both machines B and C report two "no irq handler for vector" errors:
Code:
[ 306.514332] smpboot: CPU 1 is now offline
[ 306.514985] ACPI: Low-level resume complete
[ 306.515068] ACPI: EC: EC started
[ 306.515068] PM: Restoring platform NVS memory
[ 306.515264] LVT offset 0 assigned for vector 0x400
-> [ 306.515565] do_IRQ: 0.55 No irq handler for vector
[ 306.515582] Enabling non-boot CPUs ...
[ 306.515630] x86: Booting SMP configuration:
[ 306.515631] smpboot: Booting Node 0 Processor 1 APIC 0x11
[ 306.515834] microcode: CPU1: patch_level=0x0600111f
-> [ 306.517897] do_IRQ: 1.55 No irq handler for vector
[ 306.518371] CPU1 is up
After a deeper inspection I found that all three machines show the same kernel error at boot time:
Code:
[ 0.365713] x86: Booting SMP configuration:
[ 0.365779] .... node #0, CPUs: #1
[ 0.011654] do_IRQ: 1.55 No irq handler for vector
[ 0.366755] smp: Brought up 1 node, 2 CPUs
But only machines B and C report the errors when resuming.
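To compare the three machines over several suspend/resume cycles, the relevant lines can be pulled out of the kernel log automatically. What follows is only a minimal sketch: it assumes the message format shown in the logs above, and it assumes that the two numbers in "1.55" are the CPU number and the vector, which is how I read the output. The function name find_unhandled_vectors is made up for this example.

```python
import re

# Matches lines such as "[  247.972121] do_IRQ: 1.55 No irq handler for vector".
# Assumption: the number before the dot is the CPU, the one after it the vector.
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+do_IRQ:\s+(?P<cpu>\d+)\.(?P<vector>\d+)"
    r"\s+No irq handler for vector"
)


def find_unhandled_vectors(log_text):
    """Return a list of (timestamp, cpu, vector) tuples, one per matching line."""
    hits = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            hits.append(
                (
                    float(match.group("ts")),
                    int(match.group("cpu")),
                    int(match.group("vector")),
                )
            )
    return hits


# Sample taken from the second resume log above.
SAMPLE = """\
[ 306.515264] LVT offset 0 assigned for vector 0x400
-> [ 306.515565] do_IRQ: 0.55 No irq handler for vector
[ 306.515582] Enabling non-boot CPUs ...
-> [ 306.517897] do_IRQ: 1.55 No irq handler for vector
[ 306.518371] CPU1 is up
"""

if __name__ == "__main__":
    for ts, cpu, vector in find_unhandled_vectors(SAMPLE):
        print(f"t={ts:.6f}s cpu={cpu} vector={vector}")
```

Feeding it the saved dmesg output of each machine makes it easy to see whether the same vector shows up on every CPU and at every resume.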
I then started searching for the problem and the possible solution, and I found these interesting threads:
https://lkml.org/lkml/2018/2/24/63
https://lkml.org/lkml/2019/2/19/797
The error seems to be related to the BIOS and, as you may have noticed, machine A is running a more recent version. So I upgraded the BIOS of machines B and C and, as you may guess, nothing changed.
I started looking deeper at the differences between these machines: the boot logs are identical until udev is started, and after that only the order of hardware activation changes.
The only difference I can spot between machine A on one side and machines B and C on the other is in /proc/interrupts:
Machines B and C report:
Code:
$ cat /proc/interrupts
CPU0 CPU1
0: 110 0 IO-APIC 2-edge timer
1: 27055 764 IO-APIC 1-edge i8042
5: 0 0 IO-APIC 5-edge parport0
8: 29 0 IO-APIC 8-edge rtc0
9: 351 153 IO-APIC 9-fasteoi acpi
12: 273531 7213 IO-APIC 12-edge i8042
16: 164694 11894829 IO-APIC 16-fasteoi snd_hda_intel:card1
17: 148 0 IO-APIC 17-fasteoi ehci_hcd:usb5, ehci_hcd:usb6
18: 3729 1046 IO-APIC 18-fasteoi ohci_hcd:usb7, ohci_hcd:usb8, ohci_hcd:usb9
19: 3484 218694 IO-APIC 19-fasteoi b43
23: 2 0 IO-APIC 23-edge lis3lv02d
24: 0 0 PCI-MSI 65536-edge PCIe PME, pciehp
25: 0 0 PCI-MSI 81920-edge PCIe PME, pciehp
26: 0 0 PCI-MSI 114688-edge PCIe PME, pciehp
27: 67755 0 PCI-MSI 278528-edge ahci[0000:00:11.0]
28: 54 21 PCI-MSI 1048576-edge rtsx_pci
29: 0 30 PCI-MSI 18432-edge snd_hda_intel:card0
30: 0 0 PCI-MSI 262144-edge xhci_hcd
31: 0 0 PCI-MSI 262145-edge xhci_hcd
32: 0 0 PCI-MSI 262146-edge xhci_hcd
34: 157 102 PCI-MSI 264192-edge xhci_hcd
35: 0 0 PCI-MSI 264193-edge xhci_hcd
36: 0 0 PCI-MSI 264194-edge xhci_hcd
37: 80438 0 PCI-MSI 16384-edge radeon
NMI: 0 0 Non-maskable interrupts
LOC: 10269185 9992247 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 0 0 Performance monitoring interrupts
IWI: 3199904 2904733 IRQ work interrupts
RTR: 0 0 APIC ICR read retries
RES: 16571575 5974276 Rescheduling interrupts
CAL: 37585 40366 Function call interrupts
TLB: 66794 66599 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
DFR: 0 0 Deferred Error APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 54 52 Machine check polls
HYP: 0 0 Hypervisor callback interrupts
HRE: 0 0 Hyper-V reenlightenment interrupts
HVS: 0 0 Hyper-V stimer0 interrupts
ERR: 0
MIS: 0
PIN: 0 0 Posted-interrupt notification event
NPI: 0 0 Nested posted-interrupt event
PIW: 0 0 Posted-interrupt wakeup event
whereas machine A reports:
Code:
# cat /proc/interrupts
CPU0 CPU1
0: 110 0 IO-APIC 2-edge timer
1: 14683 436 IO-APIC 1-edge i8042
5: 0 0 IO-APIC 5-edge parport0
8: 17 0 IO-APIC 8-edge rtc0
9: 6734 151 IO-APIC 9-fasteoi acpi
12: 2087993 5851 IO-APIC 12-edge i8042
16: 10491 19609763 IO-APIC 16-fasteoi snd_hda_intel:card1
17: 532 0 IO-APIC 17-fasteoi ehci_hcd:usb5, ehci_hcd:usb6
18: 2994 296 IO-APIC 18-fasteoi ohci_hcd:usb7, ohci_hcd:usb8, ohci_hcd:usb9
19: 2517 3189613 IO-APIC 19-fasteoi b43
23: 17 0 IO-APIC 23-edge lis3lv02d
24: 0 0 PCI-MSI 65536-edge PCIe PME, pciehp
25: 0 0 PCI-MSI 81920-edge PCIe PME, pciehp
26: 0 0 PCI-MSI 114688-edge PCIe PME, pciehp
27: 1618828 0 PCI-MSI 278528-edge ahci[0000:00:11.0]
28: 166 21 PCI-MSI 1048576-edge rtsx_pci
29: 0 741832 PCI-MSI 524288-edge
30: 0 0 PCI-MSI 262144-edge xhci_hcd
31: 0 0 PCI-MSI 262145-edge xhci_hcd
32: 0 0 PCI-MSI 262146-edge xhci_hcd
33: 69 73 PCI-MSI 264192-edge xhci_hcd
34: 0 0 PCI-MSI 264193-edge xhci_hcd
35: 0 0 PCI-MSI 264194-edge xhci_hcd
36: 0 30 PCI-MSI 18432-edge snd_hda_intel:card0
37: 4907489 0 PCI-MSI 16384-edge radeon
NMI: 0 0 Non-maskable interrupts
LOC: 36867026 40276014 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 0 0 Performance monitoring interrupts
IWI: 21612595 19875197 IRQ work interrupts
RTR: 0 0 APIC ICR read retries
RES: 43481724 32182104 Rescheduling interrupts
CAL: 385320 390505 Function call interrupts
TLB: 885376 877701 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
DFR: 0 0 Deferred Error APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 708 704 Machine check polls
HYP: 0 0 Hypervisor callback interrupts
HRE: 0 0 Hyper-V reenlightenment interrupts
HVS: 0 0 Hyper-V stimer0 interrupts
ERR: 0
MIS: 0
PIN: 0 0 Posted-interrupt notification event
NPI: 0 0 Nested posted-interrupt event
PIW: 0 0 Posted-interrupt wakeup event
This is present only in machine A:
Code:
29: 0 741832 PCI-MSI 524288-edge
I don't really know if this is relevant. From what I found while searching, this interrupt usually belongs to the amdgpu driver, which is not loaded on my machines.
I've run out of ideas, but I still find it weird that, if the BIOS is the issue, machine A does not show the same problem.
The problem is just annoying, to quote Thomas Gleixner, but after linux-4.15 the message is reported with pr_emerg_ratelimited, so syslog broadcasts it to every terminal. I use many terminals, and this forced me to change syslog.conf, a solution I don't like. I could try patching the kernel as described here:
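The comparison of the two /proc/interrupts dumps can also be done mechanically, and the PCI-MSI number can be turned back into a PCI address. The sketch below rests on an assumption: that the PCI-MSI hardware IRQ number is laid out as domain << 27 | bus << 19 | devfn << 11 | entry, which is my understanding of how kernels of this generation compute it. The ahci row is a useful sanity check, because its name already contains the address 0000:00:11.0. The function names are made up for this example, and the input is expected to be the raw file contents starting at the CPU0 CPU1 header.

```python
def parse_irq_rows(text):
    """Map (chip, hwirq) to the handler names for the numbered rows only."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].rstrip(":").isdigit():
            continue  # skip the NMI, LOC, ERR, ... summary rows
        rest = fields[1 + ncpu:]  # drop the IRQ number and the per-CPU counts
        if len(rest) < 2:
            continue
        rows[(rest[0], rest[1])] = " ".join(rest[2:])
    return rows


def only_in_first(first, second):
    """Return the (chip, hwirq) keys present in the first dump but not the second."""
    a, b = parse_irq_rows(first), parse_irq_rows(second)
    return sorted(key for key in a if key not in b)


def decode_pci_msi(hwirq):
    """Assumed layout: domain << 27 | bus << 19 | devfn << 11 | entry."""
    entry = hwirq & 0x7FF
    devfn = (hwirq >> 11) & 0xFF
    bus = (hwirq >> 19) & 0xFF
    domain = hwirq >> 27
    return f"{domain:04x}:{bus:02x}:{devfn >> 3:02x}.{devfn & 7}", entry


# Rows trimmed from the dumps above.
MACHINE_A = """\
 CPU0 CPU1
 27: 1618828 0 PCI-MSI 278528-edge ahci[0000:00:11.0]
 29: 0 741832 PCI-MSI 524288-edge
 37: 4907489 0 PCI-MSI 16384-edge radeon
"""
MACHINE_B = """\
 CPU0 CPU1
 27: 67755 0 PCI-MSI 278528-edge ahci[0000:00:11.0]
 37: 80438 0 PCI-MSI 16384-edge radeon
"""

if __name__ == "__main__":
    print(decode_pci_msi(278528))  # sanity check against the ahci row
    for chip, hwirq in only_in_first(MACHINE_A, MACHINE_B):
        number = int(hwirq.split("-")[0])
        print(chip, hwirq, decode_pci_msi(number))
```

Under that assumption 278528 decodes to 0000:00:11.0, matching the ahci row, and 524288 decodes to 0000:01:00.0, so running lspci -s 01:00.0 on each machine should show whether that device is present everywhere or only on machine A.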
https://lkml.org/lkml/2019/3/6/188
but compiling and installing a custom kernel on a slackware-current laptop setup is just too much work.
Any idea or direction would be greatly appreciated.
Thanks,
andrea