LinuxQuestions.org
01-15-2020, 04:07 AM   #1
gattocarlo
Member
 
Registered: Jan 2020
Posts: 61

Rep: Reputation: Disabled
"do_IRQ: X.XX no irq handler for vector" kernel error in 2 identical laptops out of 3


Hi,

even though I've been using Slackware for quite some time (I installed it for the first time at the beginning of 1994 and have used it ever since), this is my first post here. I must admit I was reluctant to post, since I know mine is not a Slackware-specific problem (with Slackware there are probably no distribution-specific problems, just general Linux problems) but something hardware related. Still, there is something I cannot grasp, and maybe some fellow Slackware user can point me in the right direction...

I recently bought 3 used HP ProBook 654 G1 laptops. Even though the hardware is almost identical, they come in 3 different flavors:

- machine A: AMD A6-4400M, BIOS version 1.47, Trinity 2 [Radeon HD 7520G]

- machine B: AMD A6-4400M, BIOS version 1.31, Trinity 2 [Radeon HD 7520G]

- machine C: AMD A6-5350M, BIOS version 1.31, Richland [Radeon HD 8450G]

I started with machine A and installed Slackware64-current (updated 25/12/2019, kernel 5.4.7) with alienBOB's ktown. Everything worked smoothly and the system was configured without any problem.

So I copied the root partition of machine A to machines B and C. Everything seemed to be fine until the first suspend-to-RAM/resume cycle:

- at the first resume both machines B and C report a "no irq handler for vector" error:

Code:
    [  247.968483] smpboot: CPU 1 is now offline
    [  247.969211] ACPI: Low-level resume complete
    [  247.969294] ACPI: EC: EC started
    [  247.969294] PM: Restoring platform NVS memory
    [  247.969491] LVT offset 0 assigned for vector 0x400
    [  247.969806] Enabling non-boot CPUs ...
    [  247.969855] x86: Booting SMP configuration:
    [  247.969856] smpboot: Booting Node 0 Processor 1 APIC 0x11
    [  247.970059] microcode: CPU1: patch_level=0x0600111f
->  [  247.972121] do_IRQ: 1.55 No irq handler for vector
    [  247.972568] CPU1 is up
- at the second resume both machines B and C report two "no irq handler for vector" errors:

Code:
    [  306.514332] smpboot: CPU 1 is now offline
    [  306.514985] ACPI: Low-level resume complete
    [  306.515068] ACPI: EC: EC started
    [  306.515068] PM: Restoring platform NVS memory
    [  306.515264] LVT offset 0 assigned for vector 0x400
->  [  306.515565] do_IRQ: 0.55 No irq handler for vector
    [  306.515582] Enabling non-boot CPUs ...
    [  306.515630] x86: Booting SMP configuration:
    [  306.515631] smpboot: Booting Node 0 Processor 1 APIC 0x11
    [  306.515834] microcode: CPU1: patch_level=0x0600111f
->  [  306.517897] do_IRQ: 1.55 No irq handler for vector
    [  306.518371] CPU1 is up
After a deeper inspection I found that all 3 machines show the same kernel error at boot time:

Code:
    [    0.365713] x86: Booting SMP configuration:
    [    0.365779] .... node  #0, CPUs:      #1
    [    0.011654] do_IRQ: 1.55 No irq handler for vector
    [    0.366755] smp: Brought up 1 node, 2 CPUs
But only machines B and C report the errors when resuming.
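
For readers comparing their own logs, here is a minimal Python sketch that pulls these messages out of a saved dmesg dump. It assumes the two numbers in the message are the CPU number and the vector number, both decimal, so that `1.55` would mean vector 55 arriving on CPU 1; the function name `find_unhandled_vectors` is made up for illustration.

```python
import re

# Matches e.g. "[  247.972121] do_IRQ: 1.55 No irq handler for vector".
# Assumption: the two numbers are "<cpu>.<vector>", both decimal.
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+do_IRQ: (?P<cpu>\d+)\.(?P<vector>\d+) "
    r"No irq handler for vector"
)


def find_unhandled_vectors(dmesg_text):
    """Return a list of (timestamp, cpu, vector) tuples found in dmesg output."""
    hits = []
    for line in dmesg_text.splitlines():
        m = PATTERN.search(line)
        if m:
            hits.append((float(m["ts"]), int(m["cpu"]), int(m["vector"])))
    return hits


# Lines taken from the second resume log above.
sample = """\
    [  306.515264] LVT offset 0 assigned for vector 0x400
->  [  306.515565] do_IRQ: 0.55 No irq handler for vector
    [  306.515582] Enabling non-boot CPUs ...
->  [  306.517897] do_IRQ: 1.55 No irq handler for vector
"""

for ts, cpu, vector in find_unhandled_vectors(sample):
    print(f"t={ts:.6f}s cpu={cpu} vector={vector}")
```

Run against the full output of `dmesg` on each machine, this makes it easy to see whether the same CPU/vector pair shows up at boot and at every resume.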

I then started searching for the problem and the possible solution, and I found these interesting threads:

https://lkml.org/lkml/2018/2/24/63

https://lkml.org/lkml/2019/2/19/797

The error seems to be related to the BIOS and, as you may have noticed, machine A is running a more recent version. So I upgraded the BIOS of machines B and C and, as you may guess, nothing changed.

I then looked more closely at the differences between these machines: the boot logs are identical until udev starts, after which only the order of hardware activation differs.
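
A comparison of this kind (same messages, different order) can be approximated by stripping the timestamps and comparing the remaining lines as multisets. The sketch below is only an illustration: the helper names are hypothetical, and the two sample logs are built from resume messages quoted earlier, reordered by hand to show the effect.

```python
import re
from collections import Counter

# Leading "[  seconds.micros]" stamp printed by the kernel.
TIMESTAMP = re.compile(r"^\s*\[\s*\d+\.\d+\]\s?")


def strip_timestamps(dmesg_text):
    """Drop the leading timestamp from every non-empty line."""
    return [TIMESTAMP.sub("", line)
            for line in dmesg_text.splitlines() if line.strip()]


def unordered_diff(log_a, log_b):
    """Return (only_in_a, only_in_b) as sorted lists, ignoring line order."""
    a = Counter(strip_timestamps(log_a))
    b = Counter(strip_timestamps(log_b))
    return sorted((a - b).elements()), sorted((b - a).elements())


log_a = """\
[  247.969211] ACPI: Low-level resume complete
[  247.969294] ACPI: EC: EC started
[  247.969806] Enabling non-boot CPUs ...
"""

# Same messages in a different order, plus one extra line.
log_b = """\
[  306.515068] ACPI: EC: EC started
[  306.514985] ACPI: Low-level resume complete
[  306.515565] do_IRQ: 0.55 No irq handler for vector
[  306.515582] Enabling non-boot CPUs ...
"""

only_a, only_b = unordered_diff(log_a, log_b)
print("only in A:", only_a)
print("only in B:", only_b)
```

Using a `Counter` rather than a plain set keeps repeated messages from being collapsed, so a line that appears twice on one machine and once on the other is still reported.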

The only difference I can spot between machine A on one side and B and C on the other is in /proc/interrupts:

Machines B and C report:

Code:
$ cat /proc/interrupts 
           CPU0       CPU1       
  0:        110          0   IO-APIC   2-edge      timer
  1:      27055        764   IO-APIC   1-edge      i8042
  5:          0          0   IO-APIC   5-edge      parport0
  8:         29          0   IO-APIC   8-edge      rtc0
  9:        351        153   IO-APIC   9-fasteoi   acpi
 12:     273531       7213   IO-APIC  12-edge      i8042
 16:     164694   11894829   IO-APIC  16-fasteoi   snd_hda_intel:card1
 17:        148          0   IO-APIC  17-fasteoi   ehci_hcd:usb5, ehci_hcd:usb6
 18:       3729       1046   IO-APIC  18-fasteoi   ohci_hcd:usb7, ohci_hcd:usb8, ohci_hcd:usb9
 19:       3484     218694   IO-APIC  19-fasteoi   b43
 23:          2          0   IO-APIC  23-edge      lis3lv02d
 24:          0          0   PCI-MSI 65536-edge      PCIe PME, pciehp
 25:          0          0   PCI-MSI 81920-edge      PCIe PME, pciehp
 26:          0          0   PCI-MSI 114688-edge      PCIe PME, pciehp
 27:      67755          0   PCI-MSI 278528-edge      ahci[0000:00:11.0]
 28:         54         21   PCI-MSI 1048576-edge      rtsx_pci
 29:          0         30   PCI-MSI 18432-edge      snd_hda_intel:card0
 30:          0          0   PCI-MSI 262144-edge      xhci_hcd
 31:          0          0   PCI-MSI 262145-edge      xhci_hcd
 32:          0          0   PCI-MSI 262146-edge      xhci_hcd
 34:        157        102   PCI-MSI 264192-edge      xhci_hcd
 35:          0          0   PCI-MSI 264193-edge      xhci_hcd
 36:          0          0   PCI-MSI 264194-edge      xhci_hcd
 37:      80438          0   PCI-MSI 16384-edge      radeon
NMI:          0          0   Non-maskable interrupts
LOC:   10269185    9992247   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:    3199904    2904733   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:   16571575    5974276   Rescheduling interrupts
CAL:      37585      40366   Function call interrupts
TLB:      66794      66599   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
DFR:          0          0   Deferred Error APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         54         52   Machine check polls
HYP:          0          0   Hypervisor callback interrupts
HRE:          0          0   Hyper-V reenlightenment interrupts
HVS:          0          0   Hyper-V stimer0 interrupts
ERR:          0
MIS:          0
PIN:          0          0   Posted-interrupt notification event
NPI:          0          0   Nested posted-interrupt event
PIW:          0          0   Posted-interrupt wakeup event
whereas machine A reports:

Code:
# cat /proc/interrupts
           CPU0       CPU1       
  0:        110          0   IO-APIC   2-edge      timer
  1:      14683        436   IO-APIC   1-edge      i8042
  5:          0          0   IO-APIC   5-edge      parport0
  8:         17          0   IO-APIC   8-edge      rtc0
  9:       6734        151   IO-APIC   9-fasteoi   acpi
 12:    2087993       5851   IO-APIC  12-edge      i8042
 16:      10491   19609763   IO-APIC  16-fasteoi   snd_hda_intel:card1
 17:        532          0   IO-APIC  17-fasteoi   ehci_hcd:usb5, ehci_hcd:usb6
 18:       2994        296   IO-APIC  18-fasteoi   ohci_hcd:usb7, ohci_hcd:usb8, ohci_hcd:usb9
 19:       2517    3189613   IO-APIC  19-fasteoi   b43
 23:         17          0   IO-APIC  23-edge      lis3lv02d
 24:          0          0   PCI-MSI 65536-edge      PCIe PME, pciehp
 25:          0          0   PCI-MSI 81920-edge      PCIe PME, pciehp
 26:          0          0   PCI-MSI 114688-edge      PCIe PME, pciehp
 27:    1618828          0   PCI-MSI 278528-edge      ahci[0000:00:11.0]
 28:        166         21   PCI-MSI 1048576-edge      rtsx_pci
 29:          0     741832   PCI-MSI 524288-edge    
 30:          0          0   PCI-MSI 262144-edge      xhci_hcd
 31:          0          0   PCI-MSI 262145-edge      xhci_hcd
 32:          0          0   PCI-MSI 262146-edge      xhci_hcd
 33:         69         73   PCI-MSI 264192-edge      xhci_hcd
 34:          0          0   PCI-MSI 264193-edge      xhci_hcd
 35:          0          0   PCI-MSI 264194-edge      xhci_hcd
 36:          0         30   PCI-MSI 18432-edge      snd_hda_intel:card0
 37:    4907489          0   PCI-MSI 16384-edge      radeon
NMI:          0          0   Non-maskable interrupts
LOC:   36867026   40276014   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:   21612595   19875197   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:   43481724   32182104   Rescheduling interrupts
CAL:     385320     390505   Function call interrupts
TLB:     885376     877701   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
DFR:          0          0   Deferred Error APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        708        704   Machine check polls
HYP:          0          0   Hypervisor callback interrupts
HRE:          0          0   Hyper-V reenlightenment interrupts
HVS:          0          0   Hyper-V stimer0 interrupts
ERR:          0
MIS:          0
PIN:          0          0   Posted-interrupt notification event
NPI:          0          0   Nested posted-interrupt event
PIW:          0          0   Posted-interrupt wakeup event
This is present only in machine A:

Code:
 29:          0     741832   PCI-MSI 524288-edge
I don't really know whether this is relevant. Searching around, I found that this interrupt usually belongs to the amdgpu driver, which is not loaded on my machines.

I've run out of ideas, but I still find it odd that, if the BIOS is the issue, machine A does not show the same problem.
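
One way to narrow this down is to compare the PCI-MSI rows of the two dumps by their hardware IRQ number (the number before `-edge`) instead of the Linux IRQ number in the first column, which differs between the machines, and then to decode that number into a PCI address. The Python sketch below is hedged: the bit layout in `decode_hwirq` (MSI entry in bits 0-10, devfn in bits 11-18, bus in bits 19-26, domain above) is an assumption about how kernels of this era compose the value, and the function names are hypothetical. The assumption can at least be cross-checked against the `ahci[0000:00:11.0]` row, which carries its own PCI address.

```python
import re

# A PCI-MSI row of /proc/interrupts, e.g.
# " 27:      67755          0   PCI-MSI 278528-edge      ahci[0000:00:11.0]"
MSI_ROW = re.compile(r"^\s*(\d+):.*?PCI-MSI\s+(\d+)-edge\s*(.*)$")


def msi_hwirqs(interrupts_text):
    """Map hwirq -> device label for every PCI-MSI row."""
    rows = {}
    for line in interrupts_text.splitlines():
        m = MSI_ROW.match(line)
        if m:
            rows[int(m.group(2))] = m.group(3).strip()
    return rows


def decode_hwirq(hwirq):
    """Assumed layout: bits 0-10 entry, 11-18 devfn, 19-26 bus, 27+ domain."""
    entry = hwirq & 0x7FF
    devid = (hwirq >> 11) & 0xFFFF
    domain = hwirq >> 27
    bus, dev, fn = devid >> 8, (devid >> 3) & 0x1F, devid & 0x7
    return f"{domain:04x}:{bus:02x}:{dev:02x}.{fn}", entry


# A few rows copied from the dumps above.
machine_a = """\
 27:    1618828          0   PCI-MSI 278528-edge      ahci[0000:00:11.0]
 29:          0     741832   PCI-MSI 524288-edge
 37:    4907489          0   PCI-MSI 16384-edge      radeon
"""
machine_b = """\
 27:      67755          0   PCI-MSI 278528-edge      ahci[0000:00:11.0]
 37:      80438          0   PCI-MSI 16384-edge      radeon
"""

only_in_a = set(msi_hwirqs(machine_a)) - set(msi_hwirqs(machine_b))
for hwirq in sorted(only_in_a):
    address, entry = decode_hwirq(hwirq)
    print(f"hwirq {hwirq}: PCI device {address}, MSI entry {entry}")
```

If the assumed layout holds, 278528 decodes to 0000:00:11.0, matching the ahci label, and 524288 decodes to 0000:01:00.0. Running `lspci -s 01:00.0` on machine A would then show which device, if any, sits at that address.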

The problem is just annoying, to quote Thomas Gleixner, but after linux-4.15 the message is printed with pr_emerg_ratelimited, so syslog broadcasts it to every terminal. I use many terminals, and this forced me to change syslog.conf, a solution I don't like. I could try patching the kernel as described here:

https://lkml.org/lkml/2019/3/6/188

but compiling and installing a custom kernel in a slackware-current setup for a laptop is just too much work.

Any idea or direction would be greatly appreciated.

Thanks,
andrea
 
  



All times are GMT -5.
