LinuxQuestions.org
Old 04-08-2015, 08:13 AM   #1
Brenda Prigg
LQ Newbie
 
Registered: Apr 2015
Posts: 2

Rep: Reputation: Disabled
Uhhuh. NMI received for unknown reason 21 on CPU 0. Do you have strange power saving


During a load and unload of my kernel module, I see the following message:

[ 591.578557] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[ 591.578557] Do you have a strange power saving mode enabled?
[ 591.578557] Dazed and confused, but trying to continue

I have been trying to debug this on my own, to no avail. I am running CentOS:

[root@bitterroot ~]# uname -a
Linux bitterroot 3.10.0-123.20.1.el7.acpi_debug2.x86_64 #1 SMP Tue Mar 31 09:43:45 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@bitterroot ~]#


I am running a test in which I repeatedly load and unload my kernel
module for a PCIe Gen3 x8 network interface card.

I have no ACPI kernel background and want to better understand which hardware or firmware issues could trigger this failure.

I have tried many ACPI debug_layer and debug_level flags, but
unfortunately the ACPI debug output looks the same when the test passes as when the failure happens.

I do know that the "Dazed and confused" message appears as part of loading my module.
I also notice that some time later, as part of the unload, lspci shows a
completion timeout on the root port where my NIC is attached.

I have attached the dmesg output, lspci output, and dsdt.dsl from my system.

I am totally stumped on this one and not sure what to try as next
debug steps. I am hoping you have some ideas on things I can try to
debug further.

https://drive.google.com/folderview?...&usp=drive_web

Thank you in advance.
 
Old 04-08-2015, 08:39 AM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,148

Rep: Reputation: 1264
Your card is generating some kind of PCIe bus error, which is causing the non-maskable interrupt. The code is not able to classify it into one of the known types, possibly because more than one is occurring. Check the NMI counters (the NMI line in /proc/interrupts). The kernel code which prints the error is here:

http://lxr.free-electrons.com/source...nel/nmi.c#L277
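
For anyone following along, here is a minimal Python sketch of the two checks above. It rests on assumptions that are worth verifying against your own kernel source: that the number in the "Uhhuh" message is printed in hex, and that the kernel only classifies the NMI when the SERR bit (0x80) or the IOCHK bit (0x40) is set in that value. The function names are made up for illustration.

```python
# Bit masks as I understand them from the x86 NMI code (asm/mach_traps.h).
# Treat them as assumptions and check them against your kernel source.
NMI_REASON_SERR = 0x80   # PCI system error (SERR#) reported
NMI_REASON_IOCHK = 0x40  # I/O channel check reported


def decode_nmi_reason(reason_hex):
    """Decode the two-digit hex reason printed in the 'Uhhuh' message."""
    reason = int(reason_hex, 16)
    return {
        "value": reason,
        "serr": bool(reason & NMI_REASON_SERR),
        "iochk": bool(reason & NMI_REASON_IOCHK),
    }


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts found in the text of /proc/interrupts."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Counts come first; the trailing description is not numeric.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []
```

With reason "21", neither the SERR nor the IOCHK bit is set, which would be consistent with the kernel falling through to the "unknown reason" message. To watch the counters, call nmi_counts(open("/proc/interrupts").read()) before and after a load/unload cycle and compare the two lists.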
 
Old 04-08-2015, 09:26 AM   #3
Brenda Prigg
LQ Newbie
 
Registered: Apr 2015
Posts: 2

Original Poster
Rep: Reputation: Disabled
Thank you for your response. I will look closer at the NMI counters. I should have also mentioned that the "Dazed and confused" issue only happens when running CentOS. If I run the same test on the same hardware under Ubuntu, I cannot cause the failure. Here is the info on the Ubuntu distro I am using:

crato ~ # uname -a
Linux crato 3.13.0-46-generic #75~precise1-Ubuntu SMP Wed Feb 11 19:21:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
crato ~ #
crato ~ # cat /sys/module/acpi/parameters/acpica_version
20131115


Both hosts are similar hardware platforms (see below for processor information), yet I cannot replicate the issue on my crato host.
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
stepping : 4
microcode : 0x427
cpu MHz : 2600.069
cache size : 20480 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16:
 
Old 04-08-2015, 03:16 PM   #4
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,148

Rep: Reputation: 1264
The difference is likely to be in the kernel code, not userspace. Compare the configs for the two kernels.

Code:
 3.10.0-123.20.1.el7.acpi_debug2.x86_64
 3.13.0-46-generic #75~precise1-Ubuntu
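
One way to do that comparison, as a rough Python sketch: it assumes each kernel's config is available as text (both CentOS and Ubuntu normally install it as /boot/config-<kernel version>) and that you have copied both files to one machine. The function names here are made up for illustration.

```python
def parse_kernel_config(text):
    """Map CONFIG_* option names to their values; unset options map to 'n'."""
    suffix = " is not set"
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO = n
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kernel_configs(text_a, text_b):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    An option missing from one file is reported as None on that side.
    """
    a = parse_kernel_config(text_a)
    b = parse_kernel_config(text_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Read the two config files into strings, pass them to diff_kernel_configs, and look first at the PCIe, AER, and ACPI related options. Expect plenty of unrelated differences as well, since the two kernels are different versions from different vendors.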
 
  

