LinuxQuestions.org


schs777 04-23-2009 03:37 AM

Tmid Thermal event with intelligent throttling disabled
 
Hi all

We have Oracle Unbreakable Linux installed on an HP DL380 G5. Since the install we've been getting the error message "Tmid Thermal event with intelligent throttling disabled".

We logged it with HP, thinking it could be the memory, and HP replaced the motherboard. Yet the message has not cleared. The server has 32 GB of memory. We removed a pair of memory modules and tested the server again; the message still appeared. We repeated the process with all the modules. We then replaced them with new spare 2 GB modules and the message still came back. Then we ran a memtest, still no luck.

I am now starting to wonder whether the problem lies with the kernel. (It seems to happen when our Oracle DBAs are importing migrated tables and data.)

Any ideas or steps for further testing or for fixing this issue? We are not sure we want to simply stop the message from showing.

Here is some info on the server: memory, processors, kernel version and the error message:

Linux version 2.6.18-92.1.22.0.1.el5 (mockbuild@ca-build9.us.oracle.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Tue Dec 16 16:54:25 EST 2008


arch
x86_64

meminfo

MemTotal: 32831032 kB
MemFree: 325276 kB
Buffers: 699700 kB
Cached: 30022748 kB
SwapCached: 90040 kB
Active: 17000236 kB
Inactive: 14887732 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 32831032 kB
LowFree: 325276 kB
SwapTotal: 8385920 kB
SwapFree: 8295880 kB
Dirty: 600 kB
Writeback: 0 kB
AnonPages: 1075372 kB
Mapped: 3769940 kB
Slab: 401696 kB
PageTables: 174980 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 24801436 kB
Committed_AS: 7798452 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 267100 kB
VmallocChunk: 34359470055 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB

free -m
             total       used       free     shared    buffers     cached
Mem:         32061      31738        323          0        684      29326
-/+ buffers/cache:       1727      30333
Swap:         8189         87       8101

cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5260 @ 3.33GHz
stepping : 6
cpu MHz : 3333.342
cache size : 6144 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 6671.14
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5260 @ 3.33GHz
stepping : 6
cpu MHz : 3333.342
cache size : 6144 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 6666.59
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

cat /var/log/messages | grep -i thermal
Apr 22 13:46:28 kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Apr 22 13:46:29 kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Apr 22 13:46:31 kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Apr 22 13:46:32 kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Apr 22 13:46:38 kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Apr 22 13:46:41 kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Apr 22 13:46:42 kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Apr 22 13:46:43 kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
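
Since the messages seem to come in bursts during the imports, it may help to count them per minute and compare the timestamps with the DBA activity. This is only a rough sketch: count_thermal_events is a made-up helper name, and it assumes the usual syslog layout where the first three fields are month, day and time.

```shell
# Hypothetical helper (not from any package): count thermal-event lines per minute.
# Reads syslog-style lines on stdin; assumes fields 1-3 are month, day, HH:MM:SS.
count_thermal_events() {
    grep -i 'thermal event' |
        awk '{ print $1, $2, substr($3, 1, 5) }' |   # keep month, day, HH:MM
        uniq -c                                      # log is already in time order
}

# Example use against the log file from the post:
# count_thermal_events < /var/log/messages
```

Each output line is a count followed by the month, day and minute, so a spike during an import window should be easy to spot.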

schs777 04-24-2009 04:24 AM

Subject:
kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled

Findings:
I spent nearly the whole day yesterday investigating this error. This is a bug in the current kernel version.

(Linux version 2.6.18-92.1.22.0.1.el5 (mockbuild@ca-build9.us.oracle.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Tue Dec 16 16:54:25 EST 2008)

This bug is currently under investigation in the Red Hat Bugzilla.

Breakdown of the error:
The message prefix is EDAC i5000 MC0.
EDAC (Error Detection and Correction) is a kernel subsystem that monitors ECC memory and reports errors.
i5000 is the Intel chipset memory controller; MC0 is the first memory controller instance.

The BIOS has its own memory monitoring. At the moment the EDAC driver is conflicting with the BIOS monitor.

What we have done:
We ran a memtest for over 6 hours and it was successful.
We tested the memory modules by removing a pair at a time.
Then we put the original memory back in (after trying the spare memory).
HP replaced the motherboard.
The message still appears.

Solution:
As EDAC is only a memory monitoring tool for the kernel, it does not have any impact on the OS or the server (non-critical). We can blacklist the module (which stops the error message from popping up) until the next kernel release, when this bug should be fixed.

The BIOS is already monitoring the memory, so any memory failure or thermal event will still be reported.

The workaround for this problem is to prevent the i5000_edac module from loading. To do this, add the following line to the /etc/modprobe.d/blacklist file, then reboot the server:

blacklist i5000_edac
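
For anyone who wants to script the workaround, here is a minimal sketch. blacklist_module is a hypothetical helper name of my own; the module name and the default file path are the ones given above. It appends the blacklist entry only if it is not already there, so running it twice does no harm.

```shell
# Hypothetical helper: add "blacklist <module>" to a modprobe blacklist file
# unless that exact line is already present.
blacklist_module() {
    module="$1"
    file="${2:-/etc/modprobe.d/blacklist}"   # default path as given in this post
    grep -qx "blacklist $module" "$file" 2>/dev/null ||
        printf 'blacklist %s\n' "$module" >> "$file"
}

# Run as root, then reboot the server:
# blacklist_module i5000_edac
# After the reboot, "lsmod | grep i5000_edac" should print nothing.
```

The second argument is only there so the function can be tried against a scratch file before touching the real one.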

A few links:
http://webui.sourcelabs.com/rhel/issues/458133
http://forums.oracle.com/forums/thre...90202&tstart=0
http://www.nikhef.nl/pub/projects/gr...ry&redirect=no
http://www.graystorm.com/wordpress/?p=451

