Subject:
kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Findings:
I spent nearly the whole day yesterday investigating this error. It is a bug in the current kernel version.
(Linux version 2.6.18-92.1.22.0.1.el5 (mockbuild@ca-build9.us.oracle.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Tue Dec 16 16:54:25 EST 2008)
This bug is currently being tracked in Red Hat Bugzilla.
Breakdown of Error:
The error code is EDAC i5000 MC0.
EDAC (Error Detection And Correction) is a memory statistics tool in the kernel that monitors ECC RAM.
i5000 is the memory controller of the Intel 5000 chipset.
The BIOS has its own memory monitoring tool. At the moment the EDAC tool is conflicting with the BIOS monitor.
What we did:
We ran memtest for over 6 hours and it passed.
We tested the memory modules by removing one pair at a time.
Then we put the original memory back in. (Spare memory)
HP replaced the motherboard.
The message still appears.
Solution:
EDAC is only a memory statistics tool for the kernel and has no impact on the OS or the server (non-critical), so we can blacklist the module (which stops the error message from popping up) until the next kernel release, when this bug should be fixed.
The BIOS is already monitoring the memory, so any memory failure or thermal event will still be reported.
The workaround for this problem is to prevent the i5000_edac module from loading. To do this, add a blacklist entry for that module to the /etc/modprobe.d/blacklist file, then reboot the server.
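The blacklist step can be sketched roughly as follows. This assumes the standard modprobe syntax "blacklist <module>", which is not quoted in the notes above. The helper name blacklist_module is hypothetical and only here for illustration. It adds the entry once, so running it twice does not duplicate the line.

```shell
# Hypothetical helper (not from the original notes): append a
# "blacklist <module>" entry to a modprobe config file, but only
# if that exact line is not already there.
blacklist_module() {
    module="$1"
    file="$2"
    # grep -x matches whole lines only; -q keeps it quiet.
    # If the file is missing, grep fails and the line is appended,
    # which also creates the file.
    grep -qx "blacklist ${module}" "$file" 2>/dev/null \
        || echo "blacklist ${module}" >> "$file"
}

# Assumed usage on the affected server (run as root, then reboot):
#   blacklist_module i5000_edac /etc/modprobe.d/blacklist
# After the reboot, "lsmod | grep i5000_edac" should print nothing.
```

Keeping the usage line commented out means the snippet can be sourced and tried against a scratch file before touching the real /etc/modprobe.d/blacklist.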
A few links:
http://webui.sourcelabs.com/rhel/issues/458133
http://forums.oracle.com/forums/thre...90202&tstart=0
http://www.nikhef.nl/pub/projects/gr...ry&redirect=no
http://www.graystorm.com/wordpress/?p=451