RedHat: AMD64 Northbridge errors
We are seeing Northbridge errors on our server when we run the dmesg command. We contacted Red Hat and Sun about these errors and received conflicting advice: one suggestion is to change a setting in the BIOS, another says it is a memory problem and the memory needs to be replaced, and another says the kernel needs to be upgraded.
According to Sun, this is a known non-critical error that can be resolved with the following steps.
Northbridge status error
Resolution: The Northbridge status error numbers may vary slightly; however, this resolution applies to 64-bit architectures. This symptom has been seen in Red Hat Enterprise Linux 3 and 4.
Symptom:
Translation Look-Aside Buffer (TLB) reload causes errors with certain Linux software. In the BIOS Advanced menu, there is an option named "No Spec. TLB Reload." By default, this setting is disabled and allows TLB reload. With this default setting, errors similar to the following have been observed on systems running any 64-bit version of Red Hat Linux:
Northbridge status a60000010005001b
GART error 11
Lost an northbridge error
NB status: unrecoverable
NB error address 0000000037ff07f8
Error uncorrected
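For readers trying to interpret the status word, here is a minimal Python sketch that splits its low 32 bits into fields. The field layout (extended error code in bits 19:16, error code in bits 15:0, TLB error code ending in TT and LL) and the name tables are assumptions based on AMD's published machine-check error code format, so please check them against the AMD documentation for your processor. The function name decode_nb_status_low is made up for this example.

```python
# Assumed name tables for the TT (transaction type) and LL (cache level)
# sub-fields of a TLB error code. These are not taken from the post.
TRANSACTION = {0: "instruction", 1: "data", 2: "generic", 3: "reserved"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic"}


def decode_nb_status_low(status_hex):
    """Split the low 32 bits of a Northbridge status word into fields.

    Accepts either form seen in dmesg output: 'a60000010005001b'
    or 'a6000001:0005001b'.
    """
    value = int(status_hex.replace(":", ""), 16)
    low = value & 0xFFFFFFFF          # low half of the 64-bit status
    ext_code = (low >> 16) & 0xF      # assumed: extended error code
    err_code = low & 0xFFFF           # assumed: error code
    tt = (err_code >> 2) & 0x3        # assumed: transaction type
    ll = err_code & 0x3               # assumed: cache level
    return {
        "extended_error_code": ext_code,
        "error_code": err_code,
        "transaction": TRANSACTION[tt],
        "level": LEVEL[ll],
    }


fields = decode_nb_status_low("a60000010005001b")
print(fields)
```

Under these assumptions the low four bits of the error code are 11 in decimal, which agrees with the "GART error 11" line in the sample output.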
Solution:
To avoid these errors, disallow TLB reloading through the following steps:
Reboot the server and press the key to enter the BIOS setup. This key varies from BIOS to BIOS and may be shown on screen when the machine first boots up.
Navigate to the Advanced > Chipset Configuration BIOS menu.
Use the arrow keys to scroll down to the option "No Spec. TLB reload" and change its setting from Disabled to Enabled.
This will disallow TLB reloading and avoid the error message.
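After changing the BIOS setting, one way to check whether the messages have stopped is to count them in saved dmesg output. The following is a minimal Python sketch, assuming the output was saved to a text file and that every report contains a "Northbridge status" line as in the samples in this post. The function name and the sample strings are only for illustration.

```python
import re

# Matches the status line in both forms seen in this post:
# "Northbridge status a60000010005001b" and
# "Northbridge status a6000001:0005001b".
NB_STATUS_RE = re.compile(r"Northbridge status\s+[0-9a-fA-F:]+")


def count_northbridge_errors(dmesg_text):
    """Return the number of Northbridge status reports in dmesg text."""
    return len(NB_STATUS_RE.findall(dmesg_text))


# Hypothetical before/after snippets, only to show the call.
before = "Northbridge status a60000010005001b\nGART error 11\n"
after = "nfs: server edaserver2 OK\n"
print(count_northbridge_errors(before), count_northbridge_errors(after))
```

To use it on a real system, save the output with something like "dmesg > dmesg.txt" and pass the contents of that file to the function. A count of zero after a reboot and some uptime would suggest the setting took effect.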
================================================================================
The following is the information of our server
================================================================================
Operating System:
Linux eda1 2.4.21-37.ELsmp #1 SMP Wed Sep 7 13:32:18 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
Error Messages: (dmesg)
CPU 0: Silent Northbridge MCE
Northbridge status a6000001:0005001b
Error gart error
GART TLB error generic level generic
err cpu1
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037f20030
nfs: server edaserver2 not responding, still trying
nfs: server edaserver2 OK
CPU 0: Silent Northbridge MCE
Northbridge status a6000001:0005001b
Error gart error
GART TLB error generic level generic
err cpu1
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037f20040
CPU 0: Silent Northbridge MCE
Northbridge status a6000001:0005001b
Error gart error
GART TLB error generic level generic
err cpu1
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037f20010
CPU 2: Silent Northbridge MCE
Northbridge status a6000002:0005001b
Error gart error
GART TLB error generic level generic
err cpu0
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037f20000
CPU 0: Silent Northbridge MCE
Northbridge status a6000001:0005001b
Error gart error
GART TLB error generic level generic
err cpu1
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037f20058
CPU 0: Silent Northbridge MCE
Northbridge status a6000002:0005001b
Error gart error
GART TLB error generic level generic
err cpu0
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037f20038
CPU 2: Silent Northbridge MCE
Northbridge status a6000001:0005001b
Error gart error
GART TLB error generic level generic
err cpu1
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037f20040
CPU 0: Silent Northbridge MCE
Northbridge status a6000001:0005001b
Error gart error
GART TLB error generic level generic
err cpu1
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037f20020
CPU 1: Silent Northbridge MCE
Northbridge status a6000001:0005001b
Error gart error
GART TLB error generic level generic
err cpu1
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037f20050
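To make a log like the one above easier to compare before and after a change, here is a minimal Python sketch that counts the reports per CPU and collects the reported addresses. It assumes the line formats exactly as they appear in this post ("CPU N: Silent Northbridge MCE" and "NB error address ..."). The function name summarize_mce is made up for this example.

```python
import re
from collections import Counter

CPU_RE = re.compile(r"^CPU (\d+): Silent Northbridge MCE")
ADDR_RE = re.compile(r"^NB error address ([0-9a-fA-F]+)")


def summarize_mce(log_text):
    """Return (reports per CPU, list of NB error addresses as ints)."""
    per_cpu = Counter()
    addresses = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        cpu_match = CPU_RE.match(line)
        if cpu_match:
            per_cpu[int(cpu_match.group(1))] += 1
            continue
        addr_match = ADDR_RE.match(line)
        if addr_match:
            addresses.append(int(addr_match.group(1), 16))
    return per_cpu, addresses


# Two reports copied from the log in this post.
sample = """CPU 0: Silent Northbridge MCE
Northbridge status a6000001:0005001b
NB error address 0000000037f20030
CPU 2: Silent Northbridge MCE
Northbridge status a6000002:0005001b
NB error address 0000000037f20000
"""
per_cpu, addresses = summarize_mce(sample)
print(dict(per_cpu), [hex(a) for a in addresses])
```

In the log above, all of the reported addresses fall between 0x37f20000 and 0x37f20058, so a summary like this makes it easy to see whether new reports keep pointing at the same small region.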
Processors:
8 x AMD Opteron
Memory:
16GB
We need an exact resolution for this error because this is a critical production server. Can anyone advise us on how to resolve it?
Thanks and Regards
Srinivas
Singapore