LinuxQuestions.org


nzmose 07-29-2008 11:24 AM

RedHat Linux memory problem - what needs to be replaced?
 
Below I have included the part of the dmesg output that relates to a hardware problem I am trying to pinpoint. The machine is out of maintenance, so I need to fix it myself, and I am not sure which part to replace. Is it a cpu0 problem or a DIMM problem? Any help is greatly welcomed.
ProLiant DL585
Red Hat Enterprise Linux AS release 3 (Taroon Update 8)
64 GB RAM; 4 CPUs; 10 GB swap
Here is the output of dmesg:
-------------------------------------------------------------
NB error address 00000000002748f0
Jul 4 22:26:18 jack1 kernel: Error chipkill ecc error
Jul 4 22:26:18 jack1 kernel: ECC error syndrome 2242
Jul 4 22:26:18 jack1 kernel: bus error local node response, request didn't time out
Jul 4 22:26:18 jack1 kernel: corrected ecc error
Jul 4 22:26:18 jack1 kernel: previous error lost
Jul 4 22:26:18 jack1 kernel: NB error address 0000000203c74850
Jul 4 22:53:18 jack1 kernel: Error chipkill ecc error
Jul 4 22:53:18 jack1 kernel: ECC error syndrome c8f4
Jul 4 22:53:18 jack1 kernel: bus error local node response, request didn't time out
Jul 4 22:53:18 jack1 kernel: corrected ecc error
Jul 4 22:53:18 jack1 kernel: previous error lost
Jul 4 22:53:18 jack1 kernel: NB error address 00000000002748f0
Jul 5 09:53:16 jack1 kernel: Error chipkill ecc error
Jul 5 09:53:16 jack1 kernel: ECC error syndrome c8f4
Jul 5 09:53:16 jack1 kernel: bus error local node response, request didn't time out
Jul 5 09:53:16 jack1 kernel: corrected ecc error
Jul 5 09:53:16 jack1 kernel: previous error lost
Jul 5 09:53:16 jack1 kernel: NB error address 00000000002748f8
Jul 5 10:27:46 jack1 kernel: Error chipkill ecc error
Jul 5 10:27:46 jack1 kernel: ECC error syndrome c8f4
Jul 5 10:27:46 jack1 kernel: bus error local node response, request didn't time out
Jul 5 10:27:46 jack1 kernel: corrected ecc error
Jul 5 10:27:46 jack1 kernel: previous error lost
Jul 5 10:27:46 jack1 kernel: NB error address 00000000002748f8
Jul 5 13:22:45 jack1 kernel: Error chipkill ecc error
Jul 5 13:22:45 jack1 kernel: ECC error syndrome 11c1
Jul 5 13:22:45 jack1 kernel: bus error local node response, request didn't time out
Jul 5 13:22:45 jack1 kernel: corrected ecc error
Jul 5 13:22:45 jack1 kernel: previous error lost
Jul 5 13:22:45 jack1 kernel: NB error address 00000000001748d0
Jul 5 21:34:14 jack1 kernel: Error chipkill ecc error
Jul 5 21:34:14 jack1 kernel: ECC error syndrome c8f4
Jul 5 21:34:14 jack1 kernel: bus error local node origin, request didn't time out
Jul 5 21:34:14 jack1 kernel: err cpu1
Jul 5 21:34:14 jack1 kernel: corrected ecc error
Jul 5 21:34:14 jack1 kernel: previous error lost
Jul 5 21:34:14 jack1 kernel: NB error address 00000000002748f0
Jul 5 22:05:44 jack1 kernel: Error chipkill ecc error
Jul 5 22:05:44 jack1 kernel: ECC error syndrome c8f4
Jul 5 22:05:44 jack1 kernel: bus error local node response, request didn't time out
Jul 5 22:05:44 jack1 kernel: corrected ecc error
Jul 5 22:05:44 jack1 kernel: previous error lost
Jul 5 22:05:44 jack1 kernel: NB error address 00000000002748f8
Jul 5 22:28:44 jack1 kernel: Error chipkill ecc error
Jul 5 22:28:44 jack1 kernel: ECC error syndrome c8f4
Jul 5 22:28:44 jack1 kernel: bus error local node response, request didn't time out
Jul 5 22:28:44 jack1 kernel: corrected ecc error
Jul 5 22:28:44 jack1 kernel: previous error lost
Jul 5 22:28:44 jack1 kernel: NB error address 00000000002748f8

ilikejam 07-29-2008 12:11 PM

Hi.

It's a dodgy stick of RAM. See:
http://www.amd.com/us-en/assets/cont...docs/32559.pdf
for the details (search for c8f4 in the PDF).
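
If you want to see which syndromes and addresses keep coming back before you go pulling DIMMs, something like the rough Python sketch below might help. It only assumes the "ECC error syndrome" and "NB error address" line formats shown in the dmesg output above; the function name tally_ecc and the inline sample are made up for illustration, and it does not try to map a syndrome to a physical DIMM slot (that still needs the table in the PDF).

```python
import re
from collections import Counter

# Patterns match the kernel message formats quoted in this thread.
SYNDROME_RE = re.compile(r"ECC error syndrome ([0-9a-fA-F]+)")
ADDRESS_RE = re.compile(r"NB error address ([0-9a-fA-F]+)")


def tally_ecc(log_text):
    """Count how often each ECC syndrome and NB error address appears."""
    syndromes = Counter(s.lower() for s in SYNDROME_RE.findall(log_text))
    addresses = Counter(int(a, 16) for a in ADDRESS_RE.findall(log_text))
    return syndromes, addresses


# A few lines copied from the log above, used here as sample input.
SAMPLE = """\
Jul 4 22:26:18 jack1 kernel: ECC error syndrome 2242
Jul 4 22:26:18 jack1 kernel: NB error address 0000000203c74850
Jul 4 22:53:18 jack1 kernel: ECC error syndrome c8f4
Jul 4 22:53:18 jack1 kernel: NB error address 00000000002748f0
Jul 5 09:53:16 jack1 kernel: ECC error syndrome c8f4
Jul 5 09:53:16 jack1 kernel: NB error address 00000000002748f8
"""

syndromes, addresses = tally_ecc(SAMPLE)
for syndrome, count in syndromes.most_common():
    print(f"syndrome {syndrome}: {count}")
for address, count in addresses.most_common():
    print(f"address {address:#018x}: {count}")
```

Feed it the full log text instead of SAMPLE to get the real counts. In the excerpt posted above, c8f4 is the syndrome on six of the eight errors, and most of the addresses sit within a few bytes of 0x2748f0, which is what you would expect from one bad chip on one module rather than a CPU fault.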

Dave


All times are GMT -5.