RedHat Linux memory problem - what needs to be replaced?
Below is the part of the dmesg output that relates to a hardware problem I am trying to pinpoint. The machine is out of maintenance, so I need to fix it myself, and I am not sure which part to replace. Is it a cpu0 problem or a DIMM problem? Any help is greatly welcomed.
ProLiant DL585
RedHat Enterprise AS rel 3 Taroon update 8
64 GB RAM; 4 CPUs; 10 GB swap

Here is the output of dmesg:
-------------------------------------------------------------
NB error address 00000000002748f0
Jul 4 22:26:18 jack1 kernel: Error chipkill ecc error
Jul 4 22:26:18 jack1 kernel: ECC error syndrome 2242
Jul 4 22:26:18 jack1 kernel: bus error local node response, request didn't time out
Jul 4 22:26:18 jack1 kernel: corrected ecc error
Jul 4 22:26:18 jack1 kernel: previous error lost
Jul 4 22:26:18 jack1 kernel: NB error address 0000000203c74850
Jul 4 22:53:18 jack1 kernel: Error chipkill ecc error
Jul 4 22:53:18 jack1 kernel: ECC error syndrome c8f4
Jul 4 22:53:18 jack1 kernel: bus error local node response, request didn't time out
Jul 4 22:53:18 jack1 kernel: corrected ecc error
Jul 4 22:53:18 jack1 kernel: previous error lost
Jul 4 22:53:18 jack1 kernel: NB error address 00000000002748f0
Jul 5 09:53:16 jack1 kernel: Error chipkill ecc error
Jul 5 09:53:16 jack1 kernel: ECC error syndrome c8f4
Jul 5 09:53:16 jack1 kernel: bus error local node response, request didn't time out
Jul 5 09:53:16 jack1 kernel: corrected ecc error
Jul 5 09:53:16 jack1 kernel: previous error lost
Jul 5 09:53:16 jack1 kernel: NB error address 00000000002748f8
Jul 5 10:27:46 jack1 kernel: Error chipkill ecc error
Jul 5 10:27:46 jack1 kernel: ECC error syndrome c8f4
Jul 5 10:27:46 jack1 kernel: bus error local node response, request didn't time out
Jul 5 10:27:46 jack1 kernel: corrected ecc error
Jul 5 10:27:46 jack1 kernel: previous error lost
Jul 5 10:27:46 jack1 kernel: NB error address 00000000002748f8
Jul 5 13:22:45 jack1 kernel: Error chipkill ecc error
Jul 5 13:22:45 jack1 kernel: ECC error syndrome 11c1
Jul 5 13:22:45 jack1 kernel: bus error local node response, request didn't time out
Jul 5 13:22:45 jack1 kernel: corrected ecc error
Jul 5 13:22:45 jack1 kernel: previous error lost
Jul 5 13:22:45 jack1 kernel: NB error address 00000000001748d0
Jul 5 21:34:14 jack1 kernel: Error chipkill ecc error
Jul 5 21:34:14 jack1 kernel: ECC error syndrome c8f4
Jul 5 21:34:14 jack1 kernel: bus error local node origin, request didn't time out
Jul 5 21:34:14 jack1 kernel: err cpu1
Jul 5 21:34:14 jack1 kernel: corrected ecc error
Jul 5 21:34:14 jack1 kernel: previous error lost
Jul 5 21:34:14 jack1 kernel: NB error address 00000000002748f0
Jul 5 22:05:44 jack1 kernel: Error chipkill ecc error
Jul 5 22:05:44 jack1 kernel: ECC error syndrome c8f4
Jul 5 22:05:44 jack1 kernel: bus error local node response, request didn't time out
Jul 5 22:05:44 jack1 kernel: corrected ecc error
Jul 5 22:05:44 jack1 kernel: previous error lost
Jul 5 22:05:44 jack1 kernel: NB error address 00000000002748f8
Jul 5 22:28:44 jack1 kernel: Error chipkill ecc error
Jul 5 22:28:44 jack1 kernel: ECC error syndrome c8f4
Jul 5 22:28:44 jack1 kernel: bus error local node response, request didn't time out
Jul 5 22:28:44 jack1 kernel: corrected ecc error
Jul 5 22:28:44 jack1 kernel: previous error lost
Jul 5 22:28:44 jack1 kernel: NB error address 00000000002748f8
Hi.
It's a dodgy stick of RAM. See: http://www.amd.com/us-en/assets/cont...docs/32559.pdf for the details (search for c8f4 in the PDF).

Dave
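
If it helps to see which syndrome keeps recurring before looking it up in the PDF, here is a rough Python sketch that tallies the "ECC error syndrome" and "NB error address" lines from a log like the one posted. The function name tally_ecc_errors and the script name used below are made up for illustration, and the script only counts occurrences; it does not map a syndrome to a physical DIMM slot, which still needs the table in the AMD document. It assumes Python 3, so it is probably easiest to save the log to a file and run it on another machine rather than on the RHEL 3 box itself.

```python
import re
import sys
from collections import Counter

# Patterns for the two message types that appear in the posted dmesg output.
SYNDROME_RE = re.compile(r"ECC error syndrome ([0-9a-fA-F]+)")
ADDRESS_RE = re.compile(r"NB error address ([0-9a-fA-F]+)")


def tally_ecc_errors(lines):
    """Count ECC syndromes and northbridge error addresses in log lines.

    Returns two Counters: (syndromes, addresses), keyed by lowercase hex.
    """
    syndromes = Counter()
    addresses = Counter()
    for line in lines:
        match = SYNDROME_RE.search(line)
        if match:
            syndromes[match.group(1).lower()] += 1
            continue
        match = ADDRESS_RE.search(line)
        if match:
            addresses[match.group(1).lower()] += 1
    return syndromes, addresses


if __name__ == "__main__":
    # Reads the log from standard input and prints the most frequent
    # values first.
    syndromes, addresses = tally_ecc_errors(sys.stdin)
    print("Syndromes:")
    for value, count in syndromes.most_common():
        print("  %s  x%d" % (value, count))
    print("Addresses:")
    for value, count in addresses.most_common():
        print("  %s  x%d" % (value, count))
```

Saved as tally_ecc.py (a hypothetical name), it could be run as: python3 tally_ecc.py < dmesg.txt. In the log posted above, c8f4 accounts for six of the eight syndrome lines, which is consistent with a single recurring fault rather than scattered errors.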