Red Hat Linux memory problem - what needs to be replaced?
Below is the part of the dmesg output that relates to a hardware problem I am trying to pinpoint. The machine is out of maintenance, so I need to fix it myself, and I am not sure which part to replace. Is it a cpu0 problem or a DIMM problem? Any help is greatly welcomed.
ProLiant DL585
Red Hat Enterprise Linux AS release 3 (Taroon Update 8)
64 GB RAM; 4 CPUs; 10 GB swap
Here is the output of dmesg:
-------------------------------------------------------------
NB error address 00000000002748f0
Jul 4 22:26:18 jack1 kernel: Error chipkill ecc error
Jul 4 22:26:18 jack1 kernel: ECC error syndrome 2242
Jul 4 22:26:18 jack1 kernel: bus error local node response, request didn't time out
Jul 4 22:26:18 jack1 kernel: corrected ecc error
Jul 4 22:26:18 jack1 kernel: previous error lost
Jul 4 22:26:18 jack1 kernel: NB error address 0000000203c74850
Jul 4 22:53:18 jack1 kernel: Error chipkill ecc error
Jul 4 22:53:18 jack1 kernel: ECC error syndrome c8f4
Jul 4 22:53:18 jack1 kernel: bus error local node response, request didn't time out
Jul 4 22:53:18 jack1 kernel: corrected ecc error
Jul 4 22:53:18 jack1 kernel: previous error lost
Jul 4 22:53:18 jack1 kernel: NB error address 00000000002748f0
Jul 5 09:53:16 jack1 kernel: Error chipkill ecc error
Jul 5 09:53:16 jack1 kernel: ECC error syndrome c8f4
Jul 5 09:53:16 jack1 kernel: bus error local node response, request didn't time out
Jul 5 09:53:16 jack1 kernel: corrected ecc error
Jul 5 09:53:16 jack1 kernel: previous error lost
Jul 5 09:53:16 jack1 kernel: NB error address 00000000002748f8
Jul 5 10:27:46 jack1 kernel: Error chipkill ecc error
Jul 5 10:27:46 jack1 kernel: ECC error syndrome c8f4
Jul 5 10:27:46 jack1 kernel: bus error local node response, request didn't time out
Jul 5 10:27:46 jack1 kernel: corrected ecc error
Jul 5 10:27:46 jack1 kernel: previous error lost
Jul 5 10:27:46 jack1 kernel: NB error address 00000000002748f8
Jul 5 13:22:45 jack1 kernel: Error chipkill ecc error
Jul 5 13:22:45 jack1 kernel: ECC error syndrome 11c1
Jul 5 13:22:45 jack1 kernel: bus error local node response, request didn't time out
Jul 5 13:22:45 jack1 kernel: corrected ecc error
Jul 5 13:22:45 jack1 kernel: previous error lost
Jul 5 13:22:45 jack1 kernel: NB error address 00000000001748d0
Jul 5 21:34:14 jack1 kernel: Error chipkill ecc error
Jul 5 21:34:14 jack1 kernel: ECC error syndrome c8f4
Jul 5 21:34:14 jack1 kernel: bus error local node origin, request didn't time out
Jul 5 21:34:14 jack1 kernel: err cpu1
Jul 5 21:34:14 jack1 kernel: corrected ecc error
Jul 5 21:34:14 jack1 kernel: previous error lost
Jul 5 21:34:14 jack1 kernel: NB error address 00000000002748f0
Jul 5 22:05:44 jack1 kernel: Error chipkill ecc error
Jul 5 22:05:44 jack1 kernel: ECC error syndrome c8f4
Jul 5 22:05:44 jack1 kernel: bus error local node response, request didn't time out
Jul 5 22:05:44 jack1 kernel: corrected ecc error
Jul 5 22:05:44 jack1 kernel: previous error lost
Jul 5 22:05:44 jack1 kernel: NB error address 00000000002748f8
Jul 5 22:28:44 jack1 kernel: Error chipkill ecc error
Jul 5 22:28:44 jack1 kernel: ECC error syndrome c8f4
Jul 5 22:28:44 jack1 kernel: bus error local node response, request didn't time out
Jul 5 22:28:44 jack1 kernel: corrected ecc error
Jul 5 22:28:44 jack1 kernel: previous error lost
Jul 5 22:28:44 jack1 kernel: NB error address 00000000002748f8
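To see whether the errors cluster on one syndrome or one address, the log lines can be tallied. This is only a rough sketch in Python, assuming the message text looks exactly like the excerpt above; the function name `tally_ecc_events` and the `SAMPLE` string are made up for illustration. It only summarizes the log and does not map an address to a DIMM slot, since that depends on the board's memory layout.

```python
import re
from collections import Counter

# Patterns based on the message text in the dmesg excerpt above.
SYNDROME_RE = re.compile(r"ECC error syndrome ([0-9a-fA-F]+)")
ADDRESS_RE = re.compile(r"NB error address ([0-9a-fA-F]+)")


def tally_ecc_events(log_text):
    """Count how often each ECC syndrome and each NB error address appears.

    Hypothetical helper: returns two Counter objects keyed by the
    hex strings exactly as they appear in the log.
    """
    syndromes = Counter(SYNDROME_RE.findall(log_text))
    addresses = Counter(ADDRESS_RE.findall(log_text))
    return syndromes, addresses


# A few lines copied from the log above, used only as sample input.
SAMPLE = """\
Jul 5 13:22:45 jack1 kernel: ECC error syndrome 11c1
Jul 5 13:22:45 jack1 kernel: NB error address 00000000001748d0
Jul 5 22:05:44 jack1 kernel: ECC error syndrome c8f4
Jul 5 22:05:44 jack1 kernel: NB error address 00000000002748f8
Jul 5 22:28:44 jack1 kernel: ECC error syndrome c8f4
Jul 5 22:28:44 jack1 kernel: NB error address 00000000002748f8
"""

syndromes, addresses = tally_ecc_events(SAMPLE)

# Print the most frequent values first.
for value, count in syndromes.most_common():
    print("syndrome", value, "seen", count, "time(s)")
for value, count in addresses.most_common():
    print("address", value, "seen", count, "time(s)")
```

To run it on the whole log, the contents of the saved dmesg output could be passed to `tally_ecc_events` in place of `SAMPLE`.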
Last edited by nzmose; 07-29-2008 at 11:26 AM.