Northbridge EDAC amd64 problems on HDAMA mobo
Hi all,
I have a 'Rackable Systems' server with an HDAMA mobo - Dual CPU Opteron 250 2.4GHz with 2x memory modules per CPU. It seems to run fine, but it hangs every hour or so!
I am running Ubuntu 64-bit 10.10, which is currently a beta release, so I haven't ruled that out as the problem yet, but I suspect it's unlikely. However, I am downloading 10.04 as I type...
dmesg spits out lots of awful messages like these:
[ 1314.920127] EDAC MC1: CE - no information available: amd64_edacError Overflow
[ 1315.920047] Northbridge Error, node 0, core: 0
[ 1315.920060] ECC/ChipKill ECC error.
[ 1315.920066] EDAC amd64 MC0: CE ERROR_ADDRESS= 0x1484410
[ 1315.920082] EDAC MC0: CE page 0x1484, offset 0x410, grain 0, syndrome 0x11c1, row 0, channel 0, label "": amd64_edac
(there are variations on the node, core, address, offset and syndrome etc.)
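For anyone wanting to see whether the corrected errors (CE) cluster on one memory controller, row or channel, here is a rough Python sketch that parses the "CE page ..." lines and tallies them. The names `parse_ce` and `tally` are made up for illustration and are not part of any EDAC tool, and the page/offset recombination assumes 4 KiB pages (which does match the log above: page 0x1484 and offset 0x410 give ERROR_ADDRESS 0x1484410).

```python
import re
from collections import Counter

# Matches the per-error detail line that amd64_edac writes to dmesg.
CE_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE page 0x(?P<page>[0-9a-f]+), "
    r"offset 0x(?P<offset>[0-9a-f]+), grain \d+, "
    r"syndrome 0x(?P<syndrome>[0-9a-f]+), "
    r"row (?P<row>\d+), channel (?P<channel>\d+)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_ce(line):
    """Return the fields of one CE line as a dict, or None if it is not one."""
    m = CE_LINE.search(line)
    if m is None:
        return None
    page = int(m["page"], 16)
    offset = int(m["offset"], 16)
    return {
        "mc": int(m["mc"]),
        "row": int(m["row"]),
        "channel": int(m["channel"]),
        "syndrome": int(m["syndrome"], 16),
        # Physical address rebuilt from page number and in-page offset.
        "address": (page << PAGE_SHIFT) | offset,
    }


def tally(lines):
    """Count CE lines per (memory controller, row, channel)."""
    counts = Counter()
    for line in lines:
        ce = parse_ce(line)
        if ce is not None:
            counts[(ce["mc"], ce["row"], ce["channel"])] += 1
    return counts


sample = ('[ 1315.920082] EDAC MC0: CE page 0x1484, offset 0x410, grain 0, '
          'syndrome 0x11c1, row 0, channel 0, label "": amd64_edac')
print(hex(parse_ce(sample)["address"]))  # 0x1484410
```

Feeding `tally` the lines of a saved dmesg dump gives a count per (MC, row, channel). If the counts follow a DIMM when it is moved to another slot, that points at the module; if they stay with the slot or node, the board or CPU is the more likely suspect.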
I have tried swapping the CPUs over and running with only one CPU.
I have also swapped all the memory around in almost every permutation.
Another worrying symptom is that when I run memtest86+ from a boot disk, it shows zero errors up until the point where the server turns itself off without warning - it hasn't yet completed the test...
If anyone could shed some light on this, I would be grateful. Perhaps I've bought a dodgy second-hand computer, in which case it's a steep learning curve. But it bugs me not knowing what the root cause is...
Thanks,
Ben