LinuxQuestions.org
Old 10-01-2010, 04:52 PM   #1
bgjuk
LQ Newbie
 
Registered: Sep 2010
Posts: 1

Rep: Reputation: 0
Northbridge EDAC amd64 problems on HDAMA mobo


Hi all,

I have a 'Rackable Systems' server with an HDAMA mobo - Dual CPU Opteron 250 2.4GHz with 2x memory modules per CPU. It seems to run fine, but it hangs every hour or so!

I am running 64-bit Ubuntu 10.10, which is currently a beta release, so I haven't discounted that as the problem yet, but I suspect it is unlikely. However, I am downloading 10.04 as I type...

dmesg spits out lots of awful messages like these:

[ 1314.920127] EDAC MC1: CE - no information available: amd64_edacError Overflow
[ 1315.920047] Northbridge Error, node 0, core: 0
[ 1315.920060] ECC/ChipKill ECC error.
[ 1315.920066] EDAC amd64 MC0: CE ERROR_ADDRESS= 0x1484410
[ 1315.920082] EDAC MC0: CE page 0x1484, offset 0x410, grain 0, syndrome 0x11c1, row 0, channel 0, label "": amd64_edac

(There are variations in the node, core, address, offset, syndrome, etc.)
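
To see which memory controller, row and channel those variations fall on, the detailed CE lines can be tallied. This is only a minimal sketch: the regular expression is written against the exact line format quoted above, and the names `CE_LINE` and `tally_ce` are made up for illustration. Other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Matches the detailed "CE page ..." lines in the dmesg excerpt above.
# Assumption: the field order and wording are exactly as quoted there.
CE_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE page 0x(?P<page>[0-9a-fA-F]+), "
    r"offset 0x(?P<offset>[0-9a-fA-F]+), grain (?P<grain>\d+), "
    r"syndrome 0x(?P<syndrome>[0-9a-fA-F]+), row (?P<row>\d+), "
    r"channel (?P<channel>\d+)"
)


def tally_ce(lines):
    """Count corrected errors per (memory controller, row, channel)."""
    counts = Counter()
    for line in lines:
        m = CE_LINE.search(line)
        if m:  # lines without page/row details are skipped
            key = (int(m["mc"]), int(m["row"]), int(m["channel"]))
            counts[key] += 1
    return counts


sample = [
    "[ 1314.920127] EDAC MC1: CE - no information available: amd64_edacError Overflow",
    '[ 1315.920082] EDAC MC0: CE page 0x1484, offset 0x410, grain 0, '
    'syndrome 0x11c1, row 0, channel 0, label "": amd64_edac',
]

for (mc, row, channel), n in tally_ce(sample).items():
    print(f"MC{mc} row {row} channel {channel}: {n} corrected error(s)")
```

Feeding it the full dmesg output instead of `sample` would show whether the errors cluster on one row/channel or are spread across all of them.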

I have tried swapping the CPUs over and running with only one CPU.
I have also swapped all the memory around in almost every permutation.

Another worrying symptom is that when I run memtest86+ from a boot disk, it shows zero errors up until the point where the server turns itself off without warning - it hasn't yet completed the test...

If anyone could shed some light on this, I would be grateful. Perhaps I've bought a dodgy second-hand computer, in which case it's a steep learning curve. But it bugs me not knowing what the root cause is...

Thanks,

Ben
 
Old 10-18-2010, 06:25 AM   #2
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
So memtest never finishes? I would try removing a RAM stick and seeing if it still crashes. If it does, put it back and take out another one.
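
While swapping sticks, it may also help to watch the per-row corrected-error counters that the EDAC driver exposes in sysfs. The sketch below assumes the csrow-style layout (`/sys/devices/system/edac/mc/mcN/csrowM/ce_count`) is present on the running kernel; the function name `read_ce_counts` is made up for illustration.

```python
from pathlib import Path

# Assumption: the EDAC driver is loaded and exposes csrow counters here.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ce_counts(root=EDAC_ROOT):
    """Return {(mc_name, csrow_name): corrected_error_count}."""
    counts = {}
    for f in sorted(Path(root).glob("mc*/csrow*/ce_count")):
        mc = f.parent.parent.name   # e.g. "mc0"
        row = f.parent.name         # e.g. "csrow0"
        try:
            counts[(mc, row)] = int(f.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed counter: skip it
    return counts


if __name__ == "__main__":
    for (mc, row), n in read_ce_counts().items():
        print(f"{mc}/{row}: {n} corrected errors")
```

Running it before and after each swap would show whether the counts follow a particular stick or stay with a particular slot.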
 
Old 10-29-2010, 08:18 AM   #3
ortayus
LQ Newbie
 
Registered: Oct 2010
Location: South Bend, IN, USA
Distribution: Red Hat, Ubuntu
Posts: 1

Rep: Reputation: 0
Quote:
EDAC MC0: CE page 0x1484, offset 0x410, grain 0, syndrome 0x11c1, row 0, channel 0, label "": amd64_edac
That line gives the address of the bad location in RAM (page and offset), along with the row and channel it was reported on. Different hardware vendors can use this to tell you which DIMM is bad. For example, on our old Sun v20z machines this would be DIMM0 on CPU0. However, on our Sun x4000 series machines it does not map to the same DIMM.
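
As a quick check on how the page and offset in that line relate to the ERROR_ADDRESS printed just before it in the original dmesg output, here is a small sketch. It assumes 4 KiB pages (a page shift of 12), which is the usual case on x86-64; `edac_address` is a made-up helper name.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def edac_address(page, offset, page_shift=PAGE_SHIFT):
    """Combine an EDAC page number and in-page offset into one address."""
    return (page << page_shift) | offset


addr = edac_address(0x1484, 0x410)
print(hex(addr))  # 0x1484410, the ERROR_ADDRESS from the first post
```

This only reconstructs the physical address; which physical DIMM slot a given row and channel correspond to still depends on the board, as noted above.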
 
  

