LinuxQuestions.org


KickMeElmo 12-18-2010 10:41 PM

RAM errors
 
Kubuntu 10.10 froze on me randomly. I shrugged it off and rebooted. Then, less than an hour later, another freeze. Freezes are rare on this machine, so I started up Memtest86+ and let it run. The results... were not encouraging.


Test - Pass - Failing Address - Good - Bad - Err-Bits - Count - Chan
5 - 0 - (0009b39dc28 - 2483.6MB) - fffffeff - fffff6ff - 00000800 - 1 -
5 - 0 - (0009f39dc08 - 2547.6MB) - fffffeff - fffff6ff - 00000800 - 2 -
5 - 40 - (0009ab0dea8 - 2475.0MB) - fbffffff - fbfff7ff - 00000800 - 3 -
5 - 40 - (0009eb0de88 - 2539.0MB) - fbffffff - fbfff7ff - 00000800 - 4 -
5 - 64 - (000a1195ba8 - 2577.5MB) - ffdfffff - ffdff7ff - 00000800 - 5 -
5 - 64 - (000a5195b88 - 2641.5MB) - ffdfffff - ffdff7ff - 00000800 - 6 -
5 - 66 - (000a1b9dca8 - 2587.6MB) - ffffffef - fffff7ef - 00000800 - 7 -
5 - 66 - (000a5b9dc88 - 2651.6MB) - ffffffef - fffff7ef - 00000800 - 8 -
5 - 71 - (000a0195fe8 - 2561.5MB) - ffffdfff - fffdf7ff - 00000800 - 9 -
5 - 71 - (000a4195fc8 - 2625.5MB) - ffffdfff - fffdf7ff - 00000800 - 10 -
5 - 73 - (000a1c121e8 - 2588.0MB) - feffffff - fefff7ff - 00000800 - 11 -
5 - 73 - (000a361a0e8 - 2614.1MB) - feffffff - fefff7ff - 00000800 - 12 -
5 - 73 - (000a5c121c8 - 2652.0MB) - feffffff - fefff7ff - 00000800 - 13 -
5 - 73 - (000a761a0c8 - 2678.1MB) - feffffff - fefff7ff - 00000800 - 14 -
5 - 78 - (000a3195ba8 - 2609.5MB) - ffffffdf - fffff7df - 00000800 - 15 -
5 - 78 - (000a7195b88 - 2673.5MB) - ffffffdf - fffff7df - 00000800 - 16 -
5 - 80 - (000a1d15c68 - 2589.0MB) - fffffbff - fffff3ff - 00000800 - 17 -
5 - 80 - (000a5d15c48 - 2653.0MB) - fffffbff - fffff3ff - 00000800 - 18 -
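
A quick way to read the table above: XORing each Good value with its Bad value should reproduce the Err-Bits column and show which data bits flipped. The sketch below is only an illustration; the function names are made up for this example, and the rows are the distinct Good/Bad pairs copied from the table as posted. For most rows the only differing bit is bit 11 (0x800), which reads back as 0 instead of 1. One pair as posted (ffffdfff / fffdf7ff) XORs to 00022800 rather than the reported 00000800, which might be a transcription slip in the post.

```python
# Distinct (good, bad, reported err-bits) triples copied from the table as posted.
ROWS = [
    ("fffffeff", "fffff6ff", "00000800"),
    ("fbffffff", "fbfff7ff", "00000800"),
    ("ffdfffff", "ffdff7ff", "00000800"),
    ("ffffffef", "fffff7ef", "00000800"),
    ("ffffdfff", "fffdf7ff", "00000800"),
    ("feffffff", "fefff7ff", "00000800"),
    ("ffffffdf", "fffff7df", "00000800"),
    ("fffffbff", "fffff3ff", "00000800"),
]


def flipped_bits(good_hex, bad_hex):
    """Return the bit positions (0 = least significant) that differ."""
    diff = int(good_hex, 16) ^ int(bad_hex, 16)
    return [bit for bit in range(32) if (diff >> bit) & 1]


def analyse(rows):
    """For each row, list the flipped bits and whether the XOR matches the reported err-bits."""
    result = []
    for good, bad, reported in rows:
        diff = int(good, 16) ^ int(bad, 16)
        result.append((good, bad, flipped_bits(good, bad), diff == int(reported, 16)))
    return result


if __name__ == "__main__":
    for good, bad, bits, matches in analyse(ROWS):
        note = "" if matches else "  <- XOR differs from reported err-bits"
        print(f"{good} -> {bad}: flipped bits {bits}{note}")
```

A single data bit failing the same way across a narrow address window is consistent with a marginal chip or signal, but it does not by itself tell a bad module apart from wrong voltage or timings.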

AMD K10 (65nm) @ 3211 MHz
L1 Cache: 64K 57336 MB/s
L2 Cache: 512K 19112 MB/s
L3 Cache: 6144K 8562 MB/s
Memory : 7934M 4613 MB/s

AMD Phenom II X6 1090T 3.2GHz
DDR3-1333 2x4GB CAS: 9-9-9-24 Non-ECC
Stock voltage on both. Never overclocked.
After pass 0 complete, swapped around various tests and ranges to check.
At pass 38, switched from full-range to test #5 2048M-3327M.
After pass 80, switched to various tests. No tests other than #5 generated errors.


So now my question is multipart...
One, is there anything that can be adjusted within Linux itself to avoid triggering these errors? (I need to keep using this RAM for at least another week before I can afford to RMA it.)
Two, I'm assuming this does indeed point to RAM malfunction/damage, but if I'm wrong, what is going on here?
Three, if there is a way to work around these errors, should I still RMA it? If I do, I'm going to try to get ECC RAM of the same size and speed. When I ordered originally, I wasn't aware the motherboard I'd picked supported ECC. Now I am.

EDIT
It seems I can't RMA this RAM at present. I'm still working with the company I got it from, but that option looks just about out. Now, I read something about a kernel module called badram, but I'm finding little detail on how to go about activating it properly. It seems that if I can disable the one chip where all these errors occur, that would solve the issue just fine. Any advice would be greatly appreciated. This computer was my early Christmas present, and I'd like to be able to have it for the holiday.
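
For readers looking for a kernel-side workaround: besides BadRAM, the stock kernel documents a `memmap=nn[KMG]$ss[KMG]` boot parameter that marks a region of physical memory as reserved so the kernel does not use it. The sketch below is only an illustration under stated assumptions: the helper name `memmap_param` and the 16 MB alignment are arbitrary choices made for this example, and the addresses are the failing addresses from the Memtest86+ table above. It computes one reserved range covering all of them.

```python
# Failing physical addresses copied from the Memtest86+ table above.
FAILING = [
    0x9B39DC28, 0x9F39DC08, 0x9AB0DEA8, 0x9EB0DE88, 0xA1195BA8, 0xA5195B88,
    0xA1B9DCA8, 0xA5B9DC88, 0xA0195FE8, 0xA4195FC8, 0xA1C121E8, 0xA361A0E8,
    0xA5C121C8, 0xA761A0C8, 0xA3195BA8, 0xA7195B88, 0xA1D15C68, 0xA5D15C48,
]

MB = 1024 * 1024
ALIGN = 16 * MB  # assumption: round the range outward to 16 MB boundaries


def memmap_param(addresses, align=ALIGN):
    """Build a memmap=<size>M$<start> string covering every listed address."""
    start = min(addresses) // align * align      # round down to a boundary
    end = (max(addresses) // align + 1) * align  # round up past the last address
    size_mb = (end - start) // MB
    return f"memmap={size_mb}M${start:#x}"


if __name__ == "__main__":
    print(memmap_param(FAILING))  # memmap=224M$0x9a000000
```

The resulting string would go on the kernel command line; in a GRUB configuration the `$` typically needs escaping, and the exact escaping depends on the GRUB version. This gives up 224 MB of RAM and only hides the symptom, so it is a stopgap rather than a fix, and memory that errors in one range may also be unreliable elsewhere.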

DanceMan 12-19-2010 03:12 AM

Any errors in Memtest usually mean the RAM is NFG, but other factors can cause RAM errors too: timings and voltage. You can try upping the voltage a little, loosening the timings, or both. If the errors do not go away, then it's toast. Filling every slot on a motherboard, or using more RAM than usual, can mean the board needs a little more voltage. Give these a try before giving up on it.

KickMeElmo 12-19-2010 07:13 AM

Factory timings are 9-9-9-20, and it's already running at the looser 9-9-9-24, but I'll check it out. I'll adjust them in the morning (when I have time to tweak and observe properly) and see what happens. I'm not very hopeful, considering it's all happening within a single chip's range, but it's certainly worth a try. If that fails, I'll give badram a chance, and worst case, I'll buckle down and buy more (or see if I somehow do manage to RMA it). Much appreciated on the response either way, whether it works or not. Anyone else with suggestions, feel free to post them. I'd rather have fifty options that don't work than none at all.

KickMeElmo 12-28-2010 11:21 PM

This is one of those moments we all wish would never happen where the savvy computer geek realizes he's made an idiot mistake. And this time, my friends, PEBKAC has proven valid... on the techie.

Bought the RAM, rated for 1.65 volts. Installed the RAM, skimmed over everything to make sure it all looked good. Failed to notice that the board was running my RAM at 1.55 volts. Instead, all I noticed was that my beautiful 9-9-9-20 RAM was running at 9-9-9-24, and I resolved to adjust it after I had ensured everything worked fine.

User error on this one. Thank you all for your time and concern regardless.

J.W. 12-28-2010 11:23 PM

Thanks for posting the followup with the explanation, and congrats on finding the solution.

rentalsolutions 12-28-2010 11:24 PM

Errors in Memtest usually mean it's NFG, but other factors can cause the ram to error: timings and voltage.

archtoad6 12-30-2010 07:37 AM

rentalsolutions,

Thank you for fixing your sig block, your co-operation is really appreciated.

You have ignored at least 4 polite moderator requests:
http://www.linuxquestions.org/questions/user/rentalsolutions-with-archtoad6.html
http://www.linuxquestions.org/questions/linux-software-2/configure-error-compiler-cannot-create-working-executables-849357/#post4203514
http://www.linuxquestions.org/questions/linux-general-1/bootable-partition-cannot-be-on-logical-volume-851170/#post4203614
http://www.linuxquestions.org/questions/linux-newbie-8/location-of-mysql-logs-339976/#post4194627
to remove the commercial links from your sig block.

You are suspended from LQ for 10 days. PM me to discuss this if you do not understand.

archtoad6 12-30-2010 07:55 AM

KickMeElmo,

My congratulations also on finding a solution, & my compliments on admitting your mistake.
Oh, BTW, thanks for the thoughtful report.


All times are GMT -5.