LinuxQuestions.org
Old 10-20-2005, 07:44 AM   #1
Artanicus
Member
 
Registered: Jan 2005
Location: Finland
Distribution: Ubuntu, Debian, Gentoo, Slackware
Posts: 827

Rep: Reputation: 31
computer dying randomly, can't figure out memtest results


I've been having this problem for about a month now. It started out of the blue and now occurs almost randomly, but usually _after_ doing something memory intensive.
I've got an Epox motherboard with an nForce2 chipset (8RDA3G), an NVIDIA video card (GeForce FX 5200/AGP) and two Kingston 512 MB DDR RAM modules.

Crash description:
The monitor's signal light goes from green to amber, the same as for a monitor power-off via power management. Both the CPU and disk activity lights stay on constantly, though no disk seeking is heard. The motherboard error display (a small LED panel) reads FF, which means something like a general error; it can mean anything from a wrongly inserted module to a generic boot error. The keyboard doesn't respond, the screen is blank, nothing is happening: a total halt. I can't even get in through ssh. The reset button doesn't work (it normally does, but not after this halt), so I have to manually power down and boot again.

Last time, when I returned after the restart to the video clip I was watching (that clip triggered the last crash; before that it was Warcraft a few times, and before that browsing with Opera), it crashed immediately again (it usually takes a few days to happen again), so I'm guessing the clip ended up in the same memory area, possibly a faulty one.

So I'm suspecting the RAM, since I've checked all internal connections, cooling, etc. (no overclocking has ever been done on this machine). I ran memtest86 one night and got 9 errors total. I just can't decide whether to worry about the results, as they are quite baffling to me.
Here's a "screenshot": http://img456.imageshack.us/my.php?i...74small2gu.jpg


My questions are:
1.) Can you tell me anything about what might be wrong here and causing the halt? I'm guessing it's one of the motherboard, video card or RAM, but I don't have parts to swap, so I can't really say.

2.) When is it time to worry about errors that memtest finds? Only if they are found multiple times (only 2 locations failed twice, the rest only once)? Or if there's any error at all?

It's a 100 euro bill to replace the RAM, so I'd love to be more sure about this before investing.. /: Thanks for any help you can give. I'll probably be running memtest on the RAM modules one at a time a few days from now, to see whether just one module or both are hosed / possibly hosed.
 
Old 10-20-2005, 07:57 AM   #2
RedShirt
Senior Member
 
Registered: Oct 2005
Location: Denver
Distribution: Sabayon 3.5Loop2
Posts: 1,150

Rep: Reputation: 45
If memtest gets a single error, it's time to worry about your RAM. It may not mean the RAM itself is bad, though; you need to do some testing.

Test each stick individually and see whether only one is getting errors or both are.

If neither does, test them again as a pair, the way you had them. If you get errors then, you can be sure it isn't bad RAM, just a bad setup.

So then, are you running them in dual channel? Try disabling that in the BIOS and test again.
Assuming you get no errors after disabling dual channel, go into the BIOS and hard-set the RAM speed to the speed rated on the sticks, and do the same with the CAS latency; don't use auto or optimal. Test again. If that is fine, re-enable dual channel, still with the sticks hard-set to their rated speed and CAS. If the RAM itself isn't bad but you get errors while dual channeling, you have a timing issue. In that case you can leave dual channel off, or get a set of sticks better suited to dual channel.
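
The test order above can be written down as a small decision helper. This is only a sketch for keeping track of which memtest runs have been done and what each outcome suggests; the function name `next_step` and its parameters are made up for illustration, and the manual speed/CAS retest is folded into a single step for brevity.

```python
from typing import Optional


def next_step(stick_a_ok: bool,
              stick_b_ok: bool,
              pair_ok: Optional[bool] = None,
              pair_ok_dual_off: Optional[bool] = None,
              pair_ok_dual_on_manual: Optional[bool] = None) -> str:
    """Return the conclusion or next action; None means 'not tested yet'.

    Each *_ok flag is True when that memtest run finished without errors.
    All names here are hypothetical, chosen only for this sketch.
    """
    # Step 1: each stick tested on its own.
    if not stick_a_ok and not stick_b_ok:
        return "both sticks show errors on their own"
    if not stick_a_ok:
        return "suspect stick A"
    if not stick_b_ok:
        return "suspect stick B"
    # Step 2: both sticks pass alone, so test them as a pair.
    if pair_ok is None:
        return "test both sticks together, installed as before"
    if pair_ok:
        return "no errors reproduced so far"
    # Step 3: the pair fails although each stick passes alone.
    if pair_ok_dual_off is None:
        return "bad setup rather than bad RAM: disable dual channel and retest"
    if not pair_ok_dual_off:
        return "errors persist without dual channel: not covered by the steps above"
    # Step 4: manual speed and CAS, then dual channel back on.
    if pair_ok_dual_on_manual is None:
        return "hard-set speed and CAS, retest, then re-enable dual channel"
    if pair_ok_dual_on_manual:
        return "keep dual channel with the manual speed and CAS settings"
    return "timing issue: leave dual channel off or get better matched sticks"


print(next_step(True, True, pair_ok=False, pair_ok_dual_off=True,
                pair_ok_dual_on_manual=False))
```

Calling it with only the results gathered so far returns the next test to run, so it can be re-run after every overnight pass.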

Good luck.
 
Old 10-20-2005, 07:59 AM   #3
ralvez
Member
 
Registered: Oct 2003
Location: Canada
Distribution: ArchLinux && Slackware 10.1
Posts: 298

Rep: Reputation: 30
It could be almost anything (going by the description of the problem), but in my experience it resembles the behavior of a) an HDD about to go or b) the HDD controller (on the motherboard) about to go. In both cases the system freezes and the HDD light stays on as if the disk is being read, but nothing happens.
I would back up my files and get ready.

Hope this helps.

Rick
 
Old 10-20-2005, 08:09 AM   #4
Artanicus
Member
 
Registered: Jan 2005
Location: Finland
Distribution: Ubuntu, Debian, Gentoo, Slackware
Posts: 827

Original Poster
Rep: Reputation: 31
Thanks for the quick replies..

RedShirt: I will do those individual tests tonight. I doubt the error is in the dual channeling (though I barely know what it is anyway), since I haven't touched anything related to RAM in the last year, and the computer has been running with this hardware for over a year now. The last good uptime was 48 days before the first halt, and it would have been much longer if it were not for moving apartments..

ralvez: I had the same idea at first. I've got two Maxtor HDDs, so I emptied an 11G partition on my secondary disk, installed SuSE 9.3 on it and have been running that since, and the situation hasn't improved. I do have the other disk mounted, but only for data; nothing is actively used from there, and especially not when the halts have occurred. And with my normal Slackware system on the other disk it was vice versa: the smaller disk was barely used. So I think I can rule out an HDD failure, unless they are both suddenly failing simultaneously.

Though I do remember seeing errors upon powerdown saying that SuSE could not unmount the root partition (11G reiserfs) at all, so it just powered down anyway.. d:

The last possibility would be that the video card or its memory is hosed. The screen blanking / MB module error would kind of suggest that, though I doubt Opera takes enough video memory to halt the system.. (;
 
Old 10-20-2005, 08:21 AM   #5
RedShirt
Senior Member
 
Registered: Oct 2005
Location: Denver
Distribution: Sabayon 3.5Loop2
Posts: 1,150

Rep: Reputation: 45
I cannot say with 100% certainty, but it is not the video card.

As I said though, any RAM error is a bad RAM error. memtest tests your RAM by writing many different bit patterns into memory; when it reports an error, it is because your RAM didn't accept a pattern, or accepted it and couldn't be read back, etc. So any time you get an error, it is not a good one, and 9 on the first run is not good. I am a computer tech, and even RAM that doesn't error on the first run can be bad, which is why I usually test overnight; it can take 5+ loops sometimes before errors pop.
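
To make the write-then-read-back idea concrete, here is a minimal sketch in Python. It is not how memtest86 is implemented: a Python script cannot address physical RAM directly, so this runs against a simulated buffer, and the four patterns, the function `pattern_test` and the class `StuckBitMemory` are all assumptions made up for the illustration.

```python
def pattern_test(mem, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern to every cell, read it back, and record mismatches.

    Returns a list of (address, expected, got) tuples, one per failed read.
    """
    errors = []
    for pattern in patterns:
        # Write pass: fill the whole region with the pattern.
        for addr in range(len(mem)):
            mem[addr] = pattern
        # Read pass: every cell should still hold the pattern.
        for addr in range(len(mem)):
            got = mem[addr]
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors


class StuckBitMemory:
    """Simulated memory in which one bit of one cell always reads back as 0."""

    def __init__(self, size, bad_addr, bad_bit):
        self._cells = bytearray(size)
        self._bad_addr = bad_addr
        self._mask = ~(1 << bad_bit) & 0xFF

    def __len__(self):
        return len(self._cells)

    def __setitem__(self, addr, value):
        self._cells[addr] = value

    def __getitem__(self, addr):
        value = self._cells[addr]
        if addr == self._bad_addr:
            value &= self._mask  # the faulty bit is forced to 0 on read
        return value


good = bytearray(64)
bad = StuckBitMemory(64, bad_addr=10, bad_bit=3)
print(pattern_test(good))  # healthy buffer: no mismatches
print(pattern_test(bad))   # only patterns with bit 3 set expose the fault
```

In the simulated faulty cell, the 0x00 and 0x55 patterns read back correctly because they never set the stuck bit, which is one reason a tester cycles through several patterns rather than relying on a single one.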

Aside from the RAM (or memory controller), which is an issue, it is possible you have an HD controller issue as well. I know you don't have spare parts to test with, but your RAM could be fine and your board could be dying, which would also point not to an HD problem but to an HD controller problem, and a board replacement. That said, if your board does have dying components, it could start damaging other parts of the system, because it may stop regulating voltage properly, and after a while it may not read/write data properly, which could corrupt your data and make recovery tough.

In my experience it is best to start at one point and work your way down. Start with the RAM, as that is the easiest to test and replace (and the most likely culprit for the halts). Then the HD, which I would test with Ontrack Data Advisor. Maxtor, WD, and Seagate all have tools for this kind of testing too, but I use Ontrack because it tests for a few more things and goes a little deeper.

From there, it will be tough to test your motherboard without a replacement board, but there are some visual things you can look for. Move all the wires and parts out of the way (it works best to remove the board entirely) and, in a well-lit area, check for blown capacitors and broken/shorted traces. The capacitors are the cylinders sticking up off the board. If you see any cracks in their seals, that isn't good; if you see a brown/yellow crust on any of them, that isn't good; if there is an obvious puncture in one, that is bad too. Most motherboards will function for months after having busted caps, but a bad trace is very unpredictable and can cause all sorts of weird stuff. So check out the traces (the little lines going to/from all the parts): if any lines are broken, have a bubble, or have a kink, you have an obvious issue. The board can be bad or going bad without any visual clues, but if you do have a visual clue, it is for sure not working right any more, and you need to get a replacement board.
 
Old 10-27-2005, 04:00 PM   #6
ronald-be
Member
 
Registered: Aug 2004
Location: Belgium
Distribution: debian 5.02
Posts: 73

Rep: Reputation: 15
Hello Artanicus,

I don't know the Epox board, but mine (ASUS A8N-SLI) has four colored RAM slots: blue-black-blue-black. If you use two modules, then some slot combinations are forbidden and only two enable the DDR: those with the two modules in slots of the same color. Maybe check this in the manual and then on the board. Good luck!

Greetings,

Ronald
 
Old 10-27-2005, 08:14 PM   #7
RedShirt
Senior Member
 
Registered: Oct 2005
Location: Denver
Distribution: Sabayon 3.5Loop2
Posts: 1,150

Rep: Reputation: 45
Actually, ronald, you are not quite right there. The matching slots are for dual channeling, not for DDR enabling. DDR is always enabled.
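
A rough worked example of the difference, as a sketch only. The thread never states the module speed, so the 200 MHz clock (DDR400) below is an assumption, and the function name is made up. DDR means two transfers per clock cycle on a 64-bit (8-byte) channel; dual channel adds a second such channel.

```python
def peak_bandwidth_mb_s(clock_mhz, channels=1, transfers_per_clock=2, bus_bytes=8):
    """Theoretical peak memory bandwidth in MB/s (hypothetical helper).

    transfers_per_clock=2 is the 'double data rate' part and applies in
    any slot; channels is what the matching slot pairs change.
    """
    return clock_mhz * transfers_per_clock * bus_bytes * channels


# Assumed DDR400 modules (200 MHz clock); not stated in the thread.
print(peak_bandwidth_mb_s(200, channels=1))  # single channel
print(peak_bandwidth_mb_s(200, channels=2))  # dual channel
```

With those assumed numbers the single-channel figure is 3200 MB/s and the dual-channel figure is 6400 MB/s; the doubling from DDR itself is present in both.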
 
  


All times are GMT -5. The time now is 12:17 AM.
