LinuxQuestions.org

Toonses82 05-12-2011 11:20 AM

Troubleshooting with hardware tests
 
I've got a computer at home that runs Windows 7, and I have reason to believe I'm having a hardware problem. The reason I'm posting here at LinuxQuestions is that I used the firmware tests included on the openSUSE 11.3 installation DVD and I don't know what to make of the output. Perhaps someone can help.

I've been using my home-built Windows 7 64-bit PC for a few months with minimal problems. Suddenly, a few days ago, I started getting BSODs pretty regularly. It got to the point where I couldn't even boot to the desktop. Leaving the computer off and coming back to it a few hours later would temporarily resolve the issue: I could boot up normally, but after 15-20 minutes of use it would BSOD again. Reinstalling Win7 gave me the same results.

I ran the tests using the openSUSE DVD and here's the output:
Code:

Linux-ready Firmware Developer Kit - Release 3

[FAIL] DMI information check
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
  F        No SMBIOS nor DMI entry point found.
[FAIL] MTRR validation
  F        Memory range 0xd0000000 to 0xdfffffff (PCI Bus 0000:00) has incorrect attribute write-back
  F        Memory range 0xf0000000 to 0xffffffff (PCI Bus 0000:00) has incorrect attribute write-back
[FAIL] CPU frequency scaling tests (1-2 mins)
        4 CPU frequency steps supported
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
  F        Supposedly higher frequency is slower on CPU 0!
        P-state coordination done by Harware
  F        Firmware not implementing hardware coordination cleanly. Firmware using SW_ALL instead?
  F        Firmware not implementing hardware coordination cleanly. Firmware using SW_ANY instead?
[FAIL] HPET configuration test
  F        Failed to locate HPET base
[FAIL] OS/2 memory hole test
  F        The memory map has a memory hole between 15Mb and 16Mb

There were a couple of additional warnings, but I figure the fails should be focused on. The hardware is only a couple of years old. Can anyone help me with this output?
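
Since most of the failure lines in the output above are repeats, it may be easier to read once it is condensed. Below is a minimal Python sketch that groups the "F" lines under their [FAIL] heading and counts duplicates. The function name summarize is made up for illustration, and the parsing rules (a heading starts with "[FAIL]", a failure line starts with "F" followed by a space) are assumptions taken only from the output pasted here, not from the kit's documentation.

```python
from collections import Counter


def summarize(report):
    """Group 'F' lines under their [FAIL] heading and count repeats.

    Assumption: the layout matches the output pasted in this thread.
    """
    summary = {}
    current = None
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("[FAIL]"):
            # Start a new failed-test section.
            current = stripped[len("[FAIL]"):].strip()
            summary[current] = Counter()
        elif stripped.startswith("["):
            # Any other bracketed heading ends the current section.
            current = None
        elif current is not None and stripped.startswith("F "):
            # Drop the leading "F" marker and count identical messages.
            summary[current][stripped[1:].strip()] += 1
    return summary


if __name__ == "__main__":
    sample = (
        "[FAIL] DMI information check\n"
        "  F        No SMBIOS nor DMI entry point found.\n"
        "  F        No SMBIOS nor DMI entry point found.\n"
        "[FAIL] HPET configuration test\n"
        "  F        Failed to locate HPET base\n"
    )
    for test, messages in summarize(sample).items():
        print(test)
        for message, count in messages.items():
            print(f"  {count}x {message}")
```

Informational lines without the "F" marker (such as the frequency step count) are skipped, so only the distinct failures per test remain.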

TobiSGD 05-13-2011 07:31 AM

If switching the power off for a few hours temporarily solves the problem, I would assume an overheating problem. Open the machine and check that the fans are running properly. Clean out all the dust. Check the CPU temperatures; in Windows you can use Coretemp for that.
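
Under Linux, a rough equivalent of that temperature check can be sketched in Python by reading the kernel's thermal zones from sysfs. This is only a sketch with assumptions: the function name read_thermal_zones is made up for illustration, and not every machine exposes entries under /sys/class/thermal (that depends on the kernel drivers in use), in which case a tool such as lm-sensors may be the better choice.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone name: temperature in degrees Celsius}.

    Assumption: the kernel exposes thermal_zone* directories here;
    on some systems this directory is empty or missing.
    """
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            label = (zone / "type").read_text().strip()
            # sysfs reports the temperature in millidegrees Celsius.
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Skip zones that cannot be read or parsed.
            continue
        readings[f"{zone.name} ({label})"] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    zones = read_thermal_zones()
    if not zones:
        print("No thermal zones found.")
    for name, celsius in zones.items():
        print(f"{name}: {celsius:.1f} C")
```

Running it a few times while the machine is under load gives a rough idea of whether temperatures climb before a crash.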

catkin 05-13-2011 07:45 AM

Quote:

Originally Posted by TobiSGD (Post 4355252)
If switching the power off for a few hours temporarily solves the problem, I would assume an overheating problem. Open the machine and check that the fans are running properly. Clean out all the dust. Check the CPU temperatures; in Windows you can use Coretemp for that.

Could it not be that some semi-conductor device(s) is/are failing? Isn't it normal for failing semi-conductor devices to work when cold and fail when hot? All the same, ensuring good cooling can do no harm. Ideally the problem could be isolated by removing all non-essential devices and progressively swapping what remains for "known good" equivalents -- but that does require a set of "known good" equivalents.

TobiSGD 05-13-2011 08:24 AM

Quote:

Originally Posted by catkin (Post 4355268)
Could it not be that some semi-conductor device(s) is/are failing? Isn't it normal for failing semi-conductor devices to work when cold and fail when hot?

You are right, I didn't think about that! It could also be a faulty capacitor.

@Toonses82: When you have the machine open, check the motherboard for capacitors that have bulging or cracked tops, or that have leaked. Sadly, there is no way to see whether a semiconductor fails under heat.

Toonses82 05-13-2011 05:26 PM

Immediately after posting, I set about installing openSUSE 11.4 so I would at least have a functioning machine. After running this new install for about a day, I haven't encountered a single problem. I know the two OSes can react differently to hardware issues, so perhaps this doesn't mean anything. Still, I can't help but wonder if maybe I had some major virus/malware issue.

What do you think? If openSUSE 11.4 is running fine, does that definitively rule out hardware problems?

Soadyheid 05-13-2011 06:44 PM

Quote:

Immediately after posting, I set about installing openSUSE 11.4 so I would at least have a functioning machine. After running this new install for about a day, I haven't encountered a single problem.

The longer it runs clean, the more probable your solution. :) In cases where you need to separate hardware from software problems, you need to replace one of them to eliminate one or the other. In this case it looks like Windows 7, though it's a bit suspicious that it ran OK initially. I assume the media you re-installed it from is OK?

Overheating can cause hangs, so check the fans and cooling. As catkin mentions above, stripping down to what's known as a "minimal system" is also a good idea if the problem persists. You'd then need to add back the additional memory and adapter cards one at a time until the problem recurs. The last thing fitted is then the suspect. Note that you may have to run for quite a while before each additional part is re-installed to be confident. (It saves buying replacement parts; it just takes longer.)

I agree with TobiSGD regarding Condensers/Capacitors. You get the end caps on electrolytic ones deforming and/or leaking, usually to do with the DC power feed on the motherboard or the processor's voltage control circuitry. This causes spurious power problems which can cause hangs and other random problems.

Good luck and Play Bonny! :hattip:


All times are GMT -5.