LinuxQuestions.org

ErV 03-23-2008 11:29 AM

Old system hangs randomly shortly after startup: hardware fault suspected
 
Hello!

Some time ago I assembled a PC (for one of my relatives) from used parts. After several months it suddenly stopped working: the system now hangs silently and completely shortly after startup. I suspect a hardware fault, since the hang occurs at random points and on more than one distro, so some component needs to be either fixed or replaced. I'm not sure which component is misbehaving, although I suspect the motherboard without any particular evidence.

Hardware configuration:
AMD Athlon 900 MHz CPU ("Thunderbird")
DFI AD75 motherboard (AGP slot, DDR1 RAM)
512 MB of RAM
GeForce FX 5200 video card
HDD: Seagate Barracuda ST340016A, 40 GB
Optiarc DVD-RW AD 5170A
EVB-2506AC 250 W power supply

lspci output:
Code:

00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
00:11.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 40)
00:14.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)

Distribution:
Slackware 12 with default 2.6.21.5-smp kernel.

Detailed description of problem:
The system sometimes boots and loads normally, then hangs completely at a random point: sometimes at the KDE login screen, sometimes 3 to 10 minutes after logging into KDE. When it hangs, it doesn't respond to Alt+Ctrl+SysRq+B, Alt+Ctrl+F1 or Alt+Ctrl+Del, the mouse cursor stops moving, the keyboard LEDs don't toggle when I press Num Lock, Caps Lock, etc., and the computer doesn't seem to be doing anything. After that the system can be rebooted by pressing "reset", but it then hangs at a seemingly random point during kernel loading: after the "bios data checked" message, after launching udev, or somewhere else before X has a chance to start. In rare cases even pressing "reset" doesn't help: the system hangs right after the initial "NVidia BIOS version" message, or even that screen does not appear (I see a black screen). Turning the power off and on always seems to work (i.e. I see the initial "NVidia bios version xyz" message, the BIOS memory test, etc.), but the machine still hangs later at a random point before X starts. If the machine is left off for an hour or more, it seems to run longer than if I reboot it immediately after a hang.

Additional info:
1) Problem appeared about a week ago for no obvious reason. As that machine's user said "Everything suddenly stopped working".
2) There was a fridge magnet attached to a PC's case in the area of DVD-Drive. This thing was there at least for a week. I've noticed it after machine stopped working, and removed it immediately.
3) Right now I don't have working spare parts that fit the current hardware (DDR1 RAM or a Socket A CPU), so I can't find out what's wrong by swapping parts.

What I've tried:
1) I've tried to boot the Ubuntu 7.04 live CD, but the attempt was not successful: the system hangs either during the Ubuntu load screen (with the progress bar) or shortly after loading the desktop.
2) I've examined the motherboard for visible defects. I looked for bulging or deformed electrolytic capacitors, but they all seem to be in order.
3) Checked the PC Health status in the BIOS: it looks like the CPU is not overheating (41 °C).

The Question/Problem:
Which component is broken and needs to be replaced or fixed: the motherboard, RAM, CPU, or maybe something else? Any tips, links or suggestions that would help me identify the broken part are also appreciated.

H_TeXMeX_H 03-23-2008 02:25 PM

Try running memtest86, if you can, I think the Ubuntu CD has it as an option. It may be bad RAM. If you can't run it, try taking out one RAM stick at random, then if it still messes up, put it back and take another out. Don't do this with the power on, of course.

You could also check the HDD for errors. Maybe this CD will help, it has all these tools and more:
http://www.ultimatebootcd.com/

Quote:

2) There was a fridge magnet attached to a PC's case in the area of DVD-Drive. This thing was there at least for a week. I've noticed it after machine stopped working, and removed it immediately.
LOL, if it was a regular, thin, flexible fridge magnet then it's no problem, their magnetic field is very short range, you typically need a bigger magnet to affect the HDD from outside the case.

ErV 03-23-2008 07:58 PM

Quote:

Originally Posted by H_TeXMeX_H (Post 3097949)
Try running memtest86, if you can, I think the Ubuntu CD has it as an option. It may be bad RAM. If you can't run it, try taking out one RAM stick at random, then if it still messes up, put it back and take another out. Don't do this with the power on, of course.

You could also check the HDD for errors. Maybe this CD will help, it has all these tools and more:
http://www.ultimatebootcd.com/

Thanks for the link and the tips about RAM. I'll test the RAM and report the results.


Quote:

Originally Posted by H_TeXMeX_H (Post 3097949)
LOL, if it was a regular, thin, flexible fridge magnet then it's no problem, their magnetic field is very short range, you typically need a bigger magnet to affect the HDD from outside the case.

No, I called it a "fridge magnet" because that was the closest thing I could remember the English name for. It's a thick, inflexible object with a magnet glued to its back. The magnet itself is disk-shaped and about the size of a coin: 1.75 centimeters (0.69 inches) in diameter and 2.5 millimeters (about 0.1 inches) thick. It could easily be used to weakly magnetize another small object, but I don't think a magnet can break anything that would prevent a PC from booting except the HDD, and this one isn't powerful enough and was in the wrong location for that. I mentioned the "magnet thing" just in case. Besides, I'd expect that after a sudden HDD failure while X is running, the machine would still respond to basic input (like Ctrl+Alt+F1 or mouse movement; I've worked on a PC with a faulty HDD cable before), or that it would hang in a relatively consistent way (i.e. at a certain point of the boot process, while launching a certain program, etc.). So the magnet is probably irrelevant to the problem.

onebuck 03-23-2008 09:00 PM

Hi,

Still not enough flux to worry about for that size of a magnet on a metal chassis.

ErV 03-24-2008 12:35 PM

Well, my progress so far:
1) I ran memtest from the Ubuntu CD this morning. Everything looked fine, but I didn't have enough time (40 minutes) to run the full test, so I turned the machine off.
2) After that test the machine simply doesn't boot in 90% of cases. "Doesn't boot" means: it doesn't beep on startup, the monitor's LED flashes, indicating that there is no video signal, and sometimes I hear rapid clicks (a rattle?) from the system speaker, which can also be heard in headphones. All fans work fine, and I can turn the computer off by holding the power button. Pressing reset doesn't help.
3) Tried removing memory modules (leaving one of them in place, of course) and disconnecting the DVD-ROM, HDD and video card (I hoped it would give me two long warning beeps): no difference. Black screen (if the video card is attached), noise in the system speaker and headphones.
4) Downloaded ubcd410 and tried to boot the computer with it: no luck, because the system doesn't start. The machine actually did start once (with the HDD detached and only one memory module), but then I had the "bright" idea to insert the other memory. I turned the computer off, inserted the other modules, turned it on again and was unable to make it work (no matter which memory modules were used).

Tomorrow I'll try removing the motherboard from the case to check it more closely for defects.

onebuck 03-24-2008 12:54 PM

Hi,

Do you have a means to test the PSU? Do you have a voltmeter?

I suspect a power supply problem.
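
If you do get hold of a voltmeter, the readings can be compared against the ATX tolerances (5% on the positive rails, 10% on -12 V). Below is a minimal sketch of that comparison. The function name and the sample readings are made up for illustration and are not measurements from this machine. Keep in mind that a PSU can read fine at idle and still sag under load, so in-tolerance readings don't fully clear it.

Code:

```python
# Nominal ATX rail voltages and allowed tolerance (fraction of nominal).
# 5% applies to the positive rails, 10% to the -12 V rail.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rails(measured):
    """Return a dict mapping rail name -> True if the reading is in tolerance."""
    results = {}
    for rail, volts in measured.items():
        nominal, tol = ATX_RAILS[rail]
        results[rail] = abs(volts - nominal) <= abs(nominal) * tol
    return results


# Hypothetical voltmeter readings, for illustration only.
readings = {"+3.3V": 3.31, "+5V": 4.62, "+12V": 11.9}
for rail, ok in check_rails(readings).items():
    print(rail, "OK" if ok else "OUT OF TOLERANCE")
```

With the sample values above, only the +5V reading falls outside its window (4.75 V to 5.25 V).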

H_TeXMeX_H 03-24-2008 02:43 PM

It looks like it's definitely a hardware issue. Typically it would emit beeps if there were something wrong with the motherboard or CPU or BIOS/CMOS, so these are likely ok. Yeah, you should check the PSU if you can, it may be failing.

ErV 03-26-2008 04:18 AM

I've taken the motherboard out of the case (I thought something might have got between the case and the motherboard and caused a short circuit). No visible problems, and it still doesn't work outside the case.

Quote:

Originally Posted by onebuck (Post 3098933)
Hi,

Do you have a means to test the PSU? Do you have a voltmeter?

I suspect a power supply problem.

No, I don't have a voltmeter right now, but I'll try using the PSU from another machine. This will have to wait until evening, though, since I need that "other machine" running right now.

ErV 03-27-2008 12:42 PM

Thanks for the suggestion about the PSU. I've tested the computer with a PSU from another machine, and everything seems to work fine, which means that the current PSU is broken (no wonder, since it was originally taken from another broken computer), so I'll have to either buy a replacement or fix this one.

I suppose the problem is solved. Thanks for the help!

H_TeXMeX_H 03-27-2008 01:40 PM

Good to see you found the bad piece of hardware.

onebuck 03-28-2008 07:29 AM

Hi,
Quote:

Originally Posted by ErV (Post 3102172)
Thanks for the suggestion about the PSU. I've tested the computer with a PSU from another machine, and everything seems to work fine, which means that the current PSU is broken (no wonder, since it was originally taken from another broken computer), so I'll have to either buy a replacement or fix this one.

I suppose the problem is solved. Thanks for the help!

Another possibility is the power match for the PSU. Switching supplies will not handle large load changes if the rail limits are reached, so matching the load to the PSU is important.
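
A rough way to check the match is to add up the expected current draw on each rail and compare it with the rating printed on the PSU label. The sketch below does this for a single rail. The function name and every number in it are hypothetical placeholders, not figures for the hardware in this thread; take the real values from the PSU label and the component datasheets.

Code:

```python
def rail_headroom(rating_amps, loads_amps):
    """Return spare current (A) on one rail: rating minus the summed loads."""
    return rating_amps - sum(loads_amps.values())


# All figures below are hypothetical placeholders for one rail.
rating_12v = 10.0
loads_12v = {"drive_spinup": 2.0, "optical_drive": 1.5, "fans": 0.5}

spare = rail_headroom(rating_12v, loads_12v)
print(f"+12V headroom: {spare:.1f} A")
if spare < 0:
    print("Rail is overloaded: the load exceeds the PSU rating")
```

A small or negative headroom on any rail suggests the supply is undersized for the load, even if it is not actually faulty.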


All times are GMT -5.