LinuxQuestions.org


ryanreich 04-14-2005 10:11 AM

Recurrent I/O failure: motherboard related?
 
For about eight months I've been having irregular but recurrent freezes which seem to be hard-drive related. That is, I will be away from my computer, return, and find it locked up with the HDD LED glaring at me. When I reboot, the drives often don't show up in the BIOS and it is only after much resetting, unplugging, and manual detecting that I can get them to come back. One time, I caught it happening and was able to save my work before it froze, which (finally) pushed a message into the logs:

Apr 13 10:56:51 [kernel] hda: DMA disabled
Apr 13 10:56:51 [kernel] hdb: DMA disabled
Apr 13 10:56:51 [kernel] hdb: drive not ready for command
Apr 13 10:56:52 [kernel] ide0: reset: success
Apr 13 10:56:56 [kernel] hdb: status error: status=0x80 { Busy }
Apr 13 10:56:56 [kernel] ide: failed opcode was: unknown
Apr 13 10:56:56 [kernel] hdb: drive not ready for command
Apr 13 10:57:05 [kernel] ide0: reset: success

That doesn't look like the message for a hard disk failure, and I have not yet suffered any data loss, nor even the slightest filesystem corruption beyond what might result from rebooting without unmounting. Furthermore, both drives are new; they replace two older drives that appeared to be failing with similar symptoms, one of which was also relatively new (the other was only about a year and a half old).

Therefore I'm worried that the motherboard is going. This is not the first thing that's happened to it: the built-in sound card also developed a really annoying click and I had to buy a separate one. How do I verify this? The message above is the only kernel message this problem has ever produced, and it's not very specific. For reference, the motherboard is a Shuttle AN35/N with an nForce2 chipset, and no, this is not the nForce lockup problem, which was fixed in their BIOS two Decembers ago.
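
One way to get a little more out of sparse log entries like the ones above is to tally them per device, so a recurrence can be compared against earlier incidents. The following is only a minimal sketch, assuming Python is available and that the log lines keep the exact format quoted in this post; the function name `tally_by_device` and the `SAMPLE_LOG` variable are made up for illustration, and the sample text is simply the excerpt quoted above.

```python
import re
from collections import Counter

# Matches lines in the format quoted above, for example:
#   "Apr 13 10:56:51 [kernel] hdb: drive not ready for command"
# The device is either a drive (hda, hdb, ...) or an IDE channel (ide, ide0, ...).
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2} \[kernel\] (hd[a-z]|ide\d*): (.*)$"
)


def tally_by_device(lines):
    """Count matching kernel IDE messages per device name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            device, _message = match.groups()
            counts[device] += 1
    return counts


# The excerpt from this post, used here as sample input.
SAMPLE_LOG = """\
Apr 13 10:56:51 [kernel] hda: DMA disabled
Apr 13 10:56:51 [kernel] hdb: DMA disabled
Apr 13 10:56:51 [kernel] hdb: drive not ready for command
Apr 13 10:56:52 [kernel] ide0: reset: success
Apr 13 10:56:56 [kernel] hdb: status error: status=0x80 { Busy }
Apr 13 10:56:56 [kernel] ide: failed opcode was: unknown
Apr 13 10:56:56 [kernel] hdb: drive not ready for command
Apr 13 10:57:05 [kernel] ide0: reset: success
"""

per_device = tally_by_device(SAMPLE_LOG.splitlines())
for device, count in per_device.most_common():
    print(f"{device}: {count} message(s)")
```

On a real system the lines would come from the system log file instead of a string; where that file lives depends on the syslog setup. A tally like this cannot say whether the drive, the cable, the controller, or the power feed is at fault; it only shows which device the kernel complained about most often.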

Thanks,
Ryan

penguinlnx 04-16-2005 10:30 AM

(1) Shut off the computer.

Phase I: Inspect and general clean

(2) Open up the computer and do a physical inspection. Look for buildup of dust around fans and contact areas, vents and cards. Look for cables that have been knocked or pulled and which are not properly plugged in. Check for moisture damage, electrocuted bugs or spiders, or (yes it happens) mouse droppings.

(3) Thoroughly clean out any dust and debris. For this you need a can of compressed air, which you can pick up at any Radio Shack store or hi-fi supply house.

(4) Carefully remove the ribbon cables one at a time, noting their orientation, and reinstall them. A red stripe down one side of the ribbon indicates which way it goes.
Usually the red stripe goes to the left if you are facing the drive.

Phase II: Get out your contact cleaner (pure isopropyl alcohol, not diluted with water)

Any mechanical/removable electrical contacts should be inspected and cleaned. People often leave fingerprints on gold contacts when installing cards and RAM. The result is later failure, as salt, moisture and grease from fingers cause the contacts to fail.

(5) Carefully remove the memory modules and reinstall them.
What often happens is that a bad connection on one of the un-soldered contacts in the computer causes unusual behaviour or failure. Unpredictable behaviour is almost always hardware related.

(6) Carefully remove PCI slot cards and reseat them, making sure they are securely held again by their holding screws. Sometimes a card may not be exactly aligned, and this results in an unreliable connection.

Phase III: Test and fan/power supply checks

(7) Make sure the load on the power supply is distributed as evenly as possible. Put hard drives on separate power leads, and put CD-ROM and floppy drives on another lead. Use all the leads if possible, rather than putting everything on one arm of the power supply.

(8) Reboot, noting whether all the fans in the unit are operating properly, especially any connected to the processor or video card. If some fans aren't working, replace them or seek help at the store or a repair shop.

Post the results if you are still getting errors.

ryanreich 04-22-2005 11:31 AM

Thanks, and sorry it took so long to reply. It seems that the problem is due to the power supply: since I plugged the hard drives into the other lead, they haven't had this problem. I've used the bad lead for the CD drives, since they don't see quite as much action and are less likely to fail (I mean, this is a pretty intermittent problem). At some point I should replace that thing, but it's still working as far as I can tell.

Ryan

penguinlnx 04-22-2005 02:43 PM

Often with cheap power supplies, there is a flaw or a low power capability on one or two of the power supply cables.
You did the right thing splitting the power drain across different parts of the supply.
But I would look into upgrading quickly, before there is a power failure. A weak power supply is very unhealthy.
When you buy a new one, look for a 'quiet fan', 'dual speed auto', or 'stealth' supply, and get one rated at least 100 watts more than you think you need. Never run a power supply at its maximum rating, especially if you leave it on 24/7. The extra few dollars you spend on a quieter supply, by the way, are worth their weight in gold for you as the person who has to listen to the noise all day.
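
As a rough illustration of the "100 watts more than you think you need" rule, here is a minimal sketch in Python. The function name and the per-component wattages are hypothetical placeholders, not measurements of any real system; substitute the figures from your own components' spec sheets.

```python
def recommended_psu_watts(component_watts, headroom_watts=100):
    """Return the estimated total load plus a safety headroom, in watts."""
    estimated_load = sum(component_watts.values())
    return estimated_load + headroom_watts


# Hypothetical per-component estimates in watts (placeholders only).
example_build = {
    "cpu": 70,
    "motherboard": 30,
    "hard drives (2)": 20,
    "optical drives (2)": 30,
    "video card": 40,
    "fans": 10,
}

estimated = sum(example_build.values())
print(f"Estimated load: {estimated} W")
print(f"Suggested minimum rating: {recommended_psu_watts(example_build)} W")
```

The headroom is a parameter so it can be raised for a machine that stays on 24/7. Keep in mind that a total wattage figure says nothing about how the load is split across individual leads, which is what caused the trouble in this thread.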

Glad I could help.


All times are GMT -5.