LinuxQuestions.org
Old 03-07-2005, 07:10 AM   #1
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 641

Rep: Reputation: 36
How to test for hardware disk error?


All,

I recently tried installing Slackware 10.1 and SuSE 9.2 on the following hardware:

Dell Optiplex GX1 w/512MB RAM and 7GB IDE harddisk. Recently updated to latest available BIOS from Dell's website.

The Slackware install failed at random points while copying packages from CD to the local system (I tried 3 different CD and DVD drives), and SuSE 9.2 gets the same CRC errors when I update to the latest packages. They all download fine, but when they are installed, the procedure crashes at random places due to CRC errors.

This leads me to suspect the hardware is faulty. The previous owner claims to have run Windows XP Home on the box without any problems, and basic disk surface tests come up without any error-messages.

Thus I suspect that either the disk controller is faulty, the harddisk itself is faulty (at some weird other level than the surface scan reveals) or that there's a BIOS problem of some kind.

I would greatly appreciate any insight on these random, highly annoying errors.

Thank you for your attention.
 
Old 03-07-2005, 07:25 AM   #2
DirtDart
Member
 
Registered: Nov 2003
Distribution: Mandrake 10.1/Solaris 10 (sparc)
Posts: 96

Rep: Reputation: 16
To check your hard drive, find out what brand it is (OEM or not, it's made by one of the major manufacturers), and get the diag disc from the manufacturer's web site. They have some pretty detailed tests, and that will allow you to run diags on your hard disk.

For hardware in general, I don't know of any Linux-based hardware testing suites. Under Windows, I usually use Sandrasoft (http://www.jaggedonline.co.uk/?a=g2005).
 
Old 03-07-2005, 07:35 AM   #3
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 641

Original Poster
Rep: Reputation: 36
Thanks for the tips. I was hoping not to have to throw out Linux again just to install the diagnostic tool, as after 18-19 retries I have finally installed more than 75% of the package updates on SuSE. I still haven't gotten Slackware to install, though.

Based on your experience, if you were to make a qualified guess - what's most likely to be causing this problem?
 
Old 03-07-2005, 08:41 AM   #4
jiml8
Senior Member
 
Registered: Sep 2003
Posts: 3,171

Rep: Reputation: 116
You have not verified the performance of your CDROM and you have not verified that the CDs are good. Start there.

You then need to verify that the jumper settings on the CDROM and the HD are correct. After you do these things and eliminate them as problems, then you can start considering the HD as a problem. But you also have to consider the IDE controller.
 
Old 03-07-2005, 08:53 AM   #5
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 641

Original Poster
Rep: Reputation: 36
Thanks for pointing that out - my apologies for not specifying my tests in more detail initially:

The external CD-ROM with which I've tested it has run flawlessly on 5-6 other computers both for reading and burning. As far as I can see, the jumpers for the IDE drive and CDROM are correct in the main box - the internal CDROM appears to work well as well, but has not been tested on other systems, as is the case with the external drive. Thus I have eliminated the CDROM as source of the problems, also because I get the same errors when installing packages from the internet. The CDs install well on 5-6 other computers, and have been verified with MD5 as well just to be sure.

It is always the CRC checks that give the errors by the way
 
Old 03-07-2005, 08:56 PM   #6
jiml8
Senior Member
 
Registered: Sep 2003
Posts: 3,171

Rep: Reputation: 116
Quote:
As far as I can see, the jumpers for the IDE drive and CDROM are correct in the main box
This is not good enough; they are right or they are not. You should be able to tell absolutely; "as far as I can see" indicates uncertainty to me.

Can you do ordinary disk I/O with the drive, with no CRC errors? Do the errors only occur when you are doing I/O with large files, or with all files? Have you tried the drive in another box to eliminate controller problems?

Have you changed the cable to the drive to eliminate that as a problem? You are not using a CS cable are you? If you are, are the drive jumpers set correctly for that?

You have not specified the brand of the drive, or its age but if it is 7 gigs it is probably 8-9 years old. What brand?

A BIOS problem is unlikely.
 
Old 03-09-2005, 01:41 AM   #7
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 641

Original Poster
Rep: Reputation: 36
First of all a big thanks to jiml8 for teaching me how to trouble-shoot this - I greatly appreciate it.

Quote:
Originally posted by jiml8
This is not good enough; they are right or they are not. You should be able to tell absolutely; "as far as I can see" indicates uncertainty to me.
Unfortunately I'm no expert (which is why I have to ask the question in the first place), so to the very best of my knowledge, the jumpers are correct. It's a fairly simple setup with one IDE drive and one CDROM. This also bears all signs of being the default setup with which it came from the factory. However, as I said, I'm no expert..

The harddrive is a Seagate ST38641A with 8.61GB unformatted capacity. It's set to be master/single (default). The rest of the computer appears to be completely original, except it's got an additional 256MB RAM inserted, making it a total of 512MB.

As for I/O errors, /var/log/messages does not show any, but as I said, when I tried to install Slackware on the machine and large amounts of data were copied, CRC errors started appearing after 15-20 minutes. The same happened with SuSE's online update: once about 50MB had been downloaded, the patch reassembly started failing. I am now running SuSE 9.2 perfectly on the box. After 9-10 attempts I got all the patches downloaded, and I have seen no errors since. However, I'd like the box to run Slackware, which simply refuses to install due to an overwhelming number of CRC errors during installation, and although my online update is done for now, that doesn't mean the problem has been resolved.

I have not tried the harddrive on other boxes or other cables, simply because my other computers are laptops where such drives simply don't fit. I also don't have access to any other computers where this is possible. Unfortunately.

So quick questions:
1 - where in Linux can I check when I get I/O errors and of what kind?
2 - If it's a controller problem, how do I provoke it to get it confirmed?
3 - Fixing it....
 
Old 03-09-2005, 02:12 AM   #8
overlord73
Member
 
Registered: Apr 2004
Location: ..where no life dwells..
Posts: 541

Rep: Reputation: 30
addendum:

...for a hardware check, use the manufacturer's tools, or under Linux:
badblocks (e.g. badblocks -n /dev/hda) scans a drive for bad blocks; -n selects the non-destructive read-write test
fsck checks and repairs filesystems
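
A cautious sketch of how the badblocks scan might be wrapped, assuming the drive is /dev/hda as in this thread. The helper names `is_mounted` and `scan_disk` and the optional mounts-file argument are made up for illustration. The idea is to fall back to the default read-only scan when any partition of the drive is mounted, and to use the slower non-destructive read-write test (-n) only on an idle drive.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical guard: succeeds when DEVICE, or one of its partitions
# (e.g. /dev/hda1 for /dev/hda), is listed in the mounts table.
is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    grep -q "^$dev" "$mounts"
}

# scan_disk DEVICE
# Runs a read-only scan if anything on DEVICE is mounted; otherwise runs
# the non-destructive read-write test (-n).
# -s shows progress, -v prints verbose output.
scan_disk() {
    dev=$1
    if is_mounted "$dev"; then
        badblocks -sv "$dev"
    else
        badblocks -nsv "$dev"
    fi
}
```

Run as root, for example `scan_disk /dev/hda`. The guard is only a convenience: a root filesystem may appear under a different name in /proc/mounts, so it is safest to run the -n test from a rescue or install CD with nothing on the drive mounted.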
 
Old 03-09-2005, 04:04 AM   #9
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 641

Original Poster
Rep: Reputation: 36
Thanks overlord73

There appear to be no bad blocks on the disk itself. I also formatted the disk when I installed it (both Slackware and SuSE) with extensive testing of the disk, and the disk came out clean...

Does this leave me with a buggy controller? It's only when I do large data-transfers on big files - could it be an I/O related problem elsewhere?

Oh by the way - the disk is in UltraDMA/33 mode (according to SuSE's IDE DMA setup) - does that mean anything?

I'm completely stuck on this one. Thanks for all your patient help!

Last edited by Yalla-One; 03-09-2005 at 04:09 AM.
 
Old 03-09-2005, 04:12 AM   #10
jiml8
Senior Member
 
Registered: Sep 2003
Posts: 3,171

Rep: Reputation: 116
You did not answer any of the questions I asked you, except for drive brand and to say that you believe the jumpers are in the same position they were when the box was new. But the information you did add makes me think the drive is showing a thermal problem with the head(s) when it does a lot of disk I/O. That the drive is a seacrate lends support to that idea; it must be about 8 years old and that is simply ancient for a seacrate. Of course, that is simply ancient for an IDE drive anyway.
 
Old 03-09-2005, 05:11 AM   #11
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 641

Original Poster
Rep: Reputation: 36
jiml8,

I answered your questions to the best of my knowledge. If I were a full-fledged expert, I probably wouldn't have had to ask in the first place.

The jumpers are set so that the harddrive is master/single drive, and as it's the only harddrive in there, and this according to the docs is the default, I assume this is correct. However, if there is some special circumstance that calls for different jumper settings, I do not know of it. Hence the term "It looks correct according to the docs and default settings", as the entire computer is "default" without changes.

As for the CRC errors, I don't know how to check for that. I've done a surface scan which came up OK, and I've done a full block check when formatting the drive with slackware that came up empty. I have also fine-read /var/log/messages to search for CRC or /dev/hda messages and come up empty.
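
The log search described above can be sketched as a small filter. The helper name `find_disk_errors` and the keyword list are assumptions for illustration; the old IDE driver typically logged transfer faults with strings such as `dma_intr` and `BadCRC`, but the exact wording depends on the kernel. If the CRC errors are raised by the package tools themselves (gzip keeps its own CRC of the data), the kernel log may stay empty even though data is being corrupted.

```shell
#!/bin/sh
# find_disk_errors [LOGFILE] [DEVICE]
# Hypothetical helper: prints log lines that mention DEVICE together with
# a common error keyword. Returns non-zero when nothing matches.
find_disk_errors() {
    logfile=${1:-/var/log/messages}
    device=${2:-hda}
    # First keep lines about the device, then keep those that look like faults.
    grep -i "$device" "$logfile" | grep -i -E 'crc|error|dma_intr|timeout|reset'
}
```

For example, `find_disk_errors /var/log/messages hda` checks the system log, and `dmesg | grep -i hda` shows what the kernel has reported about the drive since boot.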

Forgive me if I sound ignorant for answering this way, but without expert knowledge, this is as far as my intuition has driven me. If I should run other applications or look elsewhere for further information I would greatly appreciate any pointers to such apps/info.

If it is indeed the drive heads, is there any way to find out for sure? I seem to remember an old "dd" command which writes or reads huge amounts of data. That is, I would like to provoke the fault to make certain where the problem is...
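
One rough way to provoke the fault with dd, as a sketch only: write a large file of random data, then read it back several times and compare checksums. If the same file produces different sums on different passes, data is being corrupted somewhere between the platters and memory, though this alone does not say whether the drive, cable, controller, or RAM is responsible. The helper name `stress_readback`, its defaults, and the path in the usage note are made up for illustration.

```shell
#!/bin/sh
# stress_readback FILE [SIZE_MB] [PASSES]
# Hypothetical helper: writes SIZE_MB of random data to FILE, then
# re-reads it PASSES times and compares each checksum with the first read.
stress_readback() {
    file=$1
    size_mb=${2:-1024}
    passes=${3:-5}

    # Write the test file in 1 MiB blocks and flush it to disk.
    dd if=/dev/urandom of="$file" bs=1048576 count="$size_mb" 2>/dev/null || return 2
    sync

    # The first read becomes the reference checksum.
    reference=$(dd if="$file" bs=1048576 2>/dev/null | md5sum | awk '{print $1}')
    i=1
    while [ "$i" -le "$passes" ]; do
        current=$(dd if="$file" bs=1048576 2>/dev/null | md5sum | awk '{print $1}')
        if [ "$current" != "$reference" ]; then
            echo "pass $i: checksum changed ($current vs $reference)"
            return 1
        fi
        echo "pass $i: ok"
        i=$((i + 1))
    done
    return 0
}
```

For example, `stress_readback /tmp/stress.bin 1024 5`. The file should be larger than the installed RAM (512MB on this box); otherwise the reads may be served from the page cache and never touch the disk.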

Thanks again for your insight and patience
 
Old 03-09-2005, 12:22 PM   #12
J.W.
LQ Veteran
 
Registered: Mar 2003
Location: Boise, ID
Distribution: Mint
Posts: 6,642

Rep: Reputation: 87
A different thought, and at the risk of stating something you already know, maybe the problem with the installation is due to corrupted ISO images. After you downloaded the Slack and/or Suse ISO images, did you verify their integrity by running an MD5SUM on them? If not, definitely check it. The command is basically
Code:
md5sum <filename>
and the output will be a long string of letters and numbers, which must match the reference MD5SUM value that was on the original download site. If they don't match then your image is no good, and you'll need to d/l it again.
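
As a minimal sketch, that comparison can be scripted instead of checked by eye. The helper name `verify_md5` is made up for illustration, and the expected sum has to be copied from the download site.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED_SUM
# Hypothetical helper: returns 0 when FILE's MD5 sum equals EXPECTED_SUM.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<sum>  <filename>"; keep only the first field.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
        return 0
    else
        echo "MISMATCH: $file (got $actual, expected $expected)"
        return 1
    fi
}
```

If the download site provides a checksum file next to the image, `md5sum -c <checksum file>` performs the same comparison in one step.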

That being said, based on your description this does sound like a hard drive issue, and given that you've already run the basic tests, the only other suggestions I'd have would be to check the manufacturer's website for any diagnostic tools they might make available, and failing that you might try swapping out that existing drive and trying the installation with a new drive. That may not be worth the trouble though, but my point is that you'd want to try to systematically eliminate each component as the cause of the problem. If I were in your shoes I'd probably assume the hard drive itself was the culprit, and if it has been in use for 8 years, then the reality may just be that it's had a good run but has hit the end of the line.

Good luck with it either way -- J.W.
 
Old 03-11-2005, 02:58 AM   #13
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 641

Original Poster
Rep: Reputation: 36
Thanks J.W.

Seems like I've pretty much exhausted every option here... For the record, yes, I've run an MD5 check on the ISOs and have also checked them on a couple of other computers (laptops) without problems. Furthermore, as the problems arise in random locations, I doubted a CD error from day 1.

As there seems to be a consensus here that this is not due to the IDE controller or I/O on the motherboard, I am concluding that it is a faulty harddrive.

Thanks again to all who contributed their time and efforts in this thread.
 
  


All times are GMT -5. The time now is 11:48 PM.
