LinuxQuestions.org


jsteel 07-07-2009 05:03 PM

Hard drive Failing?
 
I installed some hard drives in a new computer (I haven't used these drives in a while). I created a new ext3 file system on them and tested them with smartctl. When I run a self-test with smartctl, it fails with "Completed: read failure" with 90% remaining.

I found some information regarding bad blocks and a way to tell the computer to not use various portions of the drive, but I don't know where the problem is as the "LBA_of_first_error" shown after the test is run is blank. After a bit of research, every case I have found shows some information here. Is it worth looking into this further or does it mean that the drive is dying and it's better off in the bin?

The overall-health self-assessment test result is passed, so it seems functional for now. It's a Samsung 250GB IDE HDD.
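For anyone following along, the result described here comes from the drive's self-test log. Below is a minimal sketch for pulling the failed entries out of it, assuming smartmontools is installed and the log has the usual layout with LBA_of_first_error as the last column. The helper name `failed_selftests` is made up for illustration.

```shell
# Sketch: read `smartctl -l selftest` output on stdin and print the
# LBA_of_first_error column (assumed to be the last field) for every
# log entry whose status mentions a failure.
# A "-" means the drive did not record an LBA for that test.
failed_selftests() {
    awk '/^#/ && /failure/ { print $NF }'
}

# Hypothetical usage (needs root and a real device):
#   smartctl -l selftest /dev/sda | failed_selftests
```

If this prints only "-", the drive saw a read failure but did not log where, which matches the blank field described above.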

xeleema 07-07-2009 05:30 PM

Greetingz!

Bad blocks should be automatically scanned for and reallocated by the drive's integrated drive electronics (IDE) and the accompanying firmware. The best test to find out if a disk is dying would be to run a "dd" against it, and watch your syslog (/var/log/messages in most Linux distributions).

In one terminal window, run the following command;
time dd if=/dev/hda1 of=/dev/null
(NOTE: Don't get the "if=" and "of=" designations swapped around, or you will wipe out the contents of the drive.)

In another, run something similar to;
tail -f /var/log/messages

If you start to see "Drive Seek" errors, or something that looks like this;
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown

Then you've got a dying drive on your hands.
Also, listening for the "Click of Death" is a good indicator.
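The two steps above can be wrapped up roughly like this. It is only a sketch: the function names are made up, the block size is an arbitrary choice, and the grep patterns are taken from the sample messages quoted above plus the phrase "media error", so other kernels or drivers may word their errors differently.

```shell
# Read a device (or file) from start to end and discard the data.
# of=/dev/null is hard-coded so "if=" and "of=" cannot be swapped by accident.
# $1: device to read, e.g. /dev/hda (assumption: you run this as root).
read_test() {
    dd if="$1" of=/dev/null bs=1024k   # 1 MiB blocks, an arbitrary choice
}

# Count log lines that look like the disk errors quoted above.
# $1: path to the log file (/var/log/messages on many distributions).
count_disk_errors() {
    grep -c -i -E 'DriveStatusError|SeekComplete Error|media error' "$1" || true
}

# Hypothetical usage:
#   read_test /dev/hda
#   count_disk_errors /var/log/messages
```

A count that keeps climbing while the read test runs points at the disk rather than at some unrelated, one-off error.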

G'luck in yer struggles, chummer!

H_TeXMeX_H 07-08-2009 04:17 AM

smartctl is used to test entire disks, not partitions, so I'm hoping you ran it on the whole disk. What command did you use? It should be something like:

Code:

smartctl -t long /dev/sda

jsteel 07-08-2009 05:08 AM

xeleema,
Thanks I will try that later.

H_TeXMeX_H,
Yes that is the command I ran (not sda1 for example).

H_TeXMeX_H 07-08-2009 05:19 AM

Well, if the test failed, then there is a problem with the disk. Can you post the attributes and results of the test as they appear in 'smartctl -a /dev/sda'?
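If the full output is too long to read through, the attributes usually associated with bad sectors can be picked out on their own. This is a rough sketch, assuming the standard ten-column attribute table that 'smartctl -A' prints, with RAW_VALUE in the last column. The helper name `smart_raw_value` is hypothetical.

```shell
# Print the RAW_VALUE (assumed to be column 10) of one SMART attribute.
# Reads `smartctl -A` output on stdin; $1 is the attribute name as printed
# by smartctl, e.g. Reallocated_Sector_Ct.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Hypothetical usage (needs root and a real device):
#   for attr in Reallocated_Sector_Ct Current_Pending_Sector Offline_Uncorrectable; do
#       printf '%s: ' "$attr"; smartctl -A /dev/sda | smart_raw_value "$attr"
#   done
```

Non-zero raw values for these attributes are generally a sign that the drive has already had trouble reading or remapping sectors.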

jsteel 07-08-2009 04:07 PM

xeleema,
Yes it came up with "media errors" after a minute or two of running.

H_TeXMeX_H,
I'm not near the computer right now, I'll paste the output as soon as I can.

onebuck 07-08-2009 07:19 PM

Hi,

I would get the HDD manufacturer's diagnostics and run those. 'smartctl' is great but I would still get the original diagnostics.

'UBCD (Ultimate Boot CD)' allows users to run floppy-based diagnostic tools from most CDROM drives on Intel-compatible machines, no operating system required. The bootable cd includes many diagnostic utilities.


xeleema 07-09-2009 01:40 PM

Greetingz!

'onebuck' makes a good point about the manufacturer's diagnostics. However, if you ran the full "dd" test I mentioned previously, let me save you some trouble;

This: ..."media errors" after a minute or two of running.

Means: Toss the drive. Move along, nothing to see here.

If the disk isn't stone-dead now, it will be in short order. Besides, those "media errors" typically cause I/O-wait hangs, which usually run 5 to 20 seconds. I don't know about you, but I don't want my server/project_box/workstation/etc screeching to a halt every time the "special" hard disk is accessed.

(NOTE: My advice does not apply if you're trying to scrape usable data off of that drive. _IF_ that's the case, start the file-copy now while you still can!)

jsteel 07-13-2009 09:19 AM

Yes I thought that after I saw the output. There's nothing on it so it's binned now. Thanks for your help. That's the last Samsung disk I use.

xeleema 07-16-2009 01:33 AM

Hold on there, jsteel.

That disk might have been kaput, but there's something you have to remember about hard drive manufacturers;

Most of the aluminum chassis are cut in Korea, the integrated circuits burned in Malaysia, and everything's assembled in Taiwan.

After dealing with literally *thousands* of hard drive failures in a Production data center, I can tell you two things;

1) If, according to the manufacture date, a drive is six months old or newer - watch it for a year.

2) If, according to the manufacture date, a drive is 18 months old, or older (and has run continuously without issue), then it'll last you five more years.

Corollary; if the drive is five years old, watch it for a year.
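Written out as a lookup, the rule of thumb looks something like the sketch below. The function name is made up, "five years old" is read as 60 months or more (an assumption), and drives between 6 and 18 months are left unclassified because the rule above says nothing about them.

```shell
# Rule-of-thumb outlook for a drive, from its age in months.
# $1: age in months according to the manufacture date
# $2: "yes" if it has run continuously without issue (default: yes)
drive_outlook() {
    months=$1
    ran_clean=${2:-yes}
    if [ "$months" -le 6 ]; then
        echo "watch it for a year"                  # rule 1: new drive
    elif [ "$months" -ge 60 ]; then
        echo "watch it for a year"                  # corollary (assumed >= 60 months)
    elif [ "$months" -ge 18 ] && [ "$ran_clean" = yes ]; then
        echo "likely good for five more years"      # rule 2: proven drive
    else
        echo "no rule given"                        # gap the post does not cover
    fi
}

# Hypothetical usage:
#   drive_outlook 24 yes
```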

Have a good one!

H_TeXMeX_H 08-23-2009 09:27 AM

I agree, see the bathtub curve:
http://en.wikipedia.org/wiki/Bathtub_curve

