LinuxQuestions.org


tigerflag 10-31-2012 05:10 PM

Is My Hard Drive Failing or Not?
 
I have a 500 GB Seagate hard drive that's about 3-1/2 years old. It only has 16.88 GB of data on it. PCLOS was running very flaky, so I popped in a live disk of Vector. When it booted, S.M.A.R.T. said that the hard drive was failing, so I backed up everything. Since then I haven't written anything to the hard drive. I also blew out a lot of dust from inside the case.

Today I booted a live disk of Salix OS, ran fsck, badblocks, and S.M.A.R.T. (smartctl), and got conflicting results: fsck and badblocks say the hard drive is OK, but S.M.A.R.T. says it's going to fail within hours.

umount /dev/sda says it's already unmounted, but running fsck on /dev/sda then says the drive is mounted or busy.

/dev/sda1 is the large partition on /dev/sda where I store my data.

Can anyone interpret these outputs?

Let me know what other information you need. As usual, I'm just lost ...

++++++++++++++

Code:

root[one]# umount /dev/sda
umount: /dev/sda: not mounted
root[one]# fsck -t ext3 /dev/sda
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
/sbin/e2fsck: Device or resource busy while trying to open /dev/sda
Filesystem mounted or opened exclusively by another program?


Code:

# umount /dev/sda1
root[one]# fsck -t ext3 /dev/sda1
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
/data: clean, 21830/51200000 files, 4425661/102398302 blocks


Code:

# badblocks -v /dev/sda1
Checking blocks 0 to 409593207
Checking for bad blocks (read-only test): done                               
Pass completed, 0 bad blocks found.
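As a sanity check that fsck and badblocks really looked at the same partition: badblocks counts in 1024-byte blocks by default (its `-b` option changes this), while the e2fsck summary counts file-system blocks. Assuming this ext3 file-system uses 4096-byte blocks (an assumption; `tune2fs -l /dev/sda1` would confirm it), the two counts from the output above describe exactly the same number of bytes:

```shell
#!/bin/sh
# Block counts copied from the e2fsck and badblocks output above.
fs_blocks=102398302        # total blocks reported by e2fsck for /dev/sda1
bb_last=409593207          # last block number checked by badblocks (counted from 0)

fs_block_size=4096         # ASSUMED ext3 block size; check with: tune2fs -l /dev/sda1
bb_block_size=1024         # badblocks default block size (override with -b)

# Convert both to bytes. badblocks numbers blocks from 0, so the count is last + 1.
fs_bytes=$((fs_blocks * fs_block_size))
bb_bytes=$(( (bb_last + 1) * bb_block_size ))

echo "e2fsck:    $fs_bytes bytes"
echo "badblocks: $bb_bytes bytes"
if [ "$fs_bytes" -eq "$bb_bytes" ]; then
    echo "Same size: both tools covered the whole partition."
fi
```

Both come to 419,423,444,992 bytes (about 419 GB), so a clean badblocks pass does cover all of /dev/sda1. It only proves the sectors were readable right now, though, which is why it can disagree with SMART's reallocated-sector count.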


Code:

# smartctl -a /dev/sda1
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:    Seagate Barracuda 7200.12 family
Device Model:    ST3500410AS
Serial Number:    5VM050NN
Firmware Version: CC31
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Wed Oct 31 20:18:01 2012 Local time zone must be set--see zic m
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:          ( 600) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      (  95) minutes.
Conveyance self-test routine
recommended polling time:      (  2) minutes.
SCT capabilities:            (0x103f)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  117  099  006    Pre-fail  Always      -      141781668
  3 Spin_Up_Time            0x0003  099  097  000    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      240
  5 Reallocated_Sector_Ct  0x0033  001  001  036    Pre-fail  Always  FAILING_NOW 4095
  7 Seek_Error_Rate        0x000f  078  060  030    Pre-fail  Always      -      68019206
  9 Power_On_Hours          0x0032  069  069  000    Old_age  Always      -      27215
 10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
 12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      277
183 Runtime_Bad_Block      0x0000  099  099  000    Old_age  Offline      -      1
184 End-to-End_Error        0x0032  100  100  099    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0
188 Command_Timeout        0x0032  100  096  000    Old_age  Always      -      65586
189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0
190 Airflow_Temperature_Cel 0x0022  061  056  045    Old_age  Always      -      39 (Lifetime Min/Max 21/42)
194 Temperature_Celsius    0x0022  039  044  000    Old_age  Always      -      39 (0 16 0 0)
195 Hardware_ECC_Recovered  0x001a  040  016  000    Old_age  Always      -      141781668
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  199  000    Old_age  Always      -      44
240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      53364968680626
241 Total_LBAs_Written      0x0000  100  253  000    Old_age  Offline      -      850675434
242 Total_LBAs_Read        0x0000  100  253  000    Old_age  Offline      -      2726443239

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

+++++++++++++

Thanks bunches!

TobiSGD 10-31-2012 07:10 PM

Please use code tags for program output in the future; they make your posts much more readable and preserve the formatting.

As for your problem: that you can't mount or unmount /dev/sda is pretty normal, since you say it is partitioned. You can only mount file-systems, not whole partitioned disks, and the same goes for fsck: run it on /dev/sda1, not on /dev/sda. The smartctl output indicates that the disk is dying, and I personally would trust that program. To be sure, you can test the disk with the manufacturer's diagnostic tool.
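A minimal sketch of that rule as a helper, assuming a Linux system where `/proc/mounts` lists each mounted device in its first column and `/sys/class/block/NAME/partition` exists only for partitions. The names `is_mounted` and `safe_fsck` are hypothetical. It refuses a whole disk or a mounted file-system, and otherwise runs `e2fsck -f -n`, which forces a check but opens the device read-only and answers "no" to every repair prompt, so nothing is written to a failing drive.

```shell
#!/bin/sh
# Hypothetical helper: a read-only fsck that refuses whole disks and mounted
# file-systems. Assumes Linux: /proc/mounts lists mounted devices in column 1,
# and /sys/class/block/NAME/partition exists only for partitions.

is_mounted() {
    # Succeeds if $1 appears as the device field of any /proc/mounts entry.
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' /proc/mounts
}

safe_fsck() {
    dev=$1
    name=${dev##*/}                     # /dev/sda1 -> sda1
    if [ ! -b "$dev" ]; then
        echo "$dev is not a block device" >&2
        return 3
    fi
    if [ ! -e "/sys/class/block/$name/partition" ]; then
        echo "$dev is a whole disk; check a partition such as ${dev}1 instead" >&2
        return 2
    fi
    if is_mounted "$dev"; then
        echo "$dev is mounted; run: umount $dev" >&2
        return 1
    fi
    # -f: check even if the file-system is marked clean
    # -n: open read-only and answer "no" to every repair prompt
    e2fsck -f -n "$dev"
}

# Usage (as root): safe_fsck /dev/sda1
```

With this, `safe_fsck /dev/sda` stops with the whole-disk message instead of the confusing "Device or resource busy" error from the first post.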

tigerflag 10-31-2012 07:25 PM

Where are the instructions for code tags? I honestly looked but couldn't find them.

Thanks for the advice.

TobiSGD 10-31-2012 07:43 PM

If you use the advanced editor, you can just mark the part of your post that you want to put into code tags and press the #-button above the input field.
You can also do it manually by writing the tags; for example, [code]lspci[/code] will look like this:
Code:

lspci

tigerflag 10-31-2012 08:05 PM

Thanks, TobiSGD. Fixed it.

H_TeXMeX_H 11-01-2012 03:33 AM

Yes, the drive is dying, so replace it.

You can also run SMART long tests to check for bad blocks, but if an attribute is already failing, there is no use.
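To see which attribute tripped the overall FAILED verdict, compare each attribute's normalized VALUE with its THRESH: an attribute counts as failing once VALUE has dropped to or below a non-zero THRESH (a threshold of 0 never fails). That is what happened to Reallocated_Sector_Ct above, at 001 against a threshold of 036. The filter below is a sketch that assumes the `smartctl -A` column layout shown in this thread (ID, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); `failing_attrs` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: reads `smartctl -A` output on stdin and prints the
# attributes whose normalized value has reached the failure threshold.
# Assumed columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
failing_attrs() {
    awk '
        # Attribute rows start with a numeric ID; skip headers and other lines.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            if (thresh == 0) next                 # threshold 0 never fails
            if (value <= thresh)
                print $2 " FAILING_NOW (value " value " <= thresh " thresh ")"
            else if (worst <= thresh)
                print $2 " In_the_past (worst " worst " <= thresh " thresh ")"
        }'
}

# Usage on a real drive (as root):  smartctl -A /dev/sda | failing_attrs
```

On the output posted above it would print only the Reallocated_Sector_Ct line; every other attribute is still comfortably above its threshold.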

tigerflag 11-01-2012 10:15 AM

Quote:

Originally Posted by H_TeXMeX_H (Post 4819495)
Yes, the drive is dying, so replace it.

You can also run SMART long tests to check for bad blocks, but if an attribute is already failing, there is no use.

How is that done?

I just bought a Samsung 128 GB SSD and another Seagate 500 GB HDD. I'd like to test them before putting data on them. What are the best ways to do this?

H_TeXMeX_H 11-01-2012 02:02 PM

I always run a long test after I buy a drive and then at periodic intervals (say, every 1000 power-on hours).

Code:

smartctl -t long /dev/sda

Then you wait for it to finish and check the '-a' output for the results.
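A sketch of that workflow as a script, assuming smartmontools is installed and it is run as root. `smartctl -t long` only starts the extended test in the drive's firmware and returns immediately; the script then polls `smartctl -c` until its self-test status no longer says the routine is "in progress", and finally prints the self-test log with `smartctl -l selftest`. The function name `run_long_test`, the `SMARTCTL` and `POLL_SECONDS` overrides, the 5-minute interval, and the exact "in progress" wording it looks for are all assumptions. For the drive in this thread, smartctl recommended about 95 minutes for the extended test.

```shell
#!/bin/sh
# Hypothetical wrapper around smartctl's extended self-test (run as root).
# SMARTCTL and POLL_SECONDS can be overridden; the defaults are assumptions.
run_long_test() {
    dev=$1
    sc=${SMARTCTL:-smartctl}
    poll=${POLL_SECONDS:-300}           # check every 5 minutes

    # Start the extended test; smartctl returns while the drive keeps testing.
    "$sc" -t long "$dev" >/dev/null

    # While the test runs, `smartctl -c` reports the self-test execution
    # status as "in progress" (wording assumed from smartmontools output).
    while "$sc" -c "$dev" | grep -qi "in progress"; do
        sleep "$poll"
    done

    # Show the log; the newest entry should read "Completed without error".
    "$sc" -l selftest "$dev"
}

# Usage: run_long_test /dev/sda
```

The same script works for checking a new drive before putting data on it; any entry other than "Completed without error" (for example a read failure with an LBA) is a reason to return the drive.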

Fred Caro 11-01-2012 02:48 PM

To TobiSGD,
why would you not trust the output of smartctl? Call me cynical, but I would rather doubt the results of the manufacturer's test disk, as it would surely give favorable results, no?
I've found the SMART test on Parted Magic to have accurately predicted a drive's demise several times. It certainly gives more detail than the test disks from some manufacturers.

Fred.

TobiSGD 11-01-2012 03:08 PM

Quote:

Originally Posted by Fred Caro (Post 4819933)
To TobiSGD,
why would you not trust the output of smartctl?

Isn't that what I stated?

Quote:

Call me cynical, but I would rather doubt the results of the manufacturer's test disk, as it would surely give favorable results, no?

Actually, no. If a major customer of that manufacturer, let's say someone running large server farms, tests disks with that tool and finds out that it gives inaccurate results, you can be sure there will be a) a major lawsuit and b) a switch to a different manufacturer. While a little mistrust is sometimes good, there are situations where it is simply misplaced.

jefro 11-01-2012 03:36 PM

I tend to trust the OEM's extended drive tests. They have more insight into their own equipment, and I doubt they'd risk an intentional error that made their product look good. SMART covers most, but not all, of the diagnostics they run. The long test is the best test to use for hard drives.


All times are GMT -5.