LinuxQuestions.org


blackRonin 05-12-2015 11:26 AM

Reading smartctl results
 
Hello

I tested my SATA drive with smartctl a few times (long, short, and conveyance tests):

Code:

smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.19.0-15-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:    Western Digital Scorpio Blue Serial ATA (AF)
Device Model:    WDC WD7500BPVT-22HXZT3
Serial Number:    WD-WXE1A61A1142
LU WWN Device Id: 5 0014ee 656ca6309
Firmware Version: 01.01A01
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Sizes:    512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sun May 10 20:08:09 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)        Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121)        The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (16500) seconds.
Offline data collection
capabilities:                          (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003)        Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01)        Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:          (  2) minutes.
Extended self-test routine
recommended polling time:          ( 162) minutes.
Conveyance self-test routine
recommended polling time:          (  5) minutes.
SCT capabilities:                (0x7035)        SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      0
  3 Spin_Up_Time            0x0027  177  174  021    Pre-fail  Always      -      2108
  4 Start_Stop_Count        0x0032  001  001  000    Old_age  Always      -      137520
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x002e  200  200  000    Old_age  Always      -      0
  9 Power_On_Hours          0x0032  081  081  000    Old_age  Always      -      13889
 10 Spin_Retry_Count        0x0032  100  100  000    Old_age  Always      -      0
 11 Calibration_Retry_Count 0x0032  100  100  000    Old_age  Always      -      0
 12 Power_Cycle_Count      0x0032  099  099  000    Old_age  Always      -      1075
191 G-Sense_Error_Rate      0x0032  001  001  000    Old_age  Always      -      413
192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      121
193 Load_Cycle_Count        0x0032  001  001  000    Old_age  Always      -      2928272
194 Temperature_Celsius    0x0022  112  102  000    Old_age  Always      -      35
196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      1
198 Offline_Uncorrectable  0x0030  100  253  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0
200 Multi_Zone_Error_Rate  0x0008  100  253  000    Old_age  Offline      -      0

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 13880 hours (578 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 83 f4 2c 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d4 00 83 4f c2 00 08      03:25:45.248  SMART EXECUTE OFF-LINE IMMEDIATE
  b0 d0 01 00 4f c2 00 08      03:25:45.239  SMART READ DATA
  ec 00 01 00 00 00 00 08      03:25:45.219  IDENTIFY DEVICE
  ec 00 01 00 00 00 00 08      03:25:45.214  IDENTIFY DEVICE
  b0 da 00 00 4f c2 00 08      03:24:29.721  SMART RETURN STATUS

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure      90%    13885        4418880
# 2  Conveyance offline  Completed: read failure      90%    13885        4418880
# 3  Short offline      Completed: read failure      90%    13885        4418880
# 4  Conveyance captive  Completed: read failure      90%    13880        4418880
# 5  Extended offline    Completed: read failure      90%    13877        4418880

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I know how to read the "SMART Attributes" section, but I have trouble understanding this:
Code:

Error 1 occurred at disk power-on lifetime: 13880 hours (578 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 83 f4 2c 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d4 00 83 4f c2 00 08      03:25:45.248  SMART EXECUTE OFF-LINE IMMEDIATE
  b0 d0 01 00 4f c2 00 08      03:25:45.239  SMART READ DATA
  ec 00 01 00 00 00 00 08      03:25:45.219  IDENTIFY DEVICE
  ec 00 01 00 00 00 00 08      03:25:45.214  IDENTIFY DEVICE
  b0 da 00 00 4f c2 00 08      03:24:29.721  SMART RETURN STATUS

What do SMART EXECUTE OFF-LINE IMMEDIATE, SMART READ DATA, IDENTIFY DEVICE and SMART RETURN STATUS mean?
I presume this error occurred during testing.


And finally this:
Code:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure      90%    13885        4418880
# 2  Conveyance offline  Completed: read failure      90%    13885        4418880
# 3  Short offline      Completed: read failure      90%    13885        4418880
# 4  Conveyance captive  Completed: read failure      90%    13880        4418880
# 5  Extended offline    Completed: read failure      90%    13877        4418880


Does this mean that smartctl didn't scan the whole disk, but only 10%?
And is the value 4418880 a bad sector?

smallpond 05-12-2015 11:49 AM

Some drives stop the selftest on the first error. If read retries can't recover the data, the sector won't be remapped to a spare until it is written. You can try overwriting it with:

Code:

dd if=/dev/zero of=/dev/sdz count=1 seek=4418880
replacing /dev/sdz with the actual drive name. Then run the test again to see if the drive is good.
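
One thing to check before trying the overwrite: the drive above reports 512-byte logical and 4096-byte physical sectors. A write smaller than a physical sector may force the drive to read the damaged sector first, so the single 512-byte write might not clear it. The following is a minimal sketch, assuming that is what happens on this drive. It computes the 4096-byte block that contains a given LBA and only prints a dd command for review. The function name aligned_block is hypothetical, and /dev/sdz is still a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: map a logical LBA to the index of the physical
# sector that contains it.
# Usage: aligned_block LBA [LOGICAL_SIZE] [PHYSICAL_SIZE]
aligned_block() {
    lba=$1
    logical=${2:-512}     # logical sector size in bytes (assumed default)
    physical=${3:-4096}   # physical sector size in bytes (assumed default)
    # Byte offset of the LBA divided by the physical sector size.
    echo $(( lba * logical / physical ))
}

block=$(aligned_block 4418880)
# Print the command instead of running it: dd overwrites data irreversibly.
echo "dd if=/dev/zero of=/dev/sdz bs=4096 count=1 seek=$block oflag=direct"
```

Whatever is stored in that block is lost once the printed command is run, so double-check the device name and the block number first.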

blackRonin 05-13-2015 02:28 AM

That didn't help.
I ran the long test again and got the same output:
Quote:

# 1 Extended offline Completed: read failure 90 13890 4418880
Is this disk good for use? (I have a few SATA disks with the same issue.)

And what do SMART EXECUTE OFF-LINE IMMEDIATE, SMART READ DATA, IDENTIFY DEVICE and SMART RETURN STATUS mean?
Do they play a big role?

metaschima 05-13-2015 09:03 AM

Code:

  4 Start_Stop_Count        0x0032  001  001  000    Old_age  Always      -      137520
191 G-Sense_Error_Rate      0x0032  001  001  000    Old_age  Always      -      413
193 Load_Cycle_Count        0x0032  001  001  000    Old_age  Always      -      2928272

These attributes, all at a normalized value of 001, suggest that the drive is old and heavily worn.
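
To put the raw counts in perspective, they can be divided by the Power_On_Hours raw value (13889) from the full attribute table. This is only a rough sketch: the helper name per_hour is hypothetical, and an average per hour says nothing about how the events were spread over the drive's life.

```shell
#!/bin/sh
# Hypothetical helper: average count per power-on hour (integer division).
per_hour() {
    count=$1
    hours=$2
    echo $(( count / hours ))
}

# Raw values taken from the attribute table in this thread.
echo "load cycles per hour: $(per_hour 2928272 13889)"
echo "start/stops per hour: $(per_hour 137520 13889)"
```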

The failed SMART tests suggest that the drive has bad blocks. Writing to the block may fix it, as suggested above. If it does not, I would not use the drive. Either way, the drive is likely to develop more bad blocks, so if you decide to keep using it, make lots of backups and don't keep important data on it.
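
To see whether new bad blocks appear over time, the failing LBAs can be pulled out of the self-test log and compared between runs. A minimal sketch, assuming the log layout shown earlier in this thread (rows starting with "#" and the LBA as the last field); the function name failed_lbas is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a smartctl self-test log on stdin and print
# each distinct LBA_of_first_error reported by a failed read test.
failed_lbas() {
    # Log rows start with '#'; the LBA is the last field. Rows for tests
    # that passed show '-' there, so keep numeric values only.
    awk '/^#/ && /read failure/ && $NF ~ /^[0-9]+$/ { print $NF }' | sort -un
}

# Example with two rows copied from the log in this thread.
failed_lbas <<'EOF'
# 1  Extended offline    Completed: read failure      90%    13885        4418880
# 5  Extended offline    Completed: read failure      90%    13877        4418880
EOF
```

On a live system the input would come from smartctl -l selftest /dev/sdz, again replacing /dev/sdz with the actual drive name. If the list grows from one run to the next, the drive is getting worse.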
