LinuxQuestions.org
Old 12-15-2014, 02:52 PM   #1
ballsystemlord
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 214

Rep: Reputation: Disabled
HD SMART warning of failure


Hello, I decided to run the long set of SMART tests on my HD using smartctl and I got a line of dubious output.

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       50%      7770         2043198420

I'm not an expert so I don't know if this is a warning of failure or not.
It also seems, from the message, that the test did not complete. Is this the case?
 
Old 12-15-2014, 03:55 PM   #2
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
This means there are bad blocks, but do run 'smartctl -a /dev/sda' and post the output. The test did complete. If it hadn't, the status would have said it was terminated by the user; instead it clearly says that it completed with a read failure (bad blocks).
 
Old 12-15-2014, 04:54 PM   #3
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212
The test "completed" because it stops on the first failure. There might or might not be more bad blocks on the drive.
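For reference, a long test is started with `smartctl -t long /dev/sda`, and the self-test history on its own is printed by `smartctl -l selftest /dev/sda`. Below is a minimal sketch for pulling the failing LBAs out of that log. It assumes the column layout smartctl prints for this drive; the helper name `first_error_lbas` is made up for illustration, and it reads saved log text rather than calling smartctl itself.

```sh
# Hypothetical helper: print the LBA_of_first_error for every self-test
# entry whose status contains "read failure".
# Reads `smartctl -l selftest` style text on stdin.
first_error_lbas() {
    awk '/^# *[0-9]+ / && /read failure/ { print $NF }'
}

# Sample entries in the format smartctl prints (taken from this drive's log).
sample_log='# 1  Extended offline    Completed: read failure       50%      7770         2043198420
# 4  Extended offline    Completed without error       00%       890         -
# 7  Short offline       Completed: read failure       50%       732         71400'

printf '%s\n' "$sample_log" | first_error_lbas
```

Each LBA printed is only the first bad sector that particular test hit, so a clean list here does not prove the rest of the surface is good.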
 
Old 12-16-2014, 02:15 PM   #4
ballsystemlord
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 214

Original Poster
Rep: Reputation: Disabled
Here you go. It's a little big.
Code:
% sudo smartctl -a /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.6-4-default] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K3000
Device Model:     Hitachi HDS723020BLA642
Serial Number:    MN1220F30Y2G0D
LU WWN Device Id: 5 000cca 369cd37f7
Firmware Version: MN6OA580
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Dec 15 13:16:14 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (19092) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 319) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   097   097   016    Pre-fail  Always       -       262146
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       86
  3 Spin_Up_Time            0x0007   141   141   024    Pre-fail  Always       -       434 (Average 378)
  4 Start_Stop_Count        0x0012   098   098   000    Old_age   Always       -       11244
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       13
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   130   130   020    Pre-fail  Offline      -       28
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       7869
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       5031
192 Power-Off_Retract_Count 0x0032   091   091   000    Old_age   Always       -       11245
193 Load_Cycle_Count        0x0012   091   091   000    Old_age   Always       -       11245
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 18/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       13
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 174 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 174 occurred at disk power-on lifetime: 7844 hours (326 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 02 06 f9 c8 09  Error: UNC at LBA = 0x09c8f906 = 164165894

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 00 f9 c8 40 08      05:34:12.842  READ FPDMA QUEUED
  60 08 00 f8 f8 c8 40 08      05:34:12.842  READ FPDMA QUEUED
  60 08 00 f0 f8 c8 40 08      05:34:12.842  READ FPDMA QUEUED
  60 08 00 e8 f8 c8 40 08      05:34:12.842  READ FPDMA QUEUED
  60 08 00 e0 f8 c8 40 08      05:34:12.842  READ FPDMA QUEUED

Error 173 occurred at disk power-on lifetime: 7844 hours (326 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 02 06 f9 c8 09  Error: UNC at LBA = 0x09c8f906 = 164165894

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 08 f6 c8 40 08      05:34:09.494  READ FPDMA QUEUED
  60 00 00 08 f4 c8 40 08      05:34:09.491  READ FPDMA QUEUED
  60 00 08 08 f3 c8 40 08      05:34:09.490  READ FPDMA QUEUED
  60 80 00 88 f2 c8 40 08      05:34:09.490  READ FPDMA QUEUED
  60 20 00 68 f2 c8 40 08      05:34:09.476  READ FPDMA QUEUED

Error 172 occurred at disk power-on lifetime: 7844 hours (326 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 07 69 f0 c8 09  Error: UNC at LBA = 0x09c8f069 = 164163689

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 68 f0 c8 40 08      05:34:06.024  READ FPDMA QUEUED
  60 08 00 60 f0 c8 40 08      05:34:06.024  READ FPDMA QUEUED
  ea 00 00 00 00 00 a0 08      05:34:06.010  FLUSH CACHE EXT
  60 08 08 58 f0 c8 40 08      05:34:05.991  READ FPDMA QUEUED
  61 08 00 e1 98 70 40 08      05:34:05.991  WRITE FPDMA QUEUED

Error 171 occurred at disk power-on lifetime: 7844 hours (326 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 b7 69 f0 c8 09  Error: UNC at LBA = 0x09c8f069 = 164163689

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 c8 00 58 f0 c8 40 08      05:34:02.656  READ FPDMA QUEUED
  60 00 08 58 ef c8 40 08      05:34:02.655  READ FPDMA QUEUED
  60 80 00 d8 ee c8 40 08      05:34:02.655  READ FPDMA QUEUED
  60 20 00 b8 ee c8 40 08      05:34:02.642  READ FPDMA QUEUED
  60 38 00 48 2d 85 40 08      05:34:02.624  READ FPDMA QUEUED

Error 170 occurred at disk power-on lifetime: 7844 hours (326 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 2f df c8 09  Error: UNC at LBA = 0x09c8df2f = 164159279

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 28 df c8 40 08      05:33:59.244  READ FPDMA QUEUED
  60 08 00 20 df c8 40 08      05:33:59.244  READ FPDMA QUEUED
  60 08 00 18 df c8 40 08      05:33:59.244  READ FPDMA QUEUED
  60 08 00 10 df c8 40 08      05:33:59.244  READ FPDMA QUEUED
  60 08 00 08 df c8 40 08      05:33:59.244  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       50%      7770         2043198420
# 2  Short offline       Aborted by host               90%      6599         -
# 3  Short offline       Aborted by host               90%       892         -
# 4  Extended offline    Completed without error       00%       890         -
# 5  Extended offline    Interrupted (host reset)      90%       884         -
# 6  Short offline       Completed without error       00%       883         -
# 7  Short offline       Completed: read failure       50%       732         71400
# 8  Extended offline    Aborted by host               90%       732         -
1 of 2 failed self-tests are outdated by newer successful extended offline self-test # 4

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
Old 12-16-2014, 04:54 PM   #5
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212
That's not too bad for a drive that gets started and stopped quite a bit (average running time ~40 minutes). You've got 13 sectors that have been reallocated to spares and 4 more that are pending reallocation (currently visible to the OS as bad sectors -- will be corrected or reallocated the next time they are written). Until that happens the long test will not run to completion without error.
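The "~40 minutes" figure is just Power_On_Hours divided by Start_Stop_Count. A quick sketch of that arithmetic, using the raw values from the attribute table posted above (the function name is only for illustration):

```sh
# Average minutes of run time per start, rounded to the nearest minute.
# $1 = Power_On_Hours raw value, $2 = Start_Stop_Count raw value
avg_run_minutes() {
    echo $(( ($1 * 60 + $2 / 2) / $2 ))
}

# 7869 hours over 11244 starts, from the smartctl output in this thread.
avg_run_minutes 7869 11244
```

With those numbers it comes out at about 42 minutes per start.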

The easiest way to clear out the bad sectors is just to overwrite the entire drive with zeros. Obviously that destroys the current contents entirely, so you would need a way to back up and restore anything you wanted to save. The procedure for identifying what files (if any) are affected and doing the minimum damage to your data is on the Bad block HOWTO page at the smartmontools web site. That's a discouragingly long page, but it contains several different examples for different filesystems, and you will only be concerned with one of the cases. The procedure does have to be performed separately for each bad sector, though, so you will probably need to go through it at least 4 times. (You could have more bad sectors not yet in the "pending" list because there has never been any attempt to read them.)
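To give an idea of what the HOWTO procedure involves: its first step converts the bad LBA into a filesystem block number with b = (int)((L - S) * 512 / B). Here is a rough sketch of just that step. The partition start (2048) and block size (4096) below are assumed values for illustration only; the real ones come from `fdisk -lu` and from the filesystem itself.

```sh
# Filesystem block that holds a bad sector, per the Bad block HOWTO formula
#   b = (int)((L - S) * 512 / B)
# $1 = L, LBA of the bad sector (from the SMART self-test log)
# $2 = S, first sector of the partition (from `fdisk -lu`)
# $3 = B, filesystem block size in bytes
fs_block_for_lba() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# ASSUMED layout: partition starts at sector 2048, 4096-byte blocks.
fs_block_for_lba 2043198420 2048 4096
```

On ext2/3/4 the resulting block number is what gets handed to debugfs (`icheck` to find the inode, then `ncheck` to find the file name). Other filesystems need the tools described in their own section of the HOWTO.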

Whether this drive should continue to be used depends on whether new bad sectors continue to develop. You can't determine that until you discover all of the current bad sectors.

Last edited by rknichols; 12-16-2014 at 04:56 PM.
 
Old 12-16-2014, 07:13 PM   #6
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
Yeah, the drive looks fine other than the bad sectors. Do keep a backup of your data as usual, and continue monitoring it. It is true that lots of bad blocks may mean the drive is failing, but I don't think this is the case here because everything else looks fine.

You could zero the drive, but with a 2 TB drive that takes a long time. The drive should automatically reallocate the bad blocks the next time they are written.
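For the monitoring part, the attributes worth watching are 5, 196, 197 and 198. A small sketch that picks their raw values out of `smartctl -A /dev/sda` output follows. The helper name `bad_sector_counts` is invented here, and it assumes the ten-column attribute table format shown earlier in this thread.

```sh
# Hypothetical helper: print "ID NAME RAW_VALUE" for the attributes that
# track bad sectors (5, 196, 197, 198).
# Reads `smartctl -A` style text on stdin.
bad_sector_counts() {
    awk '$1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 { print $1, $2, $10 }'
}

# Sample rows copied from the attribute table posted above.
sample_attrs='  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       13
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       13
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0'

printf '%s\n' "$sample_attrs" | bad_sector_counts
```

Saving that output with a date every week or so makes it easy to see whether the counts are holding steady or climbing.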
 
Old 12-17-2014, 03:58 PM   #7
ballsystemlord
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 214

Original Poster
Rep: Reputation: Disabled
OK, so I do nothing and hope the drive reallocates the sectors. I don't like the passive role much. Is there a way I can ask the drive, "Do you have lots of additional sectors, or are you running out so that I need to replace you?"
Also, what do you mean by zeroing the drive? I finally have the parts to implement a RAID 3 array, so I'm planning to back up my data, plug the two new drives in, and set the BIOS to RAID 3 (I'm assuming that the BIOS will not preserve the data). That means I'm already planning on downtime and a Linux re-installation, so if there's something I can do to make matters better, please say so.
 
Old 12-17-2014, 07:51 PM   #8
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212
Zeroing the drive:
Code:
dd if=/dev/zero of=/dev/sdX bs=256k
Replace "X" with the appropriate drive letter, and do NOT make a mistake. The blocksize ("bs=") parameter is fairly arbitrary, but going larger than 256k or so makes little difference in speed. (It's going to take perhaps 5 or 6 hours on a 2TB drive with a direct SATA interface.)
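The time estimate is simply capacity divided by sustained write speed. A back-of-the-envelope sketch, where the 100 MB/s rate is only an assumed average (real drives are faster on the outer tracks and slower on the inner ones):

```sh
# Rough minutes needed to write a whole drive at a steady rate.
# $1 = drive capacity in bytes
# $2 = ASSUMED sustained write speed in MB/s (1 MB = 1000000 bytes)
zero_time_minutes() {
    echo $(( $1 / ($2 * 1000000) / 60 ))
}

# Capacity from the smartctl output in this thread; 100 MB/s is a guess.
zero_time_minutes 2000398934016 100
```

That works out to 333 minutes, a bit over five and a half hours, which is in line with the 319 minutes the drive itself quotes for an extended self-test.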

The drive currently has plenty of spare sectors. As it uses them, you will see the number in the "VALUE" column for Reallocated_Sector_Ct decrease from 100 toward its threshold value of 5, but the drive really should be replaced long before it gets that far. Once all of the currently bad sectors have been found and reallocated, any continuing increase in the RAW_VALUE for that parameter should be taken as a sign that the drive is seriously in trouble.
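Put as a rule of thumb, that check could look something like the sketch below. The function name and the message wording are made up for illustration, and the "baseline" is whatever RAW_VALUE was recorded once the existing bad sectors had been dealt with.

```sh
# Hypothetical check for Reallocated_Sector_Ct.
# $1 = normalized VALUE, $2 = THRESH (pass both without leading zeros)
# $3 = RAW_VALUE now,    $4 = RAW_VALUE recorded as the baseline
realloc_status() {
    if [ "$1" -le "$2" ]; then
        echo "FAILED: VALUE $1 is at or below threshold $2"
    elif [ "$3" -gt "$4" ]; then
        echo "WARNING: $(( $3 - $4 )) new reallocated sectors since baseline"
    else
        echo "OK: no new reallocations, margin $(( $1 - $2 ))"
    fi
}

realloc_status 100 5 13 13   # the numbers currently reported by this drive
realloc_status 100 5 21 13   # made-up later reading, for illustration
```

The first call reports OK with a margin of 95; the second, with an invented later reading of 21, reports 8 new reallocated sectors, which is the situation that should prompt a replacement.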

Last edited by rknichols; 12-17-2014 at 08:02 PM. Reason: Add time estimate
 
Old 12-18-2014, 01:20 PM   #9
ballsystemlord
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 214

Original Poster
Rep: Reputation: Disabled
Thanks
 
  

