SMART Failure - Reallocated_Sector_Ct

wraeth · 01-30-2013, 06:09 PM

Greetings,

I am encountering a problem where, on boot, I receive an 'imminent failure' warning from my HDD. All other usage of the disk seems fine.

After doing some research, I figured out that the issue is the 'Reallocated_Sector_Ct' problem, and found this thread, which seemed to have the answer. Unfortunately, if I understand hard drives correctly, the results below point to my MBR being broken, in which case I shouldn't even be able to type this:

Code:

# smartctl -t long /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 89 minutes for test to complete.
Test will complete after Thu Jan 31 12:34:49 2013

Use smartctl -X to abort test.


# smartctl -l selftest /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure    90%      5419         0
# 2  Extended offline    Completed: unknown failure    90%      5419         0
# 3  Extended offline    Completed: unknown failure    90%      5419         0
# 4  Extended offline    Completed: unknown failure    90%      5419         0
# 5  Extended offline    Completed: unknown failure    90%      5244         0

The instructions in the linked post above say to run a test, check the LBA at which it failed, then use 'dd' to overwrite that sector, repeating as necessary until the bad sectors are cleared. However, I feel I shouldn't do that here, as I'm worried it will break my HDD completely...
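For reference, the procedure from the linked post (read the failing LBA from the self-test log, then overwrite that one sector with dd) can be sketched as follows. This is a minimal dry-run sketch under stated assumptions, not a tested repair tool:

- The function names `latest_error_lba` and `print_overwrite_cmd` are hypothetical.
- It assumes the `smartctl -l selftest` layout shown above, where the LBA is in the last column and entry `# 1` is the most recent.
- It assumes 512-byte logical sectors.
- It only prints the dd command and never runs it. It refuses LBA 0, because that sector holds the MBR.

Code:

```shell
# Hypothetical helper: read `smartctl -l selftest` output on stdin and print the
# LBA_of_first_error column of the most recent entry ("# 1"); prints nothing if
# that column is "-" (no LBA recorded).
latest_error_lba() {
    awk '$1 == "#" && $2 == "1" { if ($NF != "-") print $NF; exit }'
}

# Dry run only: print, never execute, the single-sector overwrite described in the
# linked post. Assumes 512-byte logical sectors; $1 is the LBA, $2 the target device.
print_overwrite_cmd() {
    local lba=$1 dev=$2
    if [ "$lba" = "0" ]; then
        # LBA 0 holds the MBR (boot code and partition table); zeroing it loses both.
        echo "refusing to target LBA 0 (MBR)" >&2
        return 1
    fi
    # oflag=direct (GNU dd) requests a write that bypasses the page cache.
    echo "dd if=/dev/zero of=$dev bs=512 count=1 seek=$lba oflag=direct"
}
```

Fed the self-test log above, `latest_error_lba` yields 0. The sketch then refuses to target that sector, which is exactly the concern raised in this post.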

Any suggestions or insights?

Cheers.

BoraxMan · 01-31-2013, 03:53 AM

I had a hard drive that was reporting errors, and used this program to fix it:

http://hddguru.com/software/HDD-LLF-...l-Format-Tool/

It is a low-level formatter, and it is a Windows program. It's not strictly low-level, but it works at a lower level than 'dd' does. It seemed to fix the errors, as they were 'logical' bad sectors rather than 'physical' bad sectors.

This will erase your data, though, along with your partition tables and MBR.

The suggestion of rewriting a sector again and again won't kill your drive unless there is a mechanical issue. If it's your MBR that is bad, then you will lose your MBR, and with it your ability to boot from the drive.
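Since rewriting sector 0 would wipe the MBR, one precaution is to save a copy of it first, assuming that sector is still readable. The sketch below does that and checks the saved copy for the standard 0x55AA boot signature at byte offset 510. The helper names `backup_first_sector` and `has_mbr_signature` are hypothetical, and the device and output paths are whatever you pass in.

Code:

```shell
# Hypothetical helper: copy the first 512-byte sector (LBA 0: boot code,
# partition table and boot signature) of device or image file $1 into file $2.
backup_first_sector() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# Succeeds if bytes 510-511 of file $1 hold the MBR boot signature 55 AA.
has_mbr_signature() {
    [ "$(od -An -t x1 -j 510 -N 2 "$1" | tr -d ' \n')" = "55aa" ]
}
```

For example, you could run `backup_first_sector /dev/sda ~/sda-mbr.bin && has_mbr_signature ~/sda-mbr.bin` as root. Restoring would be the same dd call with `if` and `of` swapped. That can only succeed if the drive still accepts writes to sector 0.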

TobiSGD · 01-31-2013, 07:49 AM

There is no such thing as a logical bad sector; a bad sector is always physical. What this program does is initiate a low-level format of the device, which makes the disk's firmware automatically mark bad sectors as unusable. This is nothing more than a workaround and in no way fixes the drive.

Rule of thumb: if SMART reports an error, the first thing to do is back up any data that isn't already backed up.
Then download the disk manufacturer's diagnostic tool and check the disk. Most likely it will be reported as faulty, since SMART is usually correct when it comes to errors.
To me, it looks like you need a replacement for that disk.
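As a quick first check before reaching for the manufacturer's tool, `smartctl -H` prints the drive's own overall verdict. On ATA drives this appears as a line reading "SMART overall-health self-assessment test result:" followed by PASSED or FAILED!. The sketch below turns that line into an exit status. The function name `smart_health_ok` is hypothetical, and matching on that exact wording is an assumption about smartctl's text output.

Code:

```shell
# Hypothetical helper: read `smartctl -H` output on stdin and succeed only if
# the overall-health self-assessment line reports PASSED.
smart_health_ok() {
    grep -q 'self-assessment test result: PASSED'
}
```

A possible use is `smartctl -H /dev/sda | smart_health_ok || echo "back up now"`.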

gradinaruvasile · 01-31-2013, 09:31 AM

Do you have any errors in dmesg? If there are bad sectors and the OS cannot read them, it logs the error in dmesg.

Example:

Code:

[187442.852700] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
[187442.852709] ata1.00: irq_stat 0x40000008
[187442.852715] ata1.00: failed command: READ FPDMA QUEUED
[187442.852726] ata1.00: cmd 60/00:08:00:8c:64/01:00:01:00:00/40 tag 1 ncq 131072 in
[187442.852726]          res 41/40:00:eb:8c:64/00:00:01:00:00/40 Emask 0x409 (media error) <F>
[187442.852732] ata1.00: status: { DRDY ERR }
[187442.852736] ata1.00: error: { UNC }
[187442.854276] ata1.00: failed to get Identify Device Data, Emask 0x1
[187442.855868] ata1.00: failed to get Identify Device Data, Emask 0x1
[187442.855881] ata1.00: configured for UDMA/133
[187442.855937] sd 0:0:0:0: [sda] Unhandled sense code
[187442.855942] sd 0:0:0:0: [sda]  
[187442.855945] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[187442.855949] sd 0:0:0:0: [sda]  
[187442.855952] Sense Key : Medium Error [current] [descriptor]
[187442.855957] Descriptor sense data with sense descriptors (in hex):
[187442.855959]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[187442.855971]         01 64 8c eb 
[187442.855978] sd 0:0:0:0: [sda]  
[187442.855983] Add. Sense: Unrecovered read error - auto reallocate failed
[187442.855987] sd 0:0:0:0: [sda] CDB: 
[187442.855989] Read(10): 28 00 01 64 8c 00 00 01 00 00
[187442.856000] end_request: I/O error, dev sda, sector 23366891
[187442.856057] ata1: EH complete
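The failing sector appears twice in that log, and the two can be cross-checked:

- The `end_request` line gives it in decimal: 23366891.
- The sense descriptor ends with the bytes `01 64 8c eb`. These are the tail of its 8-byte information field, and 0x01648ceb = 23366891.

The sketch below extracts the decimal sector numbers from kernel log text and converts such hex bytes. The helper names `io_error_sectors` and `hex_bytes_to_lba` are hypothetical. The sed pattern assumes the "I/O error, dev <disk>, sector <N>" wording shown above.

Code:

```shell
# Hypothetical helper: read kernel log text on stdin and print "<disk> <sector>"
# once for every distinct "I/O error, dev <disk>, sector <N>" message.
io_error_sectors() {
    sed -n 's/.*I\/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p' | sort -u
}

# Join hex bytes (most significant first) into one number and print it in decimal,
# e.g. the trailing bytes of the sense descriptor's information field.
hex_bytes_to_lba() {
    printf '%d\n' "0x$(printf '%s' "$@")"
}
```

Running `dmesg | io_error_sectors` on the log above should report `sda 23366891`, and `hex_bytes_to_lba 01 64 8c eb` prints the same 23366891.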

H_TeXMeX_H · 01-31-2013, 10:21 AM

Post the output of 'smartctl -A /dev/sda'.

However, the long test failed, which means the drive is failing, so back up all your data if you haven't already done so.

wraeth · 01-31-2013, 04:10 PM

Thanks for the replies.

I'm not using this machine for anything particularly critical, so data loss isn't much of a concern; but if I can avoid a low-level solution, I'll try. I know this is an old machine (it's an Aspire 3500, complete with a 'Designed for Windows XP' sticker), so it's quite possible that the drive is just worn out. What I want to do is try to clear these bad sectors and monitor whether, and how quickly, more bad sectors appear before resigning myself to a new HDD.
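The "monitor how quickly more bad sectors appear" plan could be as simple as logging the reallocated-sector count on each run and printing the change since the previous run. In this sketch:

- The function name `record_realloc` and the log path you pass in are hypothetical.
- The count is assumed to come from the raw value of attribute 5, as in the usage line after the block.
- The first run prints 0.

Code:

```shell
# Hypothetical monitor: append "<unix-time> <count>" to log file $1 and print how
# much the count ($2) has grown since the previous entry (0 on the first run).
record_realloc() {
    local log=$1 count=$2 prev
    prev=$(tail -n 1 "$log" 2>/dev/null | awk '{ print $2 }')
    echo "$(date +%s) $count" >> "$log"
    echo "$(( count - ${prev:-$count} ))"
}
```

For example, run `record_realloc ~/realloc.log "$(smartctl -A /dev/sda | awk '$1 == 5 { print $10 }')"` periodically, e.g. from cron. A steadily non-zero result means the drive keeps consuming spare sectors.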

The SMART errors:

Code:

smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   253   253   025    Pre-fail  Always       -       3072
  4 Start_Stop_Count        0x0032   001   001   000    Old_age   Always       -       987627
  5 Reallocated_Sector_Ct   0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 1027
  7 Seek_Error_Rate         0x000e   253   253   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Half_Minutes   0x0032   100   100   000    Old_age   Always       -       5423h+20m
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       43
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       741
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1108
187 Reported_Uncorrect      0x0032   097   097   000    Old_age   Always       -       2088
188 Command_Timeout         0x0032   253   253   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   047   040    Old_age   Always       -       31 (Min/Max 7/53)
191 G-Sense_Error_Rate      0x0012   046   046   000    Old_age   Always       -       554012
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       194
193 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       1067508
194 Temperature_Celsius     0x0022   069   047   000    Old_age   Always       -       31 (Min/Max 7/53)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       153545
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       1027
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   253   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x0012   253   253   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0012   100   100   000    Old_age   Always       -       741
225 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       1067508
255 Unknown_Attribute       0x000a   253   100   000    Old_age   Always       -       0
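In a table like this, an attribute is considered failed when its normalized VALUE has dropped to or below a non-zero THRESH. Here that is only attribute 5 (001 against a threshold of 010), which is what the FAILING_NOW flag reflects. The sketch below applies that comparison to `smartctl -A` output. The function name `failing_attributes` is hypothetical, and it assumes the ten-column layout shown above.

Code:

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print every
# attribute whose normalized VALUE ($4) is at or below a non-zero THRESH ($6).
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $6 + 0 > 0 && $4 + 0 <= $6 + 0 {
        print $1, $2, "VALUE=" $4, "THRESH=" $6
    }'
}
```

On the output above, `smartctl -A /dev/sda | failing_attributes` should print a single line for Reallocated_Sector_Ct.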

and dmesg:

Code:

# dmesg | grep error
# dmesg | grep failed
[    0.070659]  pci0000:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[    1.590318] ondemand governor failed, too long transition latency of HW, fallback to performance governor
# dmesg | grep reallocate
#

What puzzles me is that, according to the test results, the tests are failing at or before the first sector of the disk. I'm wary of writing to sector 0, though: I don't know enough about disk layouts or SMART to tell whether that's a good idea or whether the test is trying to tell me something else.

Again, thanks for the replies.

rknichols · 01-31-2013, 05:50 PM

The drive is near death in many ways. It has used almost all of its spare sectors:

Quote:

Originally Posted by wraeth

Code:

  5 Reallocated_Sector_Ct   0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 1027

It still has 2 bad sectors pending reallocation:

Code:

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2

And it has been through nearly a million start/stop cycles:

Code:

  4 Start_Stop_Count        0x0032   001   001   000    Old_age   Always       -       987627

Forget about using it for anything but a paperweight.
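The three numbers quoted above are the RAW_VALUE column for attributes 5, 197 and 4. The sketch below pulls out a single raw value by attribute ID so those counts can be checked or scripted. The function name `raw_value` is hypothetical. It assumes the raw value is the 10th whitespace-separated column, so for attributes such as 194 only the first word ("31") is printed and the "(Min/Max ...)" suffix is dropped.

Code:

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the first
# word of the RAW_VALUE column (10th field) for the attribute whose ID is $1.
raw_value() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10; exit }'
}
```

For example, `smartctl -A /dev/sda | raw_value 5` should print 1027 for the output above, and IDs 197 and 4 give 2 and 987627.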

The reason for the anomalous test result is probably that the test works with blocks larger than a single sector and is reporting a failure somewhere within the first such block. If you really want to know which sector is bad, you can use hdparm's "--read-sector" option to read single sectors without interference from the OS readahead:

Code:

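# Read sectors 0-999 one at a time (no OS readahead); report the first one hdparm cannot read
for N in {0..999}; do
    if ! hdparm --read-sector "$N" /dev/sda >/dev/null; then
        echo "First unreadable sector: $N"
        break
    fi
done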

H_TeXMeX_H · 02-01-2013, 04:37 AM

I agree with rknichols; the drive will fail imminently. Back up all your stuff now if you have anything important.