LinuxQuestions.org

Linux - Hardware forum thread: http://www.linuxquestions.org/questions/showthread.php?t=4175447905

wraeth 01-30-2013 06:09 PM

SMART Failure - Reallocated_Sector_Ct - LBA 0
 
Greetings,

On boot, I am receiving an 'imminent failure' warning about my HDD. All other usage of the disk seems fine.

After doing some research, I figured out that the issue is the 'Reallocated_Sector_Ct' attribute, and found that this thread seemed to have the answer. Unfortunately, if I understand hard drives correctly, the self-test log points at my MBR (LBA 0) as the broken part, in which case I shouldn't even be able to boot and type this:

Code:

# smartctl -t long /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 89 minutes for test to complete.
Test will complete after Thu Jan 31 12:34:49 2013

Use smartctl -X to abort test.


# smartctl -l selftest /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure    90%      5419        0
# 2  Extended offline    Completed: unknown failure    90%      5419        0
# 3  Extended offline    Completed: unknown failure    90%      5419        0
# 4  Extended offline    Completed: unknown failure    90%      5419        0
# 5  Extended offline    Completed: unknown failure    90%      5244        0

The instructions in the linked post say to run a test, check the LBA at which it failed, then use 'dd' to overwrite that sector, repeating as necessary until the bad sectors are cleared. However, I feel I shouldn't do that here, as I worry it would break my HDD completely...
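The procedure from the linked post (read the failing LBA from the self-test log, then overwrite it) can be sketched in shell. This is a minimal sketch under stated assumptions: `first_error_lba` and `suggest_dd` are hypothetical helper names, the drive is assumed to use 512-byte logical sectors, and GNU dd is assumed. It only prints the dd command instead of running it, because writing to the reported LBA destroys whatever is stored there. With the log above the LBA is 0, so the printed command would zero the MBR and partition table, which is exactly the risk in question.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -l selftest` output on stdin and print
# the LBA_of_first_error column of the newest entry ("# 1").
first_error_lba() {
    awk '$1 == "#" && $2 == "1" { print $NF; exit }'
}

# Hypothetical helper: print (never run) the dd command that would overwrite
# the failing LBA on device $1. Returns 1 if the log records no failing LBA.
suggest_dd() {
    lba=$(first_error_lba)
    if [ -z "$lba" ] || [ "$lba" = "-" ]; then
        echo "no failing LBA in the self-test log" >&2
        return 1
    fi
    # Assumes 512-byte logical sectors; confirm with: blockdev --getss "$1"
    echo "dd if=/dev/zero of=$1 bs=512 count=1 seek=$lba oflag=direct conv=fdatasync"
}
```

Usage: `smartctl -l selftest /dev/sda | suggest_dd /dev/sda`. Keeping the write as a printed command is a deliberate design choice: a write to a sector the drive has flagged as pending typically lets the firmware either rewrite it in place or remap it to a spare, but it cannot bring back the data that was there.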

Any suggestions or insights?

Cheers.

BoraxMan 01-31-2013 03:53 AM

I had a hard drive that was reporting errors, and I used this program to fix it.

http://hddguru.com/software/HDD-LLF-...l-Format-Tool/

It is a low-level formatter, and it is a Windows program. It's not strictly low-level, but it works at a lower level than 'dd' does. It seemed to fix the errors, as they were 'logical' bad sectors rather than 'physical' ones.

This will erase your data, though, along with your partition tables and MBR.


Rewriting a sector again and again, as suggested, won't kill your drive unless there is a mechanical issue. If it's your MBR that is bad, then you will lose your MBR, and with it your ability to boot from the drive.
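Since the MBR (and with it the partition table) lives in the first sector, it is cheap to save a copy before attempting any overwrite there. A minimal sketch, assuming 512-byte sectors; `backup_first_sector` is a hypothetical helper name. A failed read is a strong hint that LBA 0 really is bad, but a successful buffered read like this may be served from the page cache, so it is not conclusive on its own.

```shell
#!/bin/sh
# Hypothetical helper: copy the first 512-byte sector of $1 (e.g. /dev/sda)
# into the file $2, and print the command that would restore it later.
backup_first_sector() {
    dev=$1
    out=$2
    # Exactly one block; assumes 512-byte logical sectors
    if ! dd if="$dev" of="$out" bs=512 count=1 2>/dev/null; then
        echo "could not read sector 0 of $dev" >&2
        return 1
    fi
    # conv=notrunc only matters if the target is an image file, not a block device
    echo "saved to $out; restore with: dd if=$out of=$dev bs=512 count=1 conv=notrunc"
}
```

Usage (as root): `backup_first_sector /dev/sda /root/sda-sector0.bin`, then copy the file off the failing disk.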

TobiSGD 01-31-2013 07:49 AM

There is no such thing as a logical bad sector. A bad sector is always physical. What this program does is initiate a low-level format of the device, which automatically marks bad sectors as unusable in the disk's firmware. This is nothing more than a workaround and in no way fixes the drive.

Rule of thumb: if SMART reports an error, the first thing to do is to back up any data that isn't already backed up.
Then download the disk manufacturer's diagnostic tool and check the disk. Most likely it will be reported as faulty, since SMART is usually correct when it comes to errors.
To me, it looks like you need a replacement for that disk.
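To check whether SMART is reporting an error from a script, smartctl already encodes its findings in its exit status, a bitmask described in the RETURN VALUES section of the smartctl man page. A minimal POSIX sh sketch that decodes it; `decode_smartctl_status` is a hypothetical helper name and the messages paraphrase the man page rather than quote it.

```shell
#!/bin/sh
# Hypothetical helper: decode the smartctl exit-status bitmask passed as $1.
decode_smartctl_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    # Messages for bits 0..7, paraphrased from smartctl(8), RETURN VALUES
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed" \
        "SMART status check returned DISK FAILING" \
        "prefail attribute(s) <= threshold" \
        "attribute(s) <= threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Print the message only if this bit is set in the status
        if [ $(( (rc >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}
```

Usage: `smartctl -a /dev/sda >/dev/null; decode_smartctl_status $?`. Several bits can be set at once, which is why each one is tested independently.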

gradinaruvasile 01-31-2013 09:31 AM

Do you have any errors in dmesg? If there are bad sectors and the OS cannot read them, it logs the error in dmesg.

Example:

Code:

[187442.852700] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
[187442.852709] ata1.00: irq_stat 0x40000008
[187442.852715] ata1.00: failed command: READ FPDMA QUEUED
[187442.852726] ata1.00: cmd 60/00:08:00:8c:64/01:00:01:00:00/40 tag 1 ncq 131072 in
[187442.852726]          res 41/40:00:eb:8c:64/00:00:01:00:00/40 Emask 0x409 (media error) <F>
[187442.852732] ata1.00: status: { DRDY ERR }
[187442.852736] ata1.00: error: { UNC }
[187442.854276] ata1.00: failed to get Identify Device Data, Emask 0x1
[187442.855868] ata1.00: failed to get Identify Device Data, Emask 0x1
[187442.855881] ata1.00: configured for UDMA/133
[187442.855937] sd 0:0:0:0: [sda] Unhandled sense code
[187442.855942] sd 0:0:0:0: [sda] 
[187442.855945] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[187442.855949] sd 0:0:0:0: [sda] 
[187442.855952] Sense Key : Medium Error [current] [descriptor]
[187442.855957] Descriptor sense data with sense descriptors (in hex):
[187442.855959]        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[187442.855971]        01 64 8c eb
[187442.855978] sd 0:0:0:0: [sda] 
[187442.855983] Add. Sense: Unrecovered read error - auto reallocate failed
[187442.855987] sd 0:0:0:0: [sda] CDB:
[187442.855989] Read(10): 28 00 01 64 8c 00 00 01 00 00
[187442.856000] end_request: I/O error, dev sda, sector 23366891
[187442.856057] ata1: EH complete
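In this example the sector in the `end_request` line is the same LBA that appears in hex in the ATA `res` field and in the sense data (0x01648ceb = 23366891), so the kernel log already says which sector failed. A minimal sketch that extracts device/sector pairs from such lines; `io_error_sectors` is a hypothetical helper name, and since the message prefix (`end_request:` here) differs between kernel versions, it assumes only the common `I/O error, dev X, sector N` wording.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<device> <sector>" for each "I/O error, dev X, sector N" message.
io_error_sectors() {
    sed -n 's|.*I/O error, dev \([^,]*\), sector \([0-9][0-9]*\).*|\1 \2|p'
}
```

Usage: `dmesg | io_error_sectors`. To cross-check the hex LBA from the `res` line, `printf '%d\n' 0x01648ceb` prints 23366891.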


H_TeXMeX_H 01-31-2013 10:21 AM

Post the output of 'smartctl -A /dev/sda'.

However, the long test failed, which means the drive is failing, so back up all your data if you haven't already done so.
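Reading `smartctl -A` output mostly comes down to comparing the normalized VALUE column against THRESH: smartctl prints FAILING_NOW in WHEN_FAILED once VALUE has dropped to or below THRESH. A minimal sketch that lists only such attributes; `failing_attributes` is a hypothetical helper, and the field positions assume the default `smartctl -A` column layout. Because a THRESH of 000 is treated as never failing, a worn attribute such as Start_Stop_Count at 001 is not flagged by this check.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print attributes
# whose WHEN_FAILED column is set or whose VALUE is at or below THRESH.
failing_attributes() {
    awk '
        # Attribute rows start with a numeric ID; skip headers and banner lines
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0; thresh = $6 + 0
            # A THRESH of 0 never fails, so only compare when it is > 0
            if ($9 != "-" || (thresh > 0 && value <= thresh))
                printf "%s %s VALUE=%s THRESH=%s WHEN_FAILED=%s RAW=%s\n", $1, $2, $4, $6, $9, $10
        }'
}
```

Usage: `smartctl -A /dev/sda | failing_attributes`.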

wraeth 01-31-2013 04:10 PM

Thanks for the replies :)

I'm not using this machine for anything particularly critical, so data loss isn't much of a concern, but I'd like to avoid a low-level solution if I can. I know this is an old machine (it's an Aspire 3500, complete with 'Designed for Windows XP' sticker), so it's quite possible that the drive is just old. What I want to do is try to clear these bad sectors and monitor whether, and how quickly, more bad sectors appear before resigning myself to a new HDD.

The SMART attributes ('smartctl -A /dev/sda'):
Code:

smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   253   253   025    Pre-fail  Always       -       3072
  4 Start_Stop_Count        0x0032   001   001   000    Old_age   Always       -       987627
  5 Reallocated_Sector_Ct   0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 1027
  7 Seek_Error_Rate         0x000e   253   253   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Half_Minutes   0x0032   100   100   000    Old_age   Always       -       5423h+20m
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       43
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       741
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1108
187 Reported_Uncorrect      0x0032   097   097   000    Old_age   Always       -       2088
188 Command_Timeout         0x0032   253   253   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   047   040    Old_age   Always       -       31 (Min/Max 7/53)
191 G-Sense_Error_Rate      0x0012   046   046   000    Old_age   Always       -       554012
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       194
193 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       1067508
194 Temperature_Celsius     0x0022   069   047   000    Old_age   Always       -       31 (Min/Max 7/53)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       153545
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       1027
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   253   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x0012   253   253   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0012   100   100   000    Old_age   Always       -       741
225 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       1067508
255 Unknown_Attribute       0x000a   253   100   000    Old_age   Always       -       0

and dmesg:
Code:

# dmesg | grep error
# dmesg | grep failed
[    0.070659]  pci0000:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[    1.590318] ondemand governor failed, too long transition latency of HW, fallback to performance governor
# dmesg | grep reallocate
#

The thing that puzzles me is that, according to the test results, the tests are failing on or before the first sector of the disk, but I'm concerned about trying to write to sector 0 - I don't know enough about disk layouts or SMART to determine if it's a good idea or if the test is trying to show me something else.
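For context on what lives at LBA 0: on an MBR-partitioned disk, the first 512-byte sector holds the boot code and the partition table and ends with the two-byte boot signature 0x55 0xAA at byte offsets 510 and 511. A minimal sketch that checks for that signature; `has_mbr_signature` is a hypothetical helper, and it assumes 512-byte sectors and an MBR (or GPT protective MBR) layout.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first sector of $1 (device or image
# file) ends with the MBR boot signature 0x55 0xAA.
has_mbr_signature() {
    # Read one 512-byte sector, then dump bytes 510 and 511 as hex
    sig=$(dd if="$1" bs=512 count=1 2>/dev/null | od -An -tx1 -j510 -N2 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

Usage (as root): `has_mbr_signature /dev/sda && echo "sector 0 looks like an MBR"`.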

Again, thanks for the replies.

rknichols 01-31-2013 05:50 PM

The drive is near death in many ways. It has used almost all of its spare sectors
Quote:

Originally Posted by wraeth (Post 4881668)
Code:

  5 Reallocated_Sector_Ct   0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 1027

still has 2 bad sectors pending reallocation
Code:

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
and has been through nearly a million start/stop cycles
Code:

  4 Start_Stop_Count        0x0032   001   001   000    Old_age   Always       -       987627
Forget about using it for anything but a paperweight.

The reason for the anomalous test result is probably that the test is working with blocks larger than a single sector and is indicating a failure somewhere in the first such block. If you really want to know which sector is bad, you can use hdparm with the "--read-sector" option to read single sectors without confusion from the OS readahead:
Code:

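# Read sectors one at a time, starting at LBA 0; --read-sector bypasses the OS cache and readahead
for N in {0..1000}; do
    if ! hdparm --read-sector "$N" /dev/sda >/dev/null; then
        echo "First unreadable sector: $N"
        break
    fi
done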


H_TeXMeX_H 02-01-2013 04:37 AM

I agree with rknichols, the drive will fail imminently. Backup all your stuff now if you have anything important.


All times are GMT -5.