LinuxQuestions.org
Linux - Hardware

01-30-2013, 06:09 PM   #1
wraeth
LQ Newbie
Registered: Jan 2013
Location: Newcastle, New South Wales, Australia
Distribution: Fedora
Posts: 2

SMART Failure - Reallocated_Sector_Ct - LBA 0


Greetings;

I am encountering a problem where, on boot, my HDD reports an 'imminent failure'. All other usage of the disk seems fine.

After doing some research, I figured out that the issue is the 'Reallocated_Sector_Ct' problem, and found a thread that seemed to have the answer. Unfortunately, if I understand hard drives correctly, the failure points at my MBR, in which case I shouldn't even be able to boot and type this:

Code:
# smartctl -t long /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 89 minutes for test to complete.
Test will complete after Thu Jan 31 12:34:49 2013

Use smartctl -X to abort test.


# smartctl -l selftest /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure    90%      5419         0
# 2  Extended offline    Completed: unknown failure    90%      5419         0
# 3  Extended offline    Completed: unknown failure    90%      5419         0
# 4  Extended offline    Completed: unknown failure    90%      5419         0
# 5  Extended offline    Completed: unknown failure    90%      5244         0
The instructions in the post linked above say to run a test, check the LBA it failed at, then use 'dd' to overwrite that sector, repeating as necessary until the bad sectors are cleared. However, I feel I shouldn't do that, as I'm worried it will break my HDD completely...
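The procedure described above (run a test, read the LBA it failed at, overwrite that sector) can be sketched as two small shell functions. This is a minimal sketch under stated assumptions: both function names are hypothetical, the parser assumes the 'smartctl -l selftest' layout shown above (newest entry labelled "# 1", LBA in the last column), and the second function deliberately only prints the 'dd' command rather than running it. It also assumes 512-byte logical sectors; 'blockdev --getss /dev/sda' reports the real size.

```shell
#!/bin/sh
# Hypothetical helper: print LBA_of_first_error for the newest self-test
# entry ("# 1") of `smartctl -l selftest` output read on stdin.
first_error_lba() {
    # The LBA is the last column; smartctl shows "-" when none was logged.
    awk '$1 == "#" && $2 == "1" { print $NF; exit }'
}

# Hypothetical helper, DRY RUN ONLY: print (never execute) the dd command
# that would overwrite one sector with zeros, which typically makes the
# drive either rewrite it in place or remap it to a spare.
# Assumes 512-byte logical sectors; check with: blockdev --getss /dev/sda
# WARNING: the printed command destroys that sector's data. On an
# MBR-partitioned disk, LBA 0 holds the boot code and partition table.
print_zero_sector_cmd() {
    case $1 in
        ''|*[!0-9]*) echo "not a numeric LBA: '$1'" >&2; return 1 ;;
    esac
    # seek counts in units of bs, so seek=LBA lands on that sector
    echo "dd if=/dev/zero of=$2 bs=512 count=1 seek=$1 oflag=direct"
}

# Usage:
#   lba=$(smartctl -l selftest /dev/sda | first_error_lba)
#   print_zero_sector_cmd "$lba" /dev/sda
```

For the log above, 'first_error_lba' yields 0, so the printed command would target the very sector that holds the MBR, which is exactly the risk being worried about here.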

Any suggestions or insights?

Cheers.

01-31-2013, 03:53 AM   #2
BoraxMan
Member
Registered: Apr 2010
Posts: 103

I had a hard drive that was reporting errors, and I used this program to fix it.

http://hddguru.com/software/HDD-LLF-...l-Format-Tool/

It is a low-level formatter and a Windows program. It's not strictly low level, but it works at a lower level than 'dd' does. It seemed to fix the errors, as they were 'logical' bad sectors instead of 'physical' bad sectors.

This will erase your data, though, including your partition tables and MBR.


The suggestion of rewriting a sector again and again won't kill your drive, unless there is a mechanical issue. If it's your MBR that is bad, then you will lose your MBR, and your ability to boot from the drive.

01-31-2013, 07:49 AM   #3
TobiSGD
Moderator
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,148
Blog Entries: 2

There is no such thing as a logical bad sector. A bad sector is always physical. What this program does is initiate a low-level format of the device, which will automatically mark bad sectors as unusable in the disk's firmware. This is nothing more than a workaround and in no way fixes the drive.

Rule of thumb: if SMART reports an error, the first thing to do is to back up any data that isn't already backed up.
Then download the disk manufacturer's diagnostic tool and check the disk. Most likely it will be reported as faulty, since SMART is usually correct when it comes to errors.
To me it looks like you need a replacement for that disk.

01-31-2013, 09:31 AM   #4
gradinaruvasile
Member
Registered: Apr 2010
Location: Cluj, Romania
Distribution: Debian Testing
Posts: 731

Do you have any errors in dmesg? If there are bad sectors and the OS cannot read them, it logs the error in dmesg.

Example:

Code:
[187442.852700] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
[187442.852709] ata1.00: irq_stat 0x40000008
[187442.852715] ata1.00: failed command: READ FPDMA QUEUED
[187442.852726] ata1.00: cmd 60/00:08:00:8c:64/01:00:01:00:00/40 tag 1 ncq 131072 in
[187442.852726]          res 41/40:00:eb:8c:64/00:00:01:00:00/40 Emask 0x409 (media error) <F>
[187442.852732] ata1.00: status: { DRDY ERR }
[187442.852736] ata1.00: error: { UNC }
[187442.854276] ata1.00: failed to get Identify Device Data, Emask 0x1
[187442.855868] ata1.00: failed to get Identify Device Data, Emask 0x1
[187442.855881] ata1.00: configured for UDMA/133
[187442.855937] sd 0:0:0:0: [sda] Unhandled sense code
[187442.855942] sd 0:0:0:0: [sda]  
[187442.855945] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[187442.855949] sd 0:0:0:0: [sda]  
[187442.855952] Sense Key : Medium Error [current] [descriptor]
[187442.855957] Descriptor sense data with sense descriptors (in hex):
[187442.855959]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[187442.855971]         01 64 8c eb 
[187442.855978] sd 0:0:0:0: [sda]  
[187442.855983] Add. Sense: Unrecovered read error - auto reallocate failed
[187442.855987] sd 0:0:0:0: [sda] CDB: 
[187442.855989] Read(10): 28 00 01 64 8c 00 00 01 00 00
[187442.856000] end_request: I/O error, dev sda, sector 23366891
[187442.856057] ata1: EH complete
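The failing sector appears twice in the log above: in decimal on the 'end_request' line, and in hex at the end of the descriptor sense data. The last eight sense bytes, 00 00 00 00 01 64 8c eb, form the INFORMATION field of the sense descriptor, and 0x01648ceb = 23366891, matching 'sector 23366891'. A minimal sketch that extracts the first and decodes the second; the function names are hypothetical, and the exact wording around "I/O error, dev sdX, sector N" is what this kernel prints and may differ on others.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct sector numbers from kernel log
# lines containing "I/O error, dev sdX, sector N" (read on stdin).
# The surrounding message text varies between kernel versions.
failed_sectors() {
    grep -o 'I/O error, dev [a-z]*, sector [0-9]*' | awk '{ print $NF }' | sort -un
}

# Hypothetical helper: join the 8 bytes of the sense descriptor's
# INFORMATION field (given as separate hex arguments) and print the
# decimal LBA. The field is big-endian, so the bytes are simply concatenated.
sense_info_to_lba() {
    hex=$(printf '%s' "$*" | tr -d ' ')
    printf '%d\n' "0x$hex"
}

# Example with the bytes from the log above:
#   sense_info_to_lba 00 00 00 00 01 64 8c eb    -> 23366891
#   dmesg | failed_sectors
```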
 
01-31-2013, 10:21 AM   #5
H_TeXMeX_H
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Post the output of 'smartctl -A /dev/sda'.

However, the long test failed, which means the drive is failing, so back up all your data if you haven't already done so.

01-31-2013, 04:10 PM   #6
wraeth
LQ Newbie
Registered: Jan 2013
Location: Newcastle, New South Wales, Australia
Distribution: Fedora
Posts: 2
Original Poster

Thanks for the replies.

I'm not using this machine for anything particularly critical, so data loss isn't much of a concern, but if I can avoid a low-level solution, I'll try. I know this is an old machine (it's an Aspire 3500, complete with a 'Designed for Windows XP' sticker), so it's quite possible that the drive is just old. What I want to do is try to clear these bad sectors and monitor whether, and how quickly, more bad sectors appear before moving to a new HDD.
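To watch whether, and how fast, new bad sectors appear, one option is to record the raw Reallocated_Sector_Ct (ID 5) and Current_Pending_Sector (ID 197) counts at intervals and compare them over time. A minimal sketch, assuming the 'smartctl -A' column layout shown below (RAW_VALUE in the tenth column); the function name and the history file path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print one CSV
# line "timestamp,reallocated,pending" using the RAW_VALUE (10th column)
# of attributes 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector).
# The timestamp is passed in as $1 so the output is reproducible.
smart_counts_csv() {
    awk -v ts="$1" '
        $1 == "5"   { realloc = $10 }
        $1 == "197" { pending = $10 }
        END         { print ts "," realloc "," pending }'
}

# Possible usage, e.g. from a daily cron job (file name is arbitrary):
#   smartctl -A /dev/sda | smart_counts_csv "$(date '+%F %T')" >> /root/smart-history.csv
```

A rising reallocated count between runs would suggest the surface is still degrading; a steady count would suggest the damage has at least stopped spreading.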

The SMART errors:
Code:
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   253   253   025    Pre-fail  Always       -       3072
  4 Start_Stop_Count        0x0032   001   001   000    Old_age   Always       -       987627
  5 Reallocated_Sector_Ct   0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 1027
  7 Seek_Error_Rate         0x000e   253   253   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Half_Minutes   0x0032   100   100   000    Old_age   Always       -       5423h+20m
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       43
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       741
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1108
187 Reported_Uncorrect      0x0032   097   097   000    Old_age   Always       -       2088
188 Command_Timeout         0x0032   253   253   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   047   040    Old_age   Always       -       31 (Min/Max 7/53)
191 G-Sense_Error_Rate      0x0012   046   046   000    Old_age   Always       -       554012
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       194
193 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       1067508
194 Temperature_Celsius     0x0022   069   047   000    Old_age   Always       -       31 (Min/Max 7/53)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       153545
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       1027
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   253   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x0012   253   253   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0012   100   100   000    Old_age   Always       -       741
225 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       1067508
255 Unknown_Attribute       0x000a   253   100   000    Old_age   Always       -       0
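SMART treats an attribute as failed when its normalized VALUE falls to or below the vendor's THRESH; that is why ID 5 (VALUE 001, THRESH 010) shows FAILING_NOW while the other rows do not. A minimal sketch that applies the same comparison to 'smartctl -A' output on stdin. The function name is hypothetical, and rows with THRESH 000 are skipped on the assumption that a zero threshold means the attribute is not used for failure prediction.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print every
# attribute whose normalized VALUE (column 4) is at or below THRESH (column 6).
# Rows with THRESH 000 are skipped: a zero threshold is assumed to mean the
# attribute is not used for failure prediction.
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        value = $4 + 0; thresh = $6 + 0
        if (thresh > 0 && value <= thresh)
            printf "%s %s (VALUE %s <= THRESH %s)\n", $1, $2, $4, $6
    }'
}

# Usage: smartctl -A /dev/sda | failing_attributes
```

Applied to the table above, only the Reallocated_Sector_Ct row meets the condition.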
and dmesg:
Code:
# dmesg | grep error
# dmesg | grep failed
[    0.070659]  pci0000:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[    1.590318] ondemand governor failed, too long transition latency of HW, fallback to performance governor
# dmesg | grep reallocate
#
The thing that puzzles me is that, according to the test results, the tests are failing on or before the first sector of the disk, but I'm concerned about writing to sector 0. I don't know enough about disk layouts or SMART to determine whether that's a good idea or whether the test is trying to show me something else.

Again, thanks for the replies.

01-31-2013, 05:50 PM   #7
rknichols
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

The drive is near death in many ways. It has used almost all of its spare sectors:
Quote:
Originally Posted by wraeth
Code:
  5 Reallocated_Sector_Ct   0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 1027
still has 2 bad sectors pending reallocation:
Code:
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
and has been through nearly a million start/stop cycles:
Code:
  4 Start_Stop_Count        0x0032   001   001   000    Old_age   Always       -       987627
Forget about using it for anything but a paperweight.

The reason for the anomalous test result is probably that the test is working with blocks larger than a single sector and is indicating a failure somewhere in the first such block. If you really want to know which sector is bad, you can use hdparm with the "--read-sector" option to read single sectors without confusion from the OS readahead:
Code:
for N in {0..999}; do
    # Start at LBA 0; --read-sector reads one sector directly, bypassing OS readahead
    if ! hdparm --read-sector "$N" /dev/sda >/dev/null; then
        echo "First unreadable sector: $N"
        break
    fi
done

02-01-2013, 04:37 AM   #8
H_TeXMeX_H
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

I agree with rknichols: the drive will fail imminently. Back up all your stuff now if you have anything important.