On boot, I am getting an 'imminent failure' warning from my HDD. All other usage of the disk seems fine.
After doing some research, I figured out that the issue is the 'Reallocated_Sector_Ct' problem, and found this thread, which seemed to have the answer. Unfortunately, if I understand hard drives correctly, it's my MBR that's broken and I shouldn't be able to type this:
Code:
# smartctl -t long /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 89 minutes for test to complete.
Test will complete after Thu Jan 31 12:34:49 2013
Use smartctl -X to abort test.
# smartctl -l selftest /dev/sda
smartctl 6.0 2012-10-10 r3643 [i686-linux-3.7.3-101.fc17.i686] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: unknown failure 90% 5419 0
# 2 Extended offline Completed: unknown failure 90% 5419 0
# 3 Extended offline Completed: unknown failure 90% 5419 0
# 4 Extended offline Completed: unknown failure 90% 5419 0
# 5 Extended offline Completed: unknown failure 90% 5244 0
The instructions in the above linked post say to run a test, check the LBA it failed at, then use 'dd' to overwrite that sector, repeating as necessary until the sectors are cleared. However, I feel I shouldn't do that, as it would mean breaking my HDD completely...
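For reference, the 'dd' step described in that post amounts to something like the sketch below. It only prints the command instead of running it. The helper name `overwrite_cmd` is hypothetical, and the 512-byte logical sector size is an assumption (check it with `blockdev --getss /dev/sda`). Running the printed command destroys whatever is stored in that sector.

```shell
# Print (do not run) the dd command that would zero a single sector.
# Assumptions: overwrite_cmd is a hypothetical helper name, and the logical
# sector size defaults to 512 bytes unless a third argument is given.
overwrite_cmd() {
    local dev="$1" lba="$2" ss="${3:-512}"
    printf 'dd if=/dev/zero of=%s bs=%s count=1 seek=%s oflag=direct\n' \
        "$dev" "$ss" "$lba"
}

# Example: show the command for the LBA reported by "smartctl -l selftest"
overwrite_cmd /dev/sda 0
```

Printing the command first gives you a chance to double-check the device name and sector number before anything is written.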
It is a low-level formatter, and it is a Windows program. It's not strictly low-level, but it works at a lower level than 'dd' does. It seemed to fix the errors, as they were 'logical' bad sectors instead of 'physical' bad sectors.
This will erase data though, and your partition tables and MBR.
The suggestion of rewriting a sector again and again won't kill your drive, unless there is a mechanical issue. If it's your MBR that is bad, then you will lose your MBR, and your ability to boot from the drive.
There is no such thing as a logical bad sector. A bad sector is always physical. What this program does is initiate a low-level format of the device, which will automatically mark bad sectors as unusable in the disk's firmware. This is nothing more than a workaround and in no way fixes the drive.
Rule of thumb: If SMART reports an error, the first thing to do is to back up any data that isn't currently backed up.
Then download the disk manufacturer's diagnosis tool and check the disk. Most likely it will be reported as faulty, since SMART is usually correct when it comes to errors.
To me it looks like you need a replacement for that disk.
I'm not using this machine for anything particularly critical, so data loss isn't too much of a contributing factor; but if I can avoid a low-level solution, I'll try. I know this is an old machine (it's an Aspire 3500, complete with 'Designed for Windows XP' sticker), so it's quite possible that the drive is just old; but what I want to do is try to clear these bad sectors and monitor if and how quickly more bad sectors appear before committing to a new HDD.
# dmesg | grep error
# dmesg | grep failed
[ 0.070659] pci0000:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[ 1.590318] ondemand governor failed, too long transition latency of HW, fallback to performance governor
# dmesg | grep reallocate
#
The thing that puzzles me is that, according to the test results, the tests are failing on or before the first sector of the disk, but I'm concerned about trying to write to sector 0 - I don't know enough about disk layouts or SMART to determine if it's a good idea or if the test is trying to show me something else.
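One possible way to do the monitoring mentioned above is to log the raw 'Reallocated_Sector_Ct' value at intervals and compare readings. This is only a sketch: the helper names `realloc_count` and `log_realloc` are hypothetical, and it assumes the usual ATA attribute table printed by `smartctl -A`, where RAW_VALUE is the tenth column.

```shell
# Extract the raw Reallocated_Sector_Ct value from "smartctl -A" output
# read on stdin. Assumption: RAW_VALUE is column 10 of the attribute table.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Append a dated reading to a log file so the trend can be compared later.
# log_realloc is a hypothetical helper; it needs root to query the drive.
log_realloc() {
    local dev="$1" log="$2"
    printf '%s %s\n' "$(date +%F)" "$(smartctl -A "$dev" | realloc_count)" >> "$log"
}
```

Running something like `log_realloc /dev/sda /root/realloc.log` as root once a day (from cron, for instance) would show whether the count is stable or climbing.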
Forget about using it for anything but a paperweight.
The reason for the anomalous test result is probably that the test is working with blocks larger than a single sector and is indicating a failure somewhere in the first such block. If you really want to know which sector is bad, you can use hdparm with the "--read-sector" option to read single sectors without confusion from the OS readahead:
Code:
# Read sectors one at a time, starting at LBA 0, and stop at the first read error
for N in {0..999}; do
    hdparm --read-sector "$N" /dev/sda >/dev/null || break
done
echo "Stopped at sector $N"
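As a variation on that loop, the sketch below lists every unreadable sector in a range instead of stopping at the first one. The function name `scan_sectors` and the `READ_CMD` override are assumptions added for illustration; the override only exists so the loop can be dry-run without touching a real disk.

```shell
# Print the number of every sector in [first, last] that cannot be read.
# scan_sectors is a hypothetical helper. READ_CMD is an assumed override
# hook; by default each sector is read with "hdparm --read-sector".
scan_sectors() {
    local dev="$1" first="$2" last="$3" n
    n=$first
    while [ "$n" -le "$last" ]; do
        # The reader is called as: <reader> SECTOR DEVICE
        ${READ_CMD:-hdparm --read-sector} "$n" "$dev" >/dev/null 2>&1 || echo "$n"
        n=$((n + 1))
    done
}
```

For example, `scan_sectors /dev/sda 0 999` (as root) would print one line per sector that hdparm fails to read in the first 1000 sectors.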