Linux - Hardware: This forum is for hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I just recently got this message on the root console:
Code:
!$ WARNING: Your hard drive is failing
Device: /dev/sdc [SAT], FAILED SMART self-check. BACK UP DATA NOW!
I got really worried because this is my home PC and I only make rsync backups to a NAS.
And if I have taken backups from a faulty disk to my NAS, I may have overwritten good files there with bad files from the faulty disk, right?
Judging from the output below, can anyone tell whether data has already gone missing and I now have corrupted files? If so, how will I know which files are corrupted?
Or is this a warning that I will lose data soon? Can the disk reallocate sectors to repair itself?
I have already ordered a new disk. What I am worried about is whether I have already corrupted data on my current backup. This is what I have to go on so far:
Code:
# smartctl -a /dev/sdc
=== START OF INFORMATION SECTION ===
Model Family: Hitachi/HGST Deskstar 7K4000
Device Model: Hitachi HDS724040ALE640
Serial Number: PK2311PAG4P4MM
LU WWN Device Id: 5 000cca 22bc220e0
Firmware Version: MJAOA3B0
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Jul 31 12:46:36 2014 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always   -           0
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline  -           78
  3 Spin_Up_Time            0x0007   128   128   024    Pre-fail  Always   -           579 (Average 625)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always   -           91
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1712
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always   -           0
  8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline  -           38
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always   -           16801
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           91
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           787
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always   -           787
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always   -           38 (Min/Max 23/44)
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always   -           2897
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always   -           3760
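For readers wondering why only ID 5 says FAILING_NOW even though IDs 196 and 197 also sit at 001: an attribute is flagged when its normalized VALUE drops to or below THRESH, and a THRESH of 000 means it can never trip. Below is a minimal sketch of that check. It assumes the column layout shown above, and the sample rows are copied from this output; `failing_attributes` is a hypothetical helper, not part of smartmontools.

```python
from __future__ import annotations

# Rows copied from the smartctl -a output above (same column order).
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 001 001 005 Pre-fail Always FAILING_NOW 1712
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 2897
197 Current_Pending_Sector 0x0022 001 001 000 Old_age Always - 3760
"""


def failing_attributes(table: str) -> list[tuple[str, int, int, str]]:
    """Return (name, value, thresh, raw) for attributes at or below threshold."""
    failing = []
    for line in table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip the header and blanks.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        raw = " ".join(fields[9:])
        # THRESH 000 means the attribute is informational and never "fails".
        if thresh > 0 and value <= thresh:
            failing.append((name, value, thresh, raw))
    return failing


if __name__ == "__main__":
    for name, value, thresh, raw in failing_attributes(SAMPLE):
        print(f"{name}: VALUE {value} <= THRESH {thresh} (raw {raw})")
```

Run against the sample, only Reallocated_Sector_Ct is reported, which matches the WHEN_FAILED column above. The raw counts of 196 and 197 are still worth watching; they just do not drive the pass/fail verdict.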
And if I have taken backups from a faulty disk to my NAS, I may have overwritten good files there with bad files from the faulty disk, right?
Fortunately, you're wrong.
The drive may be failing, but every time a bad sector is encountered, the drive will attempt to reallocate it to a spare sector. If this procedure succeeds, no data are lost. If the bad sector is in use and repeated attempts to read it fail with an ECC error, a read error will be returned to the operating system.
In other words, there's no way the drive will hand you bad data and pretend it's good. The chance of a corrupted sector randomly producing a valid ECC code is next to none.
1712 sectors have been successfully reallocated, and 3760 sectors are marked as bad and are awaiting reallocation. If some of those 3760 sectors are completely unreadable and contain data, you will get a read error if you try to read a file with data stored in such a sector. On the other hand, if you're able to back up your data without incident, the backup will contain only good data.
You should back up your system as soon as possible, replace the drive, and perform a full restore.
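Since an unreadable sector surfaces as a read error rather than as silently wrong data, one rough way to find affected files before the disk is replaced is to read every file to the end and note which ones fail. This is a minimal sketch, assuming the kernel reports the failure as an `OSError` (typically EIO) to the reading process; `unreadable_files` is a hypothetical helper name. Data that is already in the page cache may be served from RAM without touching the disk, so a clean run right after heavy use is less conclusive than one after a fresh boot.

```python
from __future__ import annotations

import os

CHUNK = 1024 * 1024  # read 1 MiB at a time


def unreadable_files(root: str) -> list[tuple[str, str]]:
    """Walk root and try to read every regular file to the end.

    Returns (path, error) pairs for files whose read failed, e.g. with
    EIO when the drive cannot read one of the file's sectors.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, FIFOs and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while f.read(CHUNK):
                        pass
            except OSError as exc:
                problems.append((path, exc.strerror or str(exc)))
    return problems
```

Usage would be something like `for path, err in unreadable_files("/home"): print(path, err)`, run as root so permissions do not produce false positives.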
I haven't really considered the possibility that using rsync to make regular backups could in fact back up corrupt data. Possible solutions are to make incremental/differential backups, to make full backups to separate files, or to back up only after running some local checks to make sure you're not backing up corrupt data.
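One kind of local check before a backup run is a checksum manifest: record size, mtime and a hash for every file, and on the next run flag files whose content changed even though size and mtime did not. Normal edits update mtime, so a silent content change is a sign of corruption. This is a sketch under that assumption; `build_manifest` and `suspicious` are hypothetical names, and storing the manifest between runs (e.g. as JSON) is left out.

```python
from __future__ import annotations

import hashlib
import os


def file_digest(path: str) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict[str, dict]:
    """Map each file's path (relative to root) to its size, mtime and hash."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            manifest[os.path.relpath(path, root)] = {
                "size": st.st_size,
                "mtime_ns": st.st_mtime_ns,
                "sha256": file_digest(path),
            }
    return manifest


def suspicious(old: dict, new: dict) -> list[str]:
    """Files whose content changed although size and mtime did not."""
    return sorted(
        rel
        for rel, cur in new.items()
        if rel in old
        and old[rel]["size"] == cur["size"]
        and old[rel]["mtime_ns"] == cur["mtime_ns"]
        and old[rel]["sha256"] != cur["sha256"]
    )
```

If `suspicious()` returns anything, the idea is to skip the rsync run and investigate first instead of overwriting the good copy on the NAS.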
Well, I interpreted the reply from "Ser Olmy" to mean that rsync would at least report an error if it can't read a file properly from the source.
And if I don't get any errors, then at least that particular backup did not destroy any data.
However, I am still unsure what happens if rsync tries to back up a corrupt file (with data on sectors not readable at the time of backup).
Does rsync have any chance of detecting this in time to refrain from overwriting the target file?
That is, when rsync asks the OS for a file that it has chosen to transfer, will the OS check whether the whole file is readable before it hands it over to rsync?
Or does the OS just hand rsync one sector at a time sequentially, and then say "Oops, this sector was actually unreadable!"?
As for making separate backups, this is a home setup on a home budget, with 8TB of disk on my PC and 8TB on my NAS, so I have already stretched my budget. :-)
I could use incremental backups, I guess, but I think that's a more complicated backup scheme for a home setting.
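For what it's worth, incremental snapshots with plain rsync can stay fairly simple: rsync's `--link-dest=DIR` option hard-links files that are unchanged relative to DIR, so each dated snapshot only costs the space of the files that changed. A minimal sketch that just builds the command line, assuming one directory per day under a backup root; the paths and the `snapshot_command` helper are hypothetical.

```python
from __future__ import annotations

import datetime
import os


def snapshot_command(source: str, backup_root: str,
                     today: datetime.date, previous: str | None) -> list[str]:
    """Build an rsync command for a dated snapshot directory under backup_root.

    previous is the name of the last good snapshot directory (or None for
    the first run); unchanged files are hard-linked against it.
    """
    dest = os.path.join(backup_root, today.isoformat())
    cmd = ["rsync", "-a"]
    if previous is not None:
        cmd.append("--link-dest=" + os.path.join(backup_root, previous))
    # Trailing slash on the source: copy its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", dest + "/"]
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)`. Because an older snapshot is never written to, a bad run cannot overwrite the last good copy; it only produces a bad new snapshot.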
rsync normally creates a temporary file at the destination and, after doing that successfully, renames it over the old version. If an error occurred, the old version should be safe.
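The temporary-file-then-rename pattern described above can be sketched as follows. This is not rsync's own code, just an illustration of why the old version survives a read error: the destination is only replaced by an atomic rename after the whole source has been copied. (rsync's `--inplace` option turns this behaviour off.) `safe_copy` is a hypothetical name, and it does not preserve permissions or ownership; the temporary file is created with mode 0600.

```python
import os
import tempfile


def safe_copy(src: str, dst: str) -> None:
    """Copy src over dst so that dst is either the old or the complete new file."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Temporary file in the same directory, so the final rename stays atomic.
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            # A read error (e.g. EIO from a bad sector) raises here,
            # before dst has been touched.
            for chunk in iter(lambda: inp.read(1 << 20), b""):
                out.write(chunk)
            out.flush()
            os.fsync(out.fileno())
        os.replace(tmp, dst)  # atomic rename over the old version on POSIX
    except BaseException:
        os.unlink(tmp)  # discard the partial copy, keep the old dst
        raise
```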
Just because a file is readable does NOT mean it is not corrupt. I've gotten corrupt files after a power outage. They were readable, but full of garbage. I'm not sure what is best in your particular situation, but consider methods that prevent corrupt files from overwriting good ones. Definitely do NOT back up after power outages or SMART failures until you are sure the files are good. Checksums may help, but user input may be needed. I think keeping at least two backups and alternating which one is overwritten is a minimal way to prevent this from happening.
Last edited by metaschima; 07-31-2014 at 05:56 PM.
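The "two backups, alternate which one is overwritten" idea boils down to always overwriting the copy whose last successful backup is oldest, so the newest good copy is never the target. A minimal sketch, assuming you record when each target last completed successfully; the target paths and `pick_target` are hypothetical.

```python
from __future__ import annotations

import datetime


def pick_target(last_success: dict[str, datetime.datetime | None]) -> str:
    """Return the backup target to overwrite next.

    last_success maps each target to the time of its last successful
    backup, or None if it has never been used. The oldest one is chosen.
    """
    def key(item):
        target, when = item
        # Never-used targets sort first; ties are broken by target name.
        return (when is not None, when or datetime.datetime.min, target)

    return min(last_success.items(), key=key)[0]
```

With `{"/mnt/nas/A": <Jul 30>, "/mnt/nas/B": <Jul 23>}`, the next run goes to B and A keeps the most recent good copy. Only update the timestamp after a run finishes without errors.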
I've gotten corrupt files after a power outage. They were readable, but full of garbage.
I'd suggest that you got corrupted files after the fsck after the power outage.
This is the elephant in the room - fsck is designed to fix filesystems not necessarily the files in it.
So an earlier backup should be OK, but after an fsck on a "normal" filesystem that throws messages (like after an outage), I always toss the filesystem and restore in toto. If you were to use a filesystem that has checksumming (like btrfs), you could have reasonable confidence that the data read is (always) good. I use RAID5 under btrfs so it can go find a good (internal) backup copy when it gets a CRC mismatch on a data read.
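To illustrate the idea of checksummed reads with a redundant copy, here is a toy model, not btrfs code: every block gets a CRC when it is written, every read verifies it, a good copy is used (and rewritten over the bad one) on mismatch, and an error is returned if no copy verifies. It simplifies to two mirrored copies rather than RAID5 parity, and uses zlib's CRC-32 where btrfs defaults to crc32c. `MirroredStore` is a hypothetical name.

```python
import errno
import zlib


class MirroredStore:
    """Toy model of checksummed, mirrored blocks (illustration only)."""

    def __init__(self) -> None:
        self.copies = ([], [])  # two independent copies of every block
        self.crcs = []          # checksum recorded when each block was written

    def write(self, data: bytes) -> int:
        self.crcs.append(zlib.crc32(data))
        for copy in self.copies:
            copy.append(bytearray(data))
        return len(self.crcs) - 1

    def read(self, idx: int) -> bytes:
        for copy in self.copies:
            data = bytes(copy[idx])
            if zlib.crc32(data) == self.crcs[idx]:
                # Good copy found: rewrite every copy from it (self-healing).
                for other in self.copies:
                    other[idx] = bytearray(data)
                return data
        # No copy matches its checksum: report an error instead of bad data.
        raise OSError(errno.EIO, "checksum mismatch on all copies")
```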
Yes, power outage is another problem which is even more disturbing....
And whether it is SMART reporting unreadable sectors or fsck "fixing" the file system, it is not exactly easy to figure out which files have been corrupted.
Is there any way to get this info in either situation that you know of?
The Bad Block HOWTO shows how to identify the file (if any) associated with a detected bad block. Going through that procedure for more than a very small number of bad blocks is impractical. If your backup runs without encountering an I/O error, then it is safe to say that none of the files included in the backup are using any of the bad blocks.
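The first step of that procedure, as I understand the HOWTO for ext2/3/4, is plain arithmetic. Take the failing LBA L (for example from `smartctl -l selftest`), the partition's start sector S (from `fdisk -lu`) and the file system block size B (from `tune2fs -l`). The file system block is b = (L - S) * 512 / B, rounded down. You then look up b with `debugfs -R "icheck b"` to get an inode, and `debugfs -R "ncheck <inode>"` to get the file name. A sketch of the arithmetic only; `lba_to_fs_block` is a hypothetical helper, and the 512-byte sector size assumes LBAs in logical sectors as reported for this drive.

```python
def lba_to_fs_block(lba: int, partition_start: int, block_size: int,
                    sector_size: int = 512) -> int:
    """Map a failing LBA to a file system block number: (L - S) * 512 // B."""
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - partition_start) * sector_size // block_size
```

For example, with a partition starting at sector 2048 and 4096-byte blocks, LBA 2063 falls in file system block 1 (15 * 512 // 4096).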