Hard disk failing - what measures to take
I just recently got this message on the root console:
Code:
!$ WARNING: Your hard drive is failing

And if I have taken backups from a faulty disk to my NAS, I may have overwritten good files there with bad files from my faulty disk, right? Judging from the output below, can anyone tell if data has already gone missing and I have corrupted files? How will I know which files are corrupted in that case? Or is this a warning that I will lose data soon? Can the disk reallocate sectors to repair itself?

I have already ordered a new disk. What I am worried about is whether I have already corrupted data on my current backup. This is what I have to go on so far...

Code:
# smartctl -a /dev/sdc
Quote:
The drive may be failing, but every time a bad sector is encountered, the drive will attempt to reallocate it to a spare sector. If this procedure succeeds, no data are lost. If the bad sector is in use and repeated attempts to read it fail with an ECC error, a read error will be returned to the operating system. In other words, there's no way the drive will hand you bad data and pretend it's good. The chance of a corrupted sector randomly producing a valid ECC code is next to none.

Quote:
You should back up your system as soon as possible, replace the drive, and perform a full restore.
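A quick way to watch the reallocation behaviour described above is to filter the attribute table from `smartctl -A`. This is only a sketch: it assumes smartctl's standard column layout, and attribute names vary slightly between vendors.

```shell
# Pull the sector-health counters out of smartctl's attribute table.
# Assumes the usual "ID NAME FLAG VALUE WORST THRESH TYPE UPDATED
# WHEN_FAILED RAW_VALUE" layout; vendor output may differ slightly.
check_sectors() {
  awk '/Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable/ {
    print $2, "raw =", $NF
  }'
}

# Typical use (needs root):
#   smartctl -A /dev/sdc | check_sectors
```

A rising Current_Pending_Sector count means sectors the drive could not read and has not yet reallocated; a stable non-zero Reallocated_Sector_Ct means past reallocations that succeeded.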
You don't know how reassuring that was to hear... :-)
I will get my new drive today. But I have shut down my NAS and won't make any more backups until I have the new disk. I think it is the summer heat that is destroying my disks. :-) Anyway, thanks a lot for your input!
In which case ... see the label "Did you find this post helpful?" - I suggest you help enhance @Ser Olmy's reputation by clicking "YES".
I hadn't really considered the possibility that using rsync to make backups regularly could in fact back up corrupt data. Possible solutions are to make incremental/differential backups, to make full backups to separate files, or to back up only after some local checks confirm you're not backing up corrupt data.
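The "check before you back up" idea can be sketched with sha256sum manifests. The paths and function names below are illustrative, not part of any tool:

```shell
# Record checksums of everything under the backup source.
make_manifest() {   # usage: make_manifest /data /tmp/manifest.sha256
  ( cd "$1" && find . -type f -print0 | xargs -0 sha256sum ) > "$2"
}

# Verify before the next rsync run; a non-zero exit means a file
# changed or became unreadable, so the backup should be skipped.
verify_manifest() { # usage: verify_manifest /data /tmp/manifest.sha256
  ( cd "$1" && sha256sum -c --quiet "$2" )
}
```

This won't distinguish legitimate edits from corruption on its own, which is why user input may still be needed, but it does catch files that silently changed or became unreadable between runs.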
Quote:
And if I don't get any errors, then at least that particular backup did not destroy any data. However, I am still unsure what happens if rsync tries to back up a corrupt file (with data on sectors not readable at the time of backup). Does rsync have any chance of detecting this in time to refrain from overwriting the target file? That is, when rsync asks the OS for a file it has chosen to transfer, will the OS check that the whole file is readable before handing it over to rsync? Or does the OS just hand rsync one sector at a time sequentially, and then say "Oops, this sector was actually unreadable!"?

As for making separate backups, this is a home setup on a home budget, with 8 TB of disk on my PC and 8 TB on my NAS. So I have already stretched my budget. :-) I could use incremental backups I guess, but that's a more complicated backup scheme for a home setting, I think.
rsync normally creates a temporary file at the destination and, after doing that successfully, renames it over the old version. If an error occurred, the old version should be safe.
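The write-to-a-temp-file-then-rename behaviour rsync uses can be imitated with plain coreutils. This sketch shows why a read error part-way through leaves the old destination intact (the function name is made up for illustration):

```shell
safe_copy() {   # usage: safe_copy SRC DST
  src=$1 dst=$2
  tmp=$(mktemp "$(dirname "$dst")/.tmp.XXXXXX") || return 1
  # If cp hits a read error, we delete the partial temp file and the
  # existing destination is never touched -- the same guarantee rsync
  # gives with its default temp-file strategy.
  if cp "$src" "$tmp" 2>/dev/null; then
    mv "$tmp" "$dst"      # rename is atomic on the same filesystem
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Note that rsync's `--inplace` option deliberately disables this protection and writes directly into the destination file, so it is best avoided when the source disk is suspect.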
Just because a file is readable does NOT mean it is not corrupt. I've gotten corrupt files after a power outage: they were readable, but full of garbage. I'm not sure what is best in your particular situation, but consider methods to prevent corrupt files from overwriting good ones. Above all, do NOT back up after a power outage or a SMART failure until you are sure the files are good. Checksums can help, though user input may be needed. At a minimum, keeping two backups and alternating which one is overwritten prevents a single bad run from destroying your only good copy.
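The two-alternating-backups idea can be sketched by deriving the destination from the ISO week number, so each target is overwritten at most every other week. Paths are illustrative, and the arithmetic assumes bash:

```shell
# Alternate between two backup targets week by week so that one bad
# backup run can clobber at most one copy.  Paths are made up here.
week=$(date +%V)                            # ISO week number, 01..53
dest="/mnt/nas/backup-$(( 10#$week % 2 ))"  # -> backup-0 or backup-1
echo "this week's target: $dest"
# rsync -a /data/ "$dest"/                  # the actual transfer (not run here)
```

The `10#` prefix forces base-10 arithmetic so that weeks "08" and "09" aren't misread as invalid octal numbers.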
Quote:
This is the elephant in the room - fsck is designed to fix filesystems, not necessarily the files in them. So an earlier backup should be OK, but after an fsck on a "normal" filesystem that throws messages (like after an outage), I always toss the filesystem and restore in toto. If you were to use a filesystem with checksumming (like btrfs), you could have reasonable confidence that the data read is (always) good. I use RAID5 under btrfs so it can go find a good (internal) copy when it gets a CRC mismatch on a data read.
Yes, a power outage is another problem, which is even more disturbing...
And whether it is SMART reporting unreadable sectors or fsck "fixing" the file system, it is not exactly easy to figure out which files have been corrupted. Is there any way to get this info in either situation, that you know of?
The Bad Block HOWTO shows how to identify the file (if any) associated with a detected bad block. Going through that procedure for more than a very small number of bad blocks is impractical. If your backup runs without encountering an I/O error, then it is safe to say that none of the files included in the backup are using any of the bad blocks.
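The arithmetic behind that HOWTO procedure looks like this. All the numbers below are made-up examples; substitute the LBA from your SMART self-test log and the geometry from fdisk and tune2fs:

```shell
# Illustrative values only -- take yours from smartctl, fdisk, tune2fs.
lba=1234567          # failing sector from the SMART self-test log
part_start=2048      # partition start sector (fdisk -l)
sector_size=512      # logical sector size
fs_block_size=4096   # filesystem block size (tune2fs -l /dev/sdc1)

# Convert the absolute LBA into a block number relative to the filesystem.
fs_block=$(( (lba - part_start) * sector_size / fs_block_size ))
echo "filesystem block: $fs_block"

# Then, on ext2/3/4, map block -> inode -> filename (run against the
# partition, not the whole disk; /dev/sdc1 is a placeholder):
#   debugfs -R "icheck $fs_block" /dev/sdc1
#   debugfs -R "ncheck <inode>" /dev/sdc1
```

If `icheck` reports no inode, the bad block is in free space or metadata and no file is affected.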