LinuxQuestions.org
Old 07-31-2014, 06:30 AM   #1
goranbr
LQ Newbie
 
Registered: Jul 2014
Posts: 5

Rep: Reputation: Disabled
Harddisk failing - What measures to take


I just recently got this message on the root console:

Code:
!$ WARNING: Your hard drive is failing
Device: /dev/sdc [SAT], FAILED SMART self-check. BACK UP DATA NOW!
I got really worried because this is my home PC and I only take rsync backups to a NAS.
And if I have taken backups from a faulty disk to my NAS, I may have overwritten good files there with bad files from my faulty disk, right?

Judging from the output below, can anyone tell whether data has already gone missing and I have corrupted files? How will I know which files are corrupted in that case?

OR, is this a warning that I will lose data soon? Can the disk reallocate sectors to repair itself?

I have already ordered a new disk. What I am worried about is if I have already corrupted data on my current backup. This is what I have to go on so far....

Code:
# smartctl -a /dev/sdc
=== START OF INFORMATION SECTION ===
Model Family:     Hitachi/HGST Deskstar 7K4000
Device Model:     Hitachi HDS724040ALE640
Serial Number:    PK2311PAG4P4MM
LU WWN Device Id: 5 000cca 22bc220e0
Firmware Version: MJAOA3B0
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul 31 12:46:36 2014 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       78
  3 Spin_Up_Time            0x0007   128   128   024    Pre-fail  Always       -       579 (Average 625)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       91
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1712
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       38
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       16801
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       91
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       787
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       787
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       38 (Min/Max 23/44)
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2897
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always       -       3760
 
Old 07-31-2014, 06:55 AM   #2
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 2,312

Rep: Reputation: Disabled
Quote:
Originally Posted by goranbr
And if I have taken backups from a faulty disk to my NAS, I may have overwritten good files there with bad files from my faulty disk, right?
Fortunately, you're wrong.

The drive may be failing, but every time a bad sector is encountered, the drive will attempt to reallocate it to a spare sector. If this procedure succeeds, no data are lost. If the bad sector is in use and repeated attempts to read it fail with an ECC error, a read error will be returned to the operating system.

In other words, there's no way the drive will hand you bad data and pretend it's good. The chance of a corrupted sector randomly producing a valid ECC code is next to none.

Quote:
Originally Posted by goranbr
Code:
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1712
Code:
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always       -       3760
1712 sectors have been successfully reallocated, and 3760 sectors are marked as bad and are awaiting reallocation. If some of those 3760 sectors are completely unreadable and contain data, you will get a read error if you try to read a file with data stored in such a sector. On the other hand, if you're able to back up your data without incident, the backup will contain only good data.

You should back up your system as soon as possible, replace the drive, and perform a full restore.
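
As a rough illustration of the reading above, here is a minimal Python sketch that parses attribute rows in the format smartctl printed and flags any attribute whose normalized VALUE has dropped to or below a non-zero THRESH. The function names and the regular expression are assumptions made for illustration; they are not part of smartmontools.

```python
import re

# One row of the smartctl attribute table, e.g.
#   5 Reallocated_Sector_Ct 0x0033 001 001 005 Pre-fail Always FAILING_NOW 1712
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)

# Three rows copied from the output posted in this thread.
SAMPLE_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1712
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       16801
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always       -       3760
"""


def parse_attributes(text):
    """Return {attribute id: fields} for every line that looks like a table row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[int(m["id"])] = {
                "name": m["name"],
                "value": int(m["value"]),
                "thresh": int(m["thresh"]),
                "when_failed": m["when_failed"],
                "raw": int(m["raw"]),
            }
    return rows


def failing(rows):
    """IDs whose normalized value is at or below a non-zero threshold."""
    return sorted(
        attr_id
        for attr_id, r in rows.items()
        if r["thresh"] > 0 and r["value"] <= r["thresh"]
    )


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE_ROWS)
    print("failing attribute ids:", failing(attrs))
    print("pending sectors (raw):", attrs[197]["raw"])
```

On the rows above only attribute 5 is flagged. Attribute 197 has a threshold of 000, so it never trips the health check by itself, but its raw count is the number of sectors that may return a read error.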
 
1 members found this post helpful.
Old 07-31-2014, 07:15 AM   #3
goranbr
LQ Newbie
 
Registered: Jul 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
You don't know how reassuring that was to hear... :-)

I will get my new drive today. But I have shut down my NAS and won't make any more backups until I have the new disk.

I think it is the summer heat that is destroying my disks. :-)

Anyway, thanks a lot for your input!
 
Old 07-31-2014, 07:28 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 13,259

Rep: Reputation: 1289
In which case ... see the label "Did you find this post helpful?" - I suggest you help enhance @Ser Olmy's reputation by clicking "YES"
 
Old 07-31-2014, 07:42 AM   #5
goranbr
LQ Newbie
 
Registered: Jul 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by syg00
In which case ... see the label "Did you find this post helpful?" - I suggest you help enhance @Ser Olmy's reputation by clicking "YES"
Of course, thanks for the tip! :-)
 
Old 07-31-2014, 10:51 AM   #6
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,885

Rep: Reputation: 475
I haven't really considered the possibility that using rsync to make regular backups could in fact back up corrupt data. Possible solutions are to make incremental/differential backups, to make full backups to separate files, or to back up only after some checks are run locally to make sure you're not backing up corrupt data.
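
One way to get incremental-style backups without much extra disk is rsync's `--link-dest` option, which hard-links unchanged files against a previous snapshot so that each dated directory looks like a full backup while only changed files take new space. The sketch below only builds the command line; the paths, the dated directory names and the function name are hypothetical and just for illustration.

```python
def snapshot_command(source, backup_root, today, previous=None):
    """Build an rsync command that writes a new dated snapshot directory.

    source      -- directory to back up (hypothetical example path)
    backup_root -- absolute directory holding the dated snapshots
    today       -- name of the snapshot to create, e.g. "2014-08-01"
    previous    -- name of the last good snapshot, or None for the first run
    """
    root = backup_root.rstrip("/")
    cmd = ["rsync", "-a"]
    if previous is not None:
        # Unchanged files become hard links into the previous snapshot.
        cmd.append("--link-dest=" + root + "/" + previous)
    # Trailing slashes: copy the contents of source into the new snapshot.
    cmd.append(source.rstrip("/") + "/")
    cmd.append(root + "/" + today + "/")
    return cmd


if __name__ == "__main__":
    print(" ".join(snapshot_command(
        "/home/user", "/mnt/nas/backups", "2014-08-01", previous="2014-07-31")))
```

Because an older snapshot is never written to, a bad run can only damage the newest directory. The list can be handed to `subprocess.run(cmd, check=True)` once the paths match your setup.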
 
Old 07-31-2014, 05:24 PM   #7
goranbr
LQ Newbie
 
Registered: Jul 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by metaschima
I haven't really considered the possibility that using rsync to make regular backups could in fact back up corrupt data. Possible solutions are to make incremental/differential backups, to make full backups to separate files, or to back up only after some checks are run locally to make sure you're not backing up corrupt data.
Well, I interpreted the reply from "Ser Olmy" as saying that rsync would at least report an error if it can't read a file properly from the source.
And if I don't get any errors, then at least that particular backup did not destroy any data.

However, I am still unsure what happens if rsync tries to back up a corrupt file (with data on sectors not readable at the time of backup).

Does rsync have any chance of detecting this in time to refrain from overwriting the target file?
That is, when rsync asks the OS for a file that it has chosen to transfer will the OS check to see if the whole file is readable before it hands it over to rsync?
Or does the OS just hand rsync one sector at a time sequentially, and then say "Oops, this sector was actually unreadable!"?

As for making separate backups, this is a home setup on a home budget, with 8TB of disk on my PC and 8TB on my NAS. So I have already stretched my budget. :-)
I could use incremental backups I guess, but it's a more complicated backup scheme for a home setting I think.
 
Old 07-31-2014, 05:46 PM   #8
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 2,005

Rep: Reputation: 826
rsync normally creates a temporary file at the destination and, after doing that successfully, renames it over the old version. If an error occurred, the old version should be safe.
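
To make the temp-file-then-rename idea concrete, here is a minimal Python sketch of the same pattern. It is not rsync's code, only an imitation of the behaviour described above; the function name and the `opener` parameter (there so a read failure can be simulated) are assumptions for illustration.

```python
import os
import tempfile


def copy_via_temp(src, dst, chunk_size=1024 * 1024, opener=open):
    """Copy src to dst; dst is replaced only if all of src could be read."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # The temporary file lives next to dst so the final rename stays on
    # one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as out, opener(src, "rb") as inp:
            while True:
                # The data arrives piece by piece; an unreadable sector
                # shows up here as OSError (typically errno EIO).
                chunk = inp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
            out.flush()
            os.fsync(out.fileno())
        # Reached only after the whole source was read and written.
        os.replace(tmp_path, dst)
    except BaseException:
        # Any failure: drop the partial copy and leave the old dst alone.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This also answers the "one sector at a time" question in spirit: the reader only learns about the bad sector partway through, but since nothing touches the old copy until the rename, a mid-file error costs nothing on the destination side.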
 
Old 07-31-2014, 05:55 PM   #9
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,885

Rep: Reputation: 475
Just because a file is readable does NOT mean it is not corrupt. I've gotten corrupt files after a power outage. They were readable, but full of garbage. Not sure what is best in your particular situation, but consider methods to prevent corrupt files from overwriting good ones. For sure do NOT back up after power outages or SMART failures until you are sure the files are good. Maybe checksums can help, but user input may be needed. I think at least keeping two backups and alternating between which is overwritten is a minimal way to prevent this from happening.
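
A minimal sketch of the checksum idea, assuming you keep a manifest from the last known-good state and compare it with a fresh one before backing up. The function names are hypothetical, and deciding whether a changed file was edited on purpose or silently corrupted is still left to the user.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def changed_files(old, new):
    """Paths present in both manifests whose content differs."""
    return sorted(p for p in old if p in new and old[p] != new[p])
```

Any path returned by `changed_files` that you know you did not touch is a candidate for restoring from the backup instead of overwriting it.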

Last edited by metaschima; 07-31-2014 at 05:56 PM.
 
Old 07-31-2014, 11:39 PM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 13,259

Rep: Reputation: 1289
Quote:
Originally Posted by metaschima
I've gotten corrupt files after a power outage. They were readable, but full of garbage.
I'd suggest that you got corrupted files after the fsck after the power outage.
This is the elephant in the room - fsck is designed to fix filesystems, not necessarily the files in them.

So an earlier backup should be OK, but after a fsck on a "normal" filesystem that throws messages (like after an outage) I always toss the filesystem and restore in toto. If you were to use a filesystem that had checksumming (like btrfs) you could have reasonable confidence the data read is (always) good. I use RAID5 under btrfs so it can go find a good (internal) backup when it gets a CRC mismatch on data read.
 
Old 08-01-2014, 07:59 AM   #11
goranbr
LQ Newbie
 
Registered: Jul 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
Yes, power outage is another problem which is even more disturbing....

And whether it is SMART reporting unreadable sectors or fsck "fixing" the file system, it is not exactly easy to figure out which files have been corrupted.

Is there any way to get this info in either situation that you know of?
 
Old 08-01-2014, 09:55 AM   #12
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 2,005

Rep: Reputation: 826
The Bad Block HOWTO shows how to identify the file (if any) associated with a detected bad block. Going through that procedure for more than a very small number of bad blocks is impractical. If your backup runs without encountering an I/O error, then it is safe to say that none of the files included in the backup are using any of the bad blocks.
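
For a large number of bad blocks, a cruder alternative is to read every file once and note which ones fail. The Python sketch below does that; it is not the HOWTO's procedure, the function name and the `opener` parameter are assumptions for illustration, and files already sitting in the page cache may be served from memory without touching the disk.

```python
import os


def find_unreadable(root, chunk_size=1024 * 1024, opener=open):
    """Read every regular file under root; return [(path, errno)] for failures."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                # Skip FIFOs, sockets, broken symlinks and the like.
                continue
            try:
                with opener(path, "rb") as f:
                    # Discard the data; only the read errors matter here.
                    while f.read(chunk_size):
                        pass
            except OSError as exc:
                # An unreadable sector normally surfaces as errno EIO.
                bad.append((path, exc.errno))
    return bad
```

An empty result is the same evidence as a backup run that finished without I/O errors: none of the files read were sitting on a sector the drive could not return.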
 
1 members found this post helpful.
  

