Based on the errors occurring at differing sectors and the age of the drive, I'd say it is due for replacement. If you post the output from "smartctl -A /dev/sdg" (wrapped in [CODE] ... [/CODE] tags, please, to preserve formatting), it will give a better picture of the drive's overall health.
At a minimum you will have to write zeros to the bad regions, which will cause the drive to reallocate the bad sectors to spares. If you continue using the drive, you will have to keep close watch on the bad sector counts. If bad sectors continue to develop, the drive will definitely need replacement.
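Something along these lines, as a rough sketch only (the device name and the LBA are placeholders; the dd write is destructive, so check the numbers against your own kernel log or self-test output first):
[CODE]
# Counters that matter for bad sectors
smartctl -A /dev/sdg | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'

# Overwrite ONE known-bad sector so the drive remaps it to a spare.
# 123456789 is a placeholder LBA; 512 is the logical sector size reported by smartctl.
# Whatever was stored in that sector is gone afterwards.
dd if=/dev/zero of=/dev/sdg bs=512 count=1 seek=123456789 conv=notrunc,fsync
[/CODE]
Something like "hdparm --write-sector 123456789 --yes-i-know-what-i-am-doing /dev/sdg" does the same job one sector at a time, if you prefer.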
Hard disks are commodity items - go get another one. You've done ok to get that long out of it. My data is more important than an old piece of hardware. You can play with working around shortcomings as suggested, but only if it doesn't compromise the data. I keep an old machine for this, never on my day-to-day system to avoid "finger-checks".
I've now managed to copy 164GB of data from the disk.
The cp command returned 115 error messages -
Quote:
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/cnd/model/129/1-.2018-12-20T13:48:02Z.diff.gz': Input/output error
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/cnd/model/129/2--.2018-12-20T13:48:02Z.diff.gz': Input/output error
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/cnd/model/129/45-.2018-12-20T13:48:02Z.diff.gz': Input/output error
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/cnd/model/129/46-.2018-12-20T13:48:02Z.diff.gz': Input/output error
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/cnd/model/129/47-.2018-12-20T13:48:02Z.diff.gz': Input/output error
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/index/s76/angular': Input/output error
cp: cannot access 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/index/s77/angular/3/1': Input/output error
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/index/s77/cnd/1': Input/output error
cp: cannot access 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/index/s36/js/13/1': Input/output error
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/index/s75/angular/3/1': Input/output error
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/index/s75/cnd': Input/output error
cp: cannot stat 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.cache/mplab_ide/dev/v4.05/var/index/s75/css': Input/output error
cp: cannot access 'slack_desktop_rdiff/home/alex/rdiff-backup-data/increments/.mozilla.20180224/firefox/tta2kvtd.default/storage/default/https+++ir.ebaystatic.com/idb/12183338011.files': Input/output error
I take two backups -
rdiff-backup to this failed drive;
rsync to another drive, encrypting as I rsync. I then back up the encrypted data to the cloud. This disk appears to be error free, though it is a similar age to the failing drive.
As the failing disk is used solely for rdiff-backup, I'm not too concerned if I've lost rdiff-backup data.
I've tried smartctl, and depending on how I address the failing drive, smartctl returns this
=== START OF INFORMATION SECTION ===
Vendor: WD
Product: My Book 1110
Revision: 2003
User Capacity: 999,501,594,624 bytes [999 GB]
Logical block size: 512 bytes
Serial number: WCAV5F616488
Device type: disk
Local Time is: Fri Sep 20 07:36:16 2019 BST
SMART support is: Unavailable - device lacks SMART capability.
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green
Device Model: WDC WD10EADS-11M2B3
Serial Number: WD-WCAV5F616488
LU WWN Device Id: 5 0014ee 204d9071c
Firmware Version: 80.00A80
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Fri Sep 20 07:36:35 2019 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
669 green: /home/alex $
I don't know why, when I address the drive as /dev/sdf, SMART support is Available, but as /dev/bristol, SMART support is Unavailable.
I've convinced myself it's the same drive, so I pulled the plug. The results are now
Smartctl open device: /dev/sdf failed: No such device
671 green: /home/alex $
So I proceeded to test /dev/sdf
Quote:
675 green: /home/alex $ sudo /usr/sbin/smartctl -t short /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.186] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Sep 20 07:50:53 2019
Use smartctl -X to abort test.
676 green: /home/alex $
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 20663 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
So I'm a bit confused, as smartctl seems to imply that the disk is error free.
You give no clue as to what "/dev/bristol" might be. Is that some decryption mapping of the underlying /dev/sdf device? smartctl needs the raw device. It can't reach through the encryption layer to find it.
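If you're not sure what "/dev/bristol" actually is, something like this should trace it back to the raw disk (guesses only, since we can't see your setup):
[CODE]
# If /dev/bristol is just a udev symlink, resolve it:
readlink -f /dev/bristol

# If it is a device-mapper (e.g. LUKS) mapping, walk up to the parent disk:
lsblk -s /dev/bristol
cryptsetup status bristol    # only meaningful if it is a LUKS mapping

# Then point smartctl at the raw disk; for a USB enclosure like the My Book,
# forcing the SAT pass-through usually gets the full ATA SMART data:
smartctl -a -d sat /dev/sdf
[/CODE]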
Note that an rdiff-backup archive is a fairly complex, and not terribly robust, database. Having various bits and pieces of it missing will cause history for those elements to be unavailable.
As for /dev/sdf: That shows 1455 bad sectors that will cause an I/O error when read. They don't show up in the test results simply because no long test has been run since the bad sectors developed. Running "smartctl -t long /dev/sdf" would almost certainly cause a failure to be logged. Rewriting those sectors would cause them to be reallocated to spare sectors, but a number that large is often a warning that the drive will continue to develop more bad sectors and could soon fail completely. Do not rely on the overall health statement from smartctl. That is generated by the firmware on the device, and bad sectors will not cause a health warning until the drive's supply of spare sectors is nearly exhausted. That is long past the point where the drive should have been replaced.
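A minimal sequence, assuming the drive is still showing up as /dev/sdf:
[CODE]
smartctl -t long /dev/sdf   # start the extended self-test; it runs inside the drive
smartctl -c /dev/sdf        # reports the recommended polling time for the long test
[/CODE]
Once it finishes, a read failure will show up in the drive's self-test log along with the LBA of the first bad sector.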
Not quite sure how to get the results, so I tried
Quote:
smartctl -l selftest /dev/sdg
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.186] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 90% 20707 -
# 2 Extended offline Interrupted (host reset) 90% 20700 -
# 3 Extended offline Interrupted (host reset) 90% 20698 -
# 4 Short offline Completed without error 00% 20663 -
Quote:
Originally Posted by rknichols
Note that an rdiff-backup archive is a fairly complex, and not terribly robust, database. Having various bits and pieces of it missing will cause history for those elements to be unavailable.
After transferring the data to a new drive and doing an "rdiff-backup --check-destination-dir" on the new drive, I did get some reasonable results back. Anyway, nothing can be trusted. So I will look, with urgency, to carry on as is, using a couple of USB drives plugged into my main machine, and then consider building a new machine to run a RAID backup solution. The RAID solution would be more of a long-term solution and might be too complex for my needs.
Quote:
... then consider building a new machine to run a RAID backup solution.
RAID is not a backup solution. It is for redundancy - say when a drive fails. If you issue "rm -rf" in a RAID environment your data is still gone.
You need a separate backup strategy. Always.
Quote:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 90% 20707 -
# 2 Extended offline Interrupted (host reset) 90% 20700 -
# 3 Extended offline Interrupted (host reset) 90% 20698 -
# 4 Short offline Completed without error 00% 20663 -
Those tests got interrupted right at the start. You have to let the test run to completion, without powering-off or rebooting.
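One way to keep an eye on it without disturbing the drive (device name is an example; the "host reset" interruptions could equally be a reboot, a power-off, or the USB enclosure being reset or spun down):
[CODE]
# The test runs inside the drive; poll its status every 10 minutes and leave
# the machine and the enclosure alone until the remaining percentage hits 0%.
watch -n 600 'smartctl -a /dev/sdg | grep -A 2 "Self-test execution status"'
[/CODE]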
Quote:
After transferring the data to a new drive and doing an "rdiff-backup --check-destination-dir" on the new drive, I did get some reasonable results back.
All "--check-destination-dir" does is check whether the most recent backup session failed or was interrupted before completion, and rolls back that session if that was the case.
One of the shortcomings of rdiff-backup is that it does not provide any good way to test the overall integrity of the archive. The only way to do that is to run with the "--verify-at-time" option for every session in the backup history. You can run several of those sessions in parallel in about the same time as a single session (there is a lot of commonality of disk access, so the kernel's buffer cache is a big win here), but it still takes a long time.
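A rough sketch of that loop (the repository path is an example, and it assumes the classic rdiff-backup 1.x options discussed above):
[CODE]
REPO=/mnt/newdrive/slack_desktop_rdiff    # example path to the copied archive

# Roll back any half-finished session, then verify the current mirror
rdiff-backup --check-destination-dir "$REPO"
rdiff-backup --verify "$REPO"

# Then verify every increment; --parsable-output gives one epoch time per line
rdiff-backup --parsable-output --list-increments "$REPO" | awk '{print $1}' |
while read t; do
    echo "=== verifying session at $t ==="
    rdiff-backup --verify-at-time "$t" "$REPO"
done
[/CODE]
As noted above, several of those --verify-at-time runs can be started in parallel, since they mostly read the same blocks.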
Quote:
... then consider building a new machine to run a RAID backup solution.
RAID is not a backup solution. It is for redundancy - say when a drive fails. If you issue "rm -rf" in a RAID environment your data is still gone.
You need a separate backup strategy. Always.
My current backup strategy is
to rsync to an external drive called "southsea", encrypting as I go, then transfer the files from that external drive, using s3cmd, to the Amazon cloud. I don't back up all files, only those that I would miss if something went horribly wrong (a rough sketch of this leg follows below);
to rdiff-backup to an external drive called "bristol". This I use if I've changed a file and some days later want to back out the change to a specific date. Might be a few changes back.
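Roughly, the "southsea" leg looks something like the sketch below. This is simplified, with gocryptfs reverse mode standing in for the actual "encrypt as I rsync" step, and the mount points and bucket name are placeholders:
[CODE]
# One-time setup: create a reverse-mode gocryptfs config alongside the plaintext tree
gocryptfs -init -reverse /home/alex

# Mount an on-the-fly encrypted view of /home/alex (nothing extra stored on disk)
mkdir -p /tmp/cipher_view
gocryptfs -reverse /home/alex /tmp/cipher_view

# 1. rsync the encrypted view to the external drive
rsync -a --delete /tmp/cipher_view/ /mnt/southsea/backup/

# 2. push the encrypted copy to S3; s3cmd sync compares size/md5 and skips unchanged files
s3cmd sync /mnt/southsea/backup/ s3://example-backup-bucket/backup/

fusermount -u /tmp/cipher_view
[/CODE]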
It was "bristol" that failed the other day so no great loss. I will commission a new drive and start the rdiff-backup from day 1. I would lose the history of any changes I've made to files, usually scripts/programs in the past, at the moment that doesn't bother me.
If "southsea" had failed I will commission a new drive and start the rsync process to repopulate the drive. If I've got the S3cmd rules set correctly it would only transfer the changed files to Amazon as most of the files will have the same date/time stamp and md5sum.
If a hard drive on my production machine(s) failed I would get the files back from either "bristol", "southsea" or Amazon whatever was the most appropriate. i.e. hard drive failure or building burning down.
I've reread the Raid documentation and now consider it not to be more appropriate for my situation.
I'm going to put "bristol" and "southsea" out to pasture both are of a similar age, install new drives and see if the strategy I've described above works.