HDD Partition Failed Boot / ddrescue Operation Completed / Now What to Try and Restore Failing HDD/Partition?

GNewbie · 10-13-2016, 02:00 PM

Hi All,

My HDD failed to boot one day last week. It either happened out of the blue or after the computer froze and the power button was held down to turn it off (I can't recall for sure).

When I ran gparted, it flagged my linux swap partition and my home drive (I'm not familiar with the full meaning of the orangish "flag").

This is the ending portion of the error I found when I tried to boot the partition.

Code:

[ 18.818---] ata3.00: status: { DRDY ERR }
[ 18.818---] ata3.00: error: { UNC }
[ 18.818---] ata3.00: configured for UDMA/133
[ 18.818---] sd 2:0:0:0: [sda] unhandled sense code
[ 18.818---] sd 2:0:0:0: [sda]
[ 18.818---] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 18.818---] sd 2:0:0:0: [sda]
[ 18.818---] Add. Sense: Unrecovered read error - auto reallocate failed
[ 18.818---] sd 2:0:0:0: [sda] CDB:
[ 18.818---] Read(10): 28 00 04 99 39 e8 00 00 08 00
[ 18.818---] end_request: I/O error, dev sda, sector 77150696
[ 18.818---] Buffer I/O error on device sda7, logical block 61
[ 18.819---] ata3: EH complete

BusyBox v1.20.2 (Ubuntu 1:1.20.0-8.1ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands

(initramfs)
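
A quick way to cross-check the log above: in the Read(10) CDB, the four bytes after the opcode and flags (04 99 39 e8) are the starting LBA in hex, and converting them should give the sector number reported on the end_request line. A minimal sketch, assuming a shell whose printf accepts 0x-prefixed constants (bash does):

```shell
# LBA bytes copied from the CDB line in the boot log: 28 00 [04 99 39 e8] 00 00 08 00
lba_hex=049939e8

# printf converts a 0x-prefixed hex constant to decimal
lba_dec=$(printf '%d' "0x$lba_hex")
echo "$lba_dec"   # 77150696, the sector named on the end_request line
```

The same conversion applied to 0x04993878 from the smartctl error log further down gives 77150328, so both logs point at the same small region of the disk.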

This is the output of lsblk -f -o +size. sda6 is the linux swap and sda7 is the home partition.

Code:

root@sysresccd /root % lsblk -f -o +size
NAME  FSTYPE  LABEL      UUID                                MOUNTPOINT  SIZE
sda                                                                      931.5G
├─sda1 ext4              a1e05889-c8a7-4373-b54a-0a0dfd8a71e2              476M
├─sda2                                                                        1K
├─sda5 ext4              100f8716-048c-460e-abc9-913fa2d09c6c              14G
├─sda6                                                                    22.4G
└─sda7                                                                    838.2G

This is the output of dmesg |tail

Code:

mint@mint ~ $ sudo dmesg |tail
[ 2808.481008] nouveau 0000:01:00.0: gr: 	00409610: f7f70000
[ 2808.481013] nouveau 0000:01:00.0: gr: TRAP_TEXTURE - TP2: 00000003 [ FAULT]
[ 2808.481017] nouveau 0000:01:00.0: gr: magic set 3:
[ 2808.481021] nouveau 0000:01:00.0: gr: 	00409e04: dc0a6201
[ 2808.481025] nouveau 0000:01:00.0: gr: 	00409e08: f700f7f7
[ 2808.481029] nouveau 0000:01:00.0: gr: 	00409e0c: 40000430
[ 2808.481033] nouveau 0000:01:00.0: gr: 	00409e10: f7f70000
[ 2808.481038] nouveau 0000:01:00.0: gr: TRAP_TEXTURE - TP3: 00000003 [ FAULT]
[ 2808.481044] nouveau 0000:01:00.0: gr: 00200000 [] ch 6 [001f8f9000 cinnamon[2913]] subc 3 class 8597 mthd 1b0c data 1000f010
[ 2808.481060] nouveau 0000:01:00.0: fb: trapped read at f700f7f700 on channel 6 [1f8f9000 cinnamon[2913]] engine 00 [PGRAPH] client 0a [TEXTURE] subclient 00 [] reason 00000000 [PT_NOT_PRESENT]
mint@mint ~ $

This is the output of smartctl -x /dev/sda.

[code]
root@sysresccd /root % smartctl -x /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.1.33-std483-amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Blue
Device Model: WDC WD10EZEX-00RKKA0
Serial Number: WD-WCC1S4067136
LU WWN Device Id: 5 0014ee 2088d5aeb
Firmware Version: 80.00A80
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Oct 10 15:38:16 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (11040) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 127) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x30b5) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 2010
3 Spin_Up_Time POS--K 173 172 021 - 2316
4 Start_Stop_Count -O--CK 100 100 000 - 841
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 086 086 000 - 10279
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 840
192 Power-Off_Retract_Count -O--CK 200 200 000 - 127
193 Load_Cycle_Count -O--CK 200 200 000 - 713
194 Temperature_Celsius -O---K 115 100 000 - 28
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 192 192 000 - 1373
198 Offline_Uncorrectable ----CK 192 192 000 - 1370
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 196 196 000 - 1684
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb5 GPL,SL VS 1 Device vendor specific log
0xb6 GPL VS 1 Device vendor specific log
0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1899 (device log contains only the most recent 24 errors)
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1899 [2] occurred at disk power-on lifetime: 10279 hours (428 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 04 99 38 78 40 00 Error: UNC at LBA = 0x04993878 = 77150328

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 b8 00 00 04 99 38 78 40 08 00:07:01.754 READ FPDMA QUEUED
ef 00 10 00 02 00 00 00 00 00 00 a0 08 00:07:01.753 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 00 00 00 00 00 e0 08 00:07:01.753 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 00 00 00 00 00 a0 08 00:07:01.753 IDENTIFY DEVICE
ef 00 03 00 46 00 00 00 00 00 00 a0 08 00:07:01.752 SET FEATURES [Set transfer mode]

Error 1898 [1] occurred at disk power-on lifetime: 10279 hours (428 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 04 99 38 78 40 00 Error: UNC at LBA = 0x04993878 = 77150328

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 98 00 00 04 99 38 78 40 08 00:06:59.727 READ FPDMA QUEUED
60 00 08 00 90 00 00 04 99 38 38 40 08 00:06:59.727 READ FPDMA QUEUED
60 00 08 00 88 00 00 04 99 38 18 40 08 00:06:59.727 READ FPDMA QUEUED
60 00 08 00 80 00 00 04 99 40 00 40 08 00:06:59.701 READ FPDMA QUEUED
60 00 08 00 78 00 00 6d 5f 46 f8 40 08 00:06:59.674 READ FPDMA QUEUED

Error 1897 [0] occurred at disk power-on lifetime: 10279 hours (428 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 04 99 38 78 40 00 Error: UNC at LBA = 0x04993878 = 77150328

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 b0 00 00 00 0e f0 38 40 08 00:01:50.657 READ FPDMA QUEUED
60 00 08 00 98 00 00 00 0e f0 18 40 08 00:01:50.636 READ FPDMA QUEUED
60 00 08 00 90 00 00 00 00 18 00 40 08 00:01:50.625 READ FPDMA QUEUED
60 00 08 00 88 00 00 00 0e f8 00 40 08 00:01:50.622 READ FPDMA QUEUED
60 00 08 00 40 00 00 04 99 38 78 40 08 00:01:50.614 READ FPDMA QUEUED

Error 1896 [23] occurred at disk power-on lifetime: 10279 hours (428 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 04 99 38 78 40 00 Error: UNC at LBA = 0x04993878 = 77150328

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 00 00 00 04 99 38 78 40 08 00:01:48.588 READ FPDMA QUEUED
60 00 08 00 f0 00 00 04 99 38 38 40 08 00:01:48.587 READ FPDMA QUEUED
60 00 08 00 e8 00 00 04 99 38 18 40 08 00:01:48.587 READ FPDMA QUEUED
60 00 08 00 e0 00 00 04 99 40 00 40 08 00:01:48.562 READ FPDMA QUEUED
60 00 08 00 d8 00 00 6d 5f 46 f8 40 08 00:01:48.526 READ FPDMA QUEUED

Error 1895 [22] occurred at disk power-on lifetime: 10279 hours (428 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 04 99 38 78 40 00 Error: UNC at LBA = 0x04993878 = 77150328

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 10 00 00 04 99 38 78 40 08 00:01:42.483 READ FPDMA QUEUED
60 00 08 00 f0 00 00 00 0f 00 00 40 08 00:01:42.464 READ FPDMA QUEUED
60 00 68 00 e8 00 00 01 cd ef 88 40 08 00:01:42.462 READ FPDMA QUEUED
60 00 80 00 e0 00 00 01 cd ef 00 40 08 00:01:42.462 READ FPDMA QUEUED
60 00 f8 00 d8 00 00 01 cd ee 00 40 08 00:01:42.462 READ FPDMA QUEUED

Error 1894 [21] occurred at disk power-on lifetime: 10279 hours (428 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 04 99 38 78 40 00 Error: UNC at LBA = 0x04993878 = 77150328

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 38 00 00 04 99 38 78 40 08 00:01:40.404 READ FPDMA QUEUED
60 00 08 00 30 00 00 04 99 38 38 40 08 00:01:40.404 READ FPDMA QUEUED
60 00 08 00 28 00 00 04 99 38 18 40 08 00:01:40.404 READ FPDMA QUEUED
60 00 08 00 20 00 00 04 99 40 00 40 08 00:01:40.379 READ FPDMA QUEUED
60 00 08 00 18 00 00 6d 5f 46 f8 40 08 00:01:40.356 READ FPDMA QUEUED

Error 1893 [20] occurred at disk power-on lifetime: 10279 hours (428 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 04 99 3b 08 40 00 Error: UNC at LBA = 0x04993b08 = 77150984

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 48 00 00 04 99 3b 08 40 08 00:01:08.857 READ FPDMA QUEUED
ef 00 10 00 02 00 00 00 00 00 00 a0 08 00:01:08.857 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 00 00 00 00 00 e0 08 00:01:08.856 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 00 00 00 00 00 a0 08 00:01:08.856 IDENTIFY DEVICE
ef 00 03 00 46 00 00 00 00 00 00 a0 08 00:01:08.856 SET FEATURES [Set transfer mode]

Error 1892 [19] occurred at disk power-on lifetime: 10279 hours (428 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 04 99 3b 08 40 00 Error: UNC at LBA = 0x04993b08 = 77150984

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 20 00 00 04 99 3b 08 40 08 00:01:06.827 READ FPDMA QUEUED
60 00 08 00 18 00 00 04 99 3b 00 40 08 00:01:06.826 READ FPDMA QUEUED
60 00 08 00 10 00 00 04 99 39 10 40 08 00:01:06.826 READ FPDMA QUEUED
60 00 08 00 08 00 00 04 99 39 08 40 08 00:01:06.826 READ FPDMA QUEUED
60 00 08 00 00 00 00 04 99 39 00 40 08 00:01:06.808 READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 27 Celsius
Power Cycle Min/Max Temperature: 23/27 Celsius
Lifetime Min/Max Temperature: 23/43 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (303)

Index Estimated Time Temperature Celsius
304 2016-10-10 07:41 35 ****************
... ..( 4 skipped). .. ****************
309 2016-10-10 07:46 35 ****************
310 2016-10-10 07:47 36 *****************
... ..(172 skipped). .. *****************
5 2016-10-10 10:40 36 *****************
6 2016-10-10 10:41 37 ******************
... ..( 7 skipped). .. ******************
14 2016-10-10 10:49 37 ******************
15 2016-10-10 10:50 36 *****************
... ..(125 skipped). .. *****************
141 2016-10-10 12:56 36 *****************
142 2016-10-10 12:57 37 ******************
... ..( 60 skipped). .. ******************
203 2016-10-10 13:58 37 ******************
204 2016-10-10 13:59 36 *****************
... ..( 12 skipped). .. *****************
217 2016-10-10 14:12 36 *****************
218 2016-10-10 14:13 37 ******************
... ..( 46 skipped). .. ******************
265 2016-10-10 15:00 37 ******************
266 2016-10-10 15:01 ? -
267 2016-10-10 15:02 23 ****
268 2016-10-10 15:03 24 *****
269 2016-10-10 15:04 25 ******
270 2016-10-10 15:05 26 *******
271 2016-10-10 15:06 26 *******
272 2016-10-10 15:07 27 ********
... ..( 2 skipped). .. ********
275 2016-10-10 15:10 27 ********
276 2016-10-10 15:11 33 **************
... ..( 2 skipped). .. **************
279 2016-10-10 15:14 33 **************
280 2016-10-10 15:15 34 ***************
... ..( 5 skipped). .. ***************
286 2016-10-10 15:21 34 ***************
287 2016-10-10 15:22 35 ****************
... ..( 15 skipped). .. ****************
303 2016-10-10 15:38 35 ****************

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 3 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 2 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 602 Vendor specific

root@sysresccd /root %
[/code]

I watched a video about Ubuntu that used "Disk Utility" to show a dashboard with SMART information for each drive, displayed its current condition (working or not), and had a button that claimed to try to repair the drive. I didn't find anything like this on Mint. I installed gnome-disk-utility, but the interface was very different and didn't have the SMART repair feature that interested me.

I know TestDisk exists, but I'm not confident how to apply it to my case, where I'm trying to make the unbootable partition bootable. I could probably figure out how to pull individual files off the partition copies, but that would not be necessary if I can get the original drive to boot.

Does anyone know of a "cookbook" style guide that I could use to try and restore the unbootable drive?

If not, can anyone provide some guidance?

Also, once I had ddrescued the image and log files to a secondary drive, I just copied and pasted those backups to a tertiary drive. Does copy and paste work fine in this case?
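
One way to settle the copy-and-paste question is to compare checksums of the original image and the copy: if they match, the copy is byte-for-byte identical. A minimal sketch, assuming GNU coreutils (sha256sum) is available. The file names are hypothetical stand-ins for the real ddrescue image, and the demo uses a small scratch file so it is safe to run as is:

```shell
# Scratch directory with a stand-in "image" (hypothetical names, not the real files)
workdir=$(mktemp -d)
printf 'pretend this is a ddrescue image\n' > "$workdir/home.img"

# The plain copy whose integrity we want to check
cp "$workdir/home.img" "$workdir/home-copy.img"

# Hash both files; reading from stdin keeps the file name out of the output
sum_a=$(sha256sum < "$workdir/home.img")
sum_b=$(sha256sum < "$workdir/home-copy.img")

if [ "$sum_a" = "$sum_b" ]; then
    echo "copies match"
else
    echo "copies differ"
fi
rm -rf "$workdir"
```

For a large image this takes about as long as reading both files once, which is usually acceptable for a one-off check.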

TIA...

c0wb0y · 10-13-2016, 02:53 PM

How important is the data on it? It may warrant professional data recovery.

JeremyBoden · 10-13-2016, 02:57 PM

This is not an unbootable drive

You are able to boot and drop into a BusyBox screen.
This indicates that the disk is spinning up and reading the boot sectors.

SMART is simply a statistics gathering tool.
sudo smartctl -x /dev/sda on my PC gives very similar results to yours - except the disk is working.

I would suspect you **might** have a disk error and you should avoid using it in a write situation.
Is it possible to attempt reading from the disk on a second, working PC?

This might allow you to attempt to recover data from the disk.

GNewbie · 10-13-2016, 03:00 PM

Quote:

Originally Posted by c0wb0y

How important is the data on it? It may warrant professional data recovery.

Hi Cowboy, it isn't important enough for professional data recovery, but it would be nice to have the system become usable so I can access some of the files that weren't backed up before I retire and replace the old drive.

Learning how to do this is a benefit as well.

GNewbie · 10-13-2016, 03:07 PM

Quote:

Originally Posted by JeremyBoden

This is not an unbootable drive

You are able to boot and drop into a BusyBox screen.
This indicates that the disk is spinning up and reading the boot sectors.

SMART is simply a statistics gathering tool.
sudo smartctl -x /dev/sda on my PC gives very similar results to yours - except the disk is working.

I would suspect you **might** have a disk error and you should avoid using it in a write situation.
Is it possible to attempt reading from the disk on a second, working PC?

This might allow you to attempt to recover data from the disk.

Hi Jeremy, do you think it will mount if I take it out and put it in a USB drive fixture and plug it into another computer?

Is there any way I can check its "mountability" from a livecd?

Thanks for clarifying the problem as not being a boot problem. I will refer to the problem as a disk error going forward.

Does TestDisk check for these kinds of disk errors and repair them? If so, can you refer me to an online tutorial that applies in my case?

TIA...

JeremyBoden · 10-13-2016, 06:03 PM

The best way would be to put it in a USB external drive and mount the partitions read only.

It's worth booting from a live CD and trying the same, because it's less trouble.
If you can mount your data files, you should take some backups.

You may be able to localise the problem to a particular partition.
Just try mounting each partition.

You could test the readability of each sector by careful use of the dd command.
It is easy to wipe a disk by any errors in the parameters of dd, so great care is needed.
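
A read-only test of that kind might look like the sketch below. The key point is that if= names the source and of= names the destination; with of=/dev/null nothing is written to any disk. The function name read_test is made up for illustration, and the demo runs against a scratch file rather than real hardware. On a real disk the first argument would be the device (for example /dev/sdX) and the command would need sudo.

```shell
# read_test SOURCE FIRST_SECTOR COUNT
# Reads COUNT 512-byte sectors starting at FIRST_SECTOR and discards them.
# The exit status is non-zero if dd hits a read error.
read_test() {
    dd if="$1" of=/dev/null bs=512 skip="$2" count="$3" 2>/dev/null
}

# Build a 16-sector scratch file as a stand-in for a disk or partition
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=512 count=16 2>/dev/null

# Try to read sectors 8 through 15 of the stand-in
if read_test "$scratch" 8 8; then
    result="readable"
else
    result="read error"
fi
echo "sectors 8-15: $result"
rm -f "$scratch"
```

Swapping if= and of= is the classic way to destroy data with dd, which is why wrapping the read in a function with a fixed of=/dev/null can be a useful guard.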

GNewbie · 10-14-2016, 12:38 AM

Hi Jeremy, messing around with dd to do any precision work on the drive itself is above my pay grade.

I tried to mount the drive from a livecd as follows...

Code:

mint@mint ~ $ sudo mkdir /mnt/rohome

mint@mint ~ $ mount -o ro /dev/sdc7 /mnt/rohome
mount: only root can use "--options" option
mint@mint ~ $ sudo mount -o ro /dev/sdc7 /mnt/rohome
mount: wrong fs type, bad option, bad superblock on /dev/sdc7,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

mint@mint ~ $ dmesg | tail
[11485.491584] ata3.00: status: { DRDY ERR }
[11485.491587] ata3.00: error: { UNC }
[11485.493404] ata3.00: configured for UDMA/133
[11485.493426] sd 2:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[11485.493433] sd 2:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current] [descriptor] 
[11485.493439] sd 2:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed
[11485.493445] sd 2:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 04 99 38 70 00 00 08 00
[11485.493449] blk_update_request: I/O error, dev sdc, sector 77150320
[11485.493488] ata3: EH complete
[11485.493532] EXT4-fs (sdc7): can't read group descriptor 13
mint@mint ~ $
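
The failing sector number can be turned into a position inside the partition with a little arithmetic. The sketch below is a rough illustration only: the partition start is not given anywhere in this thread, so it is inferred from the earlier boot log (which paired sector 77150696 with sda7 logical block 61), and the 4096-byte block size is an assumption. On a real system, sudo fdisk -l on the disk would print the actual start sector.

```shell
# Assumption: 4096-byte blocks on 512-byte logical sectors
sectors_per_block=8

# From the boot log: sector 77150696 was reported as sda7 logical block 61
part_start=$(( 77150696 - 61 * sectors_per_block ))

# From the dmesg output above: the mount attempt failed at sector 77150320
bad_sector=77150320
bad_block=$(( (bad_sector - part_start) / sectors_per_block ))

echo "partition start (inferred): $part_start"
echo "failing filesystem block:   $bad_block"
```

Under those assumptions the unreadable sector sits at block 14, near the very start of the partition where ext4 keeps its superblock and group descriptors, which would fit the "can't read group descriptor" message.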

Given these errors does it still make sense to try and put it in a USB drive fixture and try again?

Is this link relevant to my case (error appears like it might be similar, but I'm a GNewbie and just guessing on that!):

Random freeze and "Unrecovered read error" in /var/log/messages
https://ubuntuforums.org/showthread.php?t=1720375

I had previously tried something similar to this (from the example in the link)...

Code:

sudo e2fsck -c /dev/sda1

And I was cycling through the...

Code:

Error reading block 4688827 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes

Force rewrite<y>? yes

...when I decided a ddrescue was in order after the third attempt, so I aborted and copied what I had. Should I reattempt it and keep cycling through as long as it asks me to do it?

Also, what does (again, code from linked example, would be updated to my specifics if I choose to try it)...

Code:

sudo e2fsck -c -c -p -v /dev/sda1

...do and should I try it?

TIA...

GNewbie · 10-14-2016, 12:56 AM

I tried to find out the number of reallocated sectors on the disk and came up with the following:

Code:

mint@mint ~ $ smartctl -a /dev/sdc7 | grep -i reallocated
mint@mint ~ $

I interpret that as there being no reallocated sectors on the disk. Is this my disk's problem?
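
An empty result here does not by itself mean a count of zero: grep prints nothing when no line matches, which can also happen if smartctl could not read the drive (it normally needs root). A minimal sketch of pulling the relevant counters out of the attribute table, using three rows copied from the smartctl -x output earlier in the thread as sample input. On a live system the input would instead come from something like sudo smartctl -A on the whole disk:

```shell
# Sample rows copied from the smartctl -x output earlier in the thread
sample='5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
197 Current_Pending_Sector -O--CK 192 192 000 - 1373
198 Offline_Uncorrectable ----CK 192 192 000 - 1370'

# Print attribute name and raw value (the last column) for the three counters
report=$(printf '%s\n' "$sample" | awk '
    $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2 "=" $NF
    }')
echo "$report"

# Pull out a single counter for use in a script
pending=$(printf '%s\n' "$sample" | awk '$2 == "Current_Pending_Sector" { print $NF }')
```

In this data the reallocated count really is 0, but 1373 sectors are pending, meaning the drive could not read them and has not yet remapped them.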

Jeremy, I refrained from doing this...

Note code snippet taken from example in link below...

Code:

hdparm --read-sector 1261069669 /dev/sdb

...because you cautioned about writing to the disk. Is it OK to do it? I was following the procedure at this link...

Forcing a hard disk to reallocate bad sectors
http://www.sj-vs.net/forcing-a-hard-...e-bad-sectors/

I do find it odd that I got no result returned at all while the author of the link had two lines returned, but he didn't have any reallocated sectors in spite of the two returned lines.

It seems he writes over the bad sector (destroying all the data on the sector), and this somehow led to reallocated sectors and his HDD working again (at least in an array).

Is this something I need to try and do? I'm obviously being very cautious here.

TIA...

JeremyBoden · 10-14-2016, 06:53 AM

You don't have any reallocated sectors and even if you did it would be transparent to its operation.
Many reallocated sectors can be a warning that failure is imminent, though.

I would check that you can mount your other partitions.

Provided that you have backups of all wanted data, and since mount doesn't even recognise the file system on /home, it's probably worth trying fsck in read-write mode.
If it finds more than a few errors, I would give up on it.

You could re-format /dev/sda7 - but you would have an untrustworthy disk.

GNewbie · 10-14-2016, 09:28 AM

Quote:

Originally Posted by JeremyBoden

You don't have any reallocated sectors and even if you did it would be transparent to its operation.
Many reallocated sectors can be a warning that failure is imminent, though.

I would check that you can mount your other partitions.

Provided that you have backups of all wanted data, and since mount doesn't even recognise the file system on /home, it's probably worth trying fsck in read-write mode.
If it finds more than a few errors, I would give up on it.

You could re-format /dev/sda7 - but you would have an untrustworthy disk.

Hi Jeremy, Are you pretty much telling me that the situation is desperate?

Would I be better off just trying to use TestDisk (or some other program) to capture some files off the ddrescued copy of the /home folder at this point?

Also, since I've now progressed to working on the actual disk, I don't want to have to assume anything at all since I'm new to this.

It sounds to me like you are recommending what the person did here:

Random freeze and "Unrecovered read error" in /var/log/messages
https://ubuntuforums.org/showthread.php?t=1720375

How many errors would count as "more than a few"?

TIA...

JeremyBoden · 10-14-2016, 10:33 AM

If you are going to reuse the disk, then perhaps a maximum of 10 errors; otherwise, perhaps up to 100 errors.

You have to balance your time trying to recover the disk versus just buying a new one or returning it under any warranty.

This allows you to try quite "vigorous" methods.

For example (I've done this once years ago and it did work for me) http://lifehacker.com/5515337/save-a...-freezer-redux
Putting your drive in a freezer is an absolute last-ditch, end-of-the-road attempt.

GNewbie · 10-14-2016, 11:30 AM

Quote:

Originally Posted by JeremyBoden

If you are going to reuse the disk, then perhaps a maximum of 10 errors; otherwise, perhaps up to 100 errors.

You have to balance your time trying to recover the disk versus just buying a new one or returning it under any warranty.

This allows you to try quite "vigorous" methods.

For example (I've done this once years ago and it did work for me) http://lifehacker.com/5515337/save-a...-freezer-redux
Putting your drive in a freezer is an absolute last-ditch, end-of-the-road attempt.

Hi Jeremy, I plan on shelving the disk once I get my data off of it, so keeping the disk isn't an issue.

Just to be clear, you think that doing the following is a way that **might** jumpstart the disk enough to get it to boot up into the GUI at least a couple more times (no guarantees and no hard feelings if it doesn't work... I know I'm knee deep in sludge here ;-)...

Code:

sudo e2fsck -c /dev/sda1

Followed by...

Code:

sudo e2fsck -c -c -p -v /dev/sda1

I'm using this link as a guide and the examples are from the linked example (I know I need to modify the details to fit my specific case)...

https://ubuntuforums.org/showthread.php?t=1720375

Is that right or am I confused? Again, I don't want to **ass**ume anything before I mess around with the drive.

TIA...

JeremyBoden · 10-14-2016, 12:19 PM

Yes - good luck!

GNewbie · 10-14-2016, 03:00 PM

Quote:

Originally Posted by JeremyBoden

Yes - good luck!

OK, I've been ruminating over my problem and was wondering if a plan B might be better.

What if I get a new HDD, install a newer version of Linux Mint, copy the file copy I created onto the new drive's /home directory, and then try to boot and, if required, try to restore the partition on the new HDD?

Is that a viable and better approach?

AwesomeMachine · 10-14-2016, 03:05 PM

I would run the manufacturer's diagnostic (usually available on the web), and see if it finds anything.