LinuxQuestions.org
Old 08-24-2019, 08:45 PM   #1
Robert S
Member
 
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64

Rep: Reputation: 15
smartd: "Currently unreadable (pending) sectors" errors


I have started to get these messages:
Code:
Aug 23 12:56:59 mypc smartd[6790]: Device: /dev/sdb [SAT], 4 Currently unreadable (pending) sectors
Aug 23 13:26:59 mypc smartd[6790]: Device: /dev/sdb [SAT], 4 Currently unreadable (pending) sectors
Aug 23 13:56:59 mypc smartd[6790]: Device: /dev/sdb [SAT], 4 Currently unreadable (pending) sectors
Aug 23 14:26:59 mypc smartd[6790]: Device: /dev/sdb [SAT], 4 Currently unreadable (pending) sectors
I ran a full scan
Code:
smartctl -t long /dev/sdb
and got this result:
Code:
# smartctl -a /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.66-gentoo] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue
Device Model:     WDC WD20EZRZ-00Z5HB0
Serial Number:    WD-WCC4M7US6Y4L
LU WWN Device Id: 5 0014ee 264432c61
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Aug 25 09:05:10 2019 AEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (26460) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 267) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       94
  3 Spin_Up_Time            0x0027   174   173   021    Pre-fail  Always       -       4300
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       16968
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       969829
194 Temperature_Celsius     0x0022   124   107   000    Old_age   Always       -       23
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       4

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     16954         66630408
# 2  Short offline       Completed: read failure       90%     16954         66630409

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
My /etc/smartd.conf:
Code:
DEVICESCAN
/dev/sda -S on -o on -a -I 194 -m robert@mydomain.com.au
/dev/sdb -S on -o on -a -I 194 -m robert@mydomain.com.au
I've tried replacing the SATA cable, with the same result.

Should I trash this HD, or is there a safe workaround? I see that there is an article here which suggests a workaround, but I'd like to get some advice first. The drive doesn't contain any system files - it's full of documents which get backed up every day.

I realise that this might be a common question on these forums, but I'm not particularly experienced with these things.
 
Old 08-24-2019, 10:04 PM   #2
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,348

Rep: Reputation: Disabled
If you look at SMART attribute 197 (Current_Pending_Sector), it has a raw value of 4. This means that at some point the drive was unable to read the data from 4 different sectors, and has therefore flagged these sectors for possible reallocation.
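To keep an eye on these counters, the raw values can be pulled out of the attribute table. This is only a minimal sketch, assuming the ten-column layout shown in the smartctl output in post #1; the function name `smart_raw` is made up for illustration.

```shell
# Print the RAW_VALUE column for one SMART attribute ID.
# Reads an attribute table (e.g. from `smartctl -A /dev/sdb`) on stdin.
# Assumes the ten-column layout shown in post #1 of this thread.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Sample rows copied from the smartctl output in post #1.
sample='  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3'

printf '%s\n' "$sample" | smart_raw 197   # pending sectors
printf '%s\n' "$sample" | smart_raw 5     # reallocated sectors
```

On a live system the same function would be fed from `smartctl -A /dev/sdb` (as root) instead of the sample text.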

4 sectors aren't a lot, and since your drive is an "advanced format" drive with 4k sectors and 512-byte emulation, it's entirely possible that you've only really got one single 4k sector with issues. That's not necessarily an indication that the drive is about to go bad.

If you scan through the drive manually with dd or badblocks or somesuch (note: a "long" SMART test is not guaranteed to find every error), you will find a total of at least 8 affected, adjacent 512-byte sectors that are all really the contents of the same 4k sector. If you find a lot more, consider replacing the drive.

To fix the bad sector, I'd recommend you do the following:
  1. Locate the bad sectors in question.
  2. See if any files are currently using these sectors
  3. Make sure you have copies of any affected files
  4. Overwrite the affected sectors with zeroes using hdparm or dd
  5. Restore good copies of the affected files identified in step 2

Here's a reasonably good wiki article detailing the entire process.

(I could post the exact procedure if you post your partition layout and which filesystem(s) you're using.)
 
Old 08-25-2019, 05:22 AM   #3
Robert S
Member
 
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64

Original Poster
Rep: Reputation: 15
Many thanks for your help. I'm using ext4 on an LVM2 partition. So far I've done this:
Code:
 # badblocks -v -b 4096 /dev/lvm/partition
Checking blocks 0 to 131071999
Checking for bad blocks (read-only test): 8328330
8328375
8328801
<test aborted here - will run the rest overnight>
Then:
Code:
# debugfs
debugfs 1.45.2 (27-May-2019)
debugfs:  open /dev/lvm/partition
debugfs:  testb 8328375
Block 8328375 marked in use
debugfs:  icheck 8328375
Block   Inode number
8328375 8136080
debugfs:  ncheck 8136080
Inode   Pathname
8136080 /DIR/FILE1.EXT
debugfs:  icheck 8328801
Block   Inode number
8328801 14156556
debugfs:  testb 8328801
Block 8328801 marked in use
debugfs:  ncheck 14156556
Inode   Pathname
14156556        /DIR/FILE2.EXT
debugfs:  quit
When I try to copy the file I get "Input/output error", so I've deleted the files (not very important). The doc says I should run dd/hdparm next, but I'd like some advice on what parameter I should use for a "destructive" command.
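The block-to-inode lookup above can also be done without the interactive prompt, since debugfs accepts a single request via `-R`. The sketch below only parses the two-column `icheck` output; the function name `inode_for_block` is made up for illustration, and the debugfs calls themselves appear only as comments.

```shell
# Extract the inode number for a given block from `debugfs icheck` output
# read on stdin. The two-column "Block / Inode number" layout is taken
# from the session in this post. Non-numeric results (for example a
# "block not found" message) are skipped.
inode_for_block() {
    awk -v blk="$1" 'NR > 1 && $1 == blk && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Sample output copied from the debugfs session above.
icheck_out='Block   Inode number
8328375 8136080'

printf '%s\n' "$icheck_out" | inode_for_block 8328375

# Against a real filesystem this would be (read-only, not run here):
#   debugfs -R "icheck 8328375" /dev/lvm/partition | inode_for_block 8328375
#   debugfs -R "ncheck 8136080" /dev/lvm/partition
```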

Last edited by Robert S; 08-25-2019 at 03:55 PM.
 
Old 08-26-2019, 12:42 PM   #4
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,348

Rep: Reputation: Disabled
The best way to overwrite a troublesome sector is to use hdparm --write-sector. This will either fix the issue by making the sector available again, or by forcing the drive to perform a reallocation (substituting one of the reserved, spare sectors for the defective one). This will then be immediately visible in the SMART data: Attribute 197 (Current_Pending_Sector) will show 0, and attribute 5 (Reallocated_Sector_Ct) will be the number of sectors that couldn't be fixed by a re-write but had to be reallocated instead.

In order to use hdparm, one has to know the physical (LBA) number of the sector in question. One sure-fire way to find this number is to attempt to read the affected file, and see which sector number is reported by the controller driver in the kernel log.

Since you've deleted the file, this option is no longer available. You should still be able to find the previous error in the log; the entry should look something like this:
<date> <time> <hostname> kernel: end_request: I/O error, dev <device>, sector <number>
<number> will be the sector number you're looking for.

Running badblocks again will of course also cause the error message to reappear in the log.
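A small filter can pick the device and sector number out of such log lines. This is a sketch only: it matches the "I/O error, dev <device>, sector <number>" shape quoted above, the sample line is hypothetical (built from that template), and the function name `io_error_sectors` is made up for illustration.

```shell
# Pull the device and sector number out of kernel I/O error lines on stdin.
# Matches the "I/O error, dev <device>, sector <number>" shape quoted above;
# whatever comes before or after that part of the line is ignored.
io_error_sectors() {
    sed -n 's/.*I\/O error, dev \([^,]*\), sector \([0-9][0-9]*\).*/\1 \2/p'
}

# Hypothetical log line, built from the template in this post.
line='Aug 25 09:00:00 mypc kernel: end_request: I/O error, dev sdb, sector 66630408'
printf '%s\n' "$line" | io_error_sectors

# Typical use (not run here): dmesg | io_error_sectors | sort -u
```

Note that the sector number is relative to whichever device the line names, so a line naming a device-mapper device rather than sdb would not give the disk LBA directly.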
 
Old 08-26-2019, 01:47 PM   #5
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803

Rep: Reputation: 2224
Instructions for finding what files are affected and reallocating the affected sectors can be found in the Bad Block HOWTO. The procedure depends on what filesystem is involved. The document does not yet cover XFS.
 
Old 08-27-2019, 05:03 AM   #6
Robert S
Member
 
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64

Original Poster
Rep: Reputation: 15
Thanks. I've followed this and have overwritten three sectors with zeros, within one LV. Initially an extended offline test reported no errors, but subsequent tests have reported errors again:
Code:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%     17026         66630408
# 2  Extended offline    Completed: read failure       90%     17017         66630408
# 3  Extended offline    Completed without error       00%     17005         -
# 4  Short offline       Completed: read failure       90%     16993         66630409
# 5  Short offline       Completed: read failure       90%     16978         66630410
# 6  Extended offline    Completed: read failure       90%     16969         66630736
# 7  Short offline       Completed: read failure       90%     16969         66630413
# 8  Extended offline    Completed: read failure       90%     16954         66630408
# 9  Short offline       Completed: read failure       90%     16954         66630409
6 of 8 failed self-tests are outdated by newer successful extended offline self-test # 3
When I try to read the LBA of the first error with hdparm, it is able to read it:
Code:
# hdparm --read-sector 66630408 /dev/sdb

/dev/sdb:
reading sector 66630408: succeeded
e6dd 2dc4 39f2 ce32 18b1 3cf2 a0fb 7e35
<,etc.>
Furthermore, I'm still getting error messages in my syslog:
Code:
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], opened
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], WDC WD20EZRZ-00Z5HB0, S/N:WD-WCC4M7US6Y4L, WWN:5-0014ee-264432c61, FW:80.00A80, 2.00 TB
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], found in smartd database: Western Digital Blue
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdc [SAT], opened
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdc [SAT], ST1500LM003-9YH148, S/N:W110EGR9, WWN:5-000c50-049ef6fd8, FW:CC9F, 1.50 TB
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdc [SAT], not found in smartd database.
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdc [SAT], is SMART capable. Adding to "monitor" list.
Aug 27 18:41:55 mypc smartd[6685]: Monitoring 3 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], 3 Offline uncorrectable sectors
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], previous self-test completed with error (read test element)
I'm rather confused here - have I successfully reallocated the sector, and why have the selftest messages come back again?
 
Old 08-27-2019, 08:33 AM   #7
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803

Rep: Reputation: 2224
Try reading sectors that follow the reported LBA. The long test might be (and probably is) reading blocks larger than a single sector, and not refining the report to indicate the exact failing sector. Try "dd if=/dev/sdb skip=66630408 count=1024 of=/dev/null" and see where it fails.
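If the bulk read does fail, reading one sector per dd call narrows down which LBA is responsible. A rough sketch, with a made-up function name `scan_sectors`, demonstrated on a scratch file rather than the real disk:

```shell
# Read `count` 512-byte sectors one at a time, starting at `start`, and
# print the LBA of every sector that dd fails to read.
# Read-only: the target is only ever used as dd's input.
scan_sectors() {
    dev=$1
    start=$2
    count=$3
    lba=$start
    end=$((start + count))
    while [ "$lba" -lt "$end" ]; do
        if ! dd if="$dev" of=/dev/null bs=512 skip="$lba" count=1 2>/dev/null; then
            echo "$lba"
        fi
        lba=$((lba + 1))
    done
}

# Demonstration on a scratch file standing in for the disk.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=512 count=16 2>/dev/null
scan_sectors "$scratch" 0 16    # prints nothing: every sector is readable
rm -f "$scratch"
```

On the real drive this would be `scan_sectors /dev/sdb 66630408 1024` as root. With GNU dd, adding `iflag=direct` to the dd call should bypass the page cache so that each read actually reaches the disk.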
 
Old 08-27-2019, 08:25 PM   #8
Robert S
Member
 
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64

Original Poster
Rep: Reputation: 15
That seems to work fine:
Code:
 # dd if=/dev/sdb skip=66630408 count=1024 of=/dev/null
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 7.0131 s, 74.8 kB/s
 
Old 08-27-2019, 10:58 PM   #9
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803

Rep: Reputation: 2224
It's hard to explain why the internal self test might be failing, then.
 
Old 08-30-2019, 02:01 AM   #10
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,348

Rep: Reputation: Disabled
Quote:
Originally Posted by Robert S View Post
I'm rather confused here - have I successfully reallocated the sector, and why have the selftest messages come back again?
Remember, your drive has 4096-byte physical sectors that appear as 8 times as many 512-byte sectors via emulation. That means you can't actually have one bad 512-byte sector, because any bad sector will be 4096 bytes in size, and will appear as 8 consecutive bad 512-byte sectors.

In other words, you will have to overwrite all 8 emulated sectors for the error message to permanently disappear.
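The arithmetic can be sketched as follows, assuming the physical sectors are aligned to LBA 0 (the drive's alignment offset was not posted in this thread); the function name `phys_range` is made up for illustration.

```shell
# Map a 512-byte logical LBA to the first and last LBA of the 4096-byte
# physical sector that contains it (8 logical sectors per physical one).
# Assumes physical sectors start at LBA 0 with no alignment offset.
phys_range() {
    first=$(( $1 / 8 * 8 ))
    echo "$first $((first + 7))"
}

# LBA_of_first_error values from the self-test log in post #6.
for lba in 66630408 66630409 66630410 66630413 66630736; do
    echo "$lba -> $(phys_range "$lba")"
done
```

Under that assumption, the first four LBAs all fall inside the range 66630408 to 66630415, while 66630736 belongs to a separate physical sector covering 66630736 to 66630743.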
 
Old 08-31-2019, 12:50 AM   #11
Robert S
Member
 
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64

Original Poster
Rep: Reputation: 15
I assume, then, that I need to do:
Code:
dd of=/dev/sdb skip=66630408 bs=512 count=8 if=/dev/null
to overwrite the entire sector. How do I ensure that this won't overwrite any files (I'm using ext4 on LVM2)? I've already deleted corrupted files in the preceding steps.
 
Old 08-31-2019, 10:30 AM   #12
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803

Rep: Reputation: 2224
Quote:
Originally Posted by Robert S View Post
I assume, then, that I need to do:
Code:
dd of=/dev/sdb skip=66630408 bs=512 count=8 if=/dev/null
to overwrite the entire sector. How do I ensure that this won't overwrite any files (I'm using ext4 on LVM2)? I've already deleted corrupted files in the preceding steps.
First, that command is completely wrong and won't actually do anything (/dev/null returns immediate EOF when read, and "skip=..." affects the input stream, not the output).

Second, LVM makes the mapping more complicated, so be sure you've used the "LVM repairs" section of the HOWTO to determine which (if any) file is using that block. It's just one 4KB block, and so can't be used by more than one file (though that "file" might actually be a directory). Once you've determined that the block can be safely overwritten, then
Code:
dd of=/dev/sdb seek=66630408 bs=512 count=8 if=/dev/zero
               ^^^^                                 ^^^^
will overwrite the block, but there's no assurance the block will be reallocated since your previous step showed that the whole region of the disk can be read successfully.
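The effect of seek= can be checked safely on a scratch file before touching the disk. This is only an illustration: the file stands in for /dev/sdb, and `conv=notrunc` is added solely because dd would otherwise truncate a regular file (a block device is not truncated).

```shell
# Demonstrate seek= on a scratch file instead of a real disk.
scratch=$(mktemp)

# Fill 32 sectors (16384 bytes) with the non-zero byte 'y'.
head -c 16384 /dev/zero | tr '\0' 'y' > "$scratch"

# Zero 8 sectors starting at output sector 8, i.e. bytes 4096..8191.
dd if=/dev/zero of="$scratch" bs=512 seek=8 count=8 conv=notrunc 2>/dev/null

# Count the non-zero bytes left in sectors 0-7 and in sectors 8-15.
before=$(dd if="$scratch" bs=512 count=8 2>/dev/null | tr -d '\0' | wc -c)
target=$(dd if="$scratch" bs=512 skip=8 count=8 2>/dev/null | tr -d '\0' | wc -c)
size=$(wc -c < "$scratch")
echo "non-zero bytes before: $((before)), in target: $((target)), file size: $((size))"
rm -f "$scratch"
```

Only the eight sectors selected by seek= are zeroed; the sectors before and after them keep their contents and the file size is unchanged.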

Last edited by rknichols; 08-31-2019 at 10:33 AM.
 
Old 09-01-2019, 01:20 AM   #13
Robert S
Member
 
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64

Original Poster
Rep: Reputation: 15
Many thanks:
Code:
SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     17141         -
# 2  Short offline       Completed without error       00%     17136         -
 
Old 05-19-2021, 08:21 AM   #14
Carl0
LQ Newbie
 
Registered: May 2021
Location: Milan
Distribution: Debian OpenMediaVault
Posts: 2

Rep: Reputation: Disabled
Hi, I've read the whole thread and tried to apply it to my 4TB data disk. My question is: I have a RAID1 system on OpenMediaVault, and only one disk of the RAID1 (/dev/sdb) has 4 unreadable pending sectors. Is the procedure also valid for a RAID1 system? Is the mirroring affected by reallocating sectors?
Many thanks in advance.
 
Old 05-19-2021, 09:33 AM   #15
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803

Rep: Reputation: 2224
Quote:
Originally Posted by Carl0 View Post
Hi, I've read the whole thread and tried to apply it to my 4TB data disk. My question is: I have a RAID1 system on OpenMediaVault, and only one disk of the RAID1 (/dev/sdb) has 4 unreadable pending sectors. Is the procedure also valid for a RAID1 system? Is the mirroring affected by reallocating sectors?
Many thanks in advance.
Using RAID can certainly affect the way that block addresses are mapped to the physical disk. I'm afraid I don't have the knowledge or experience needed to assist with unwinding that.
 
  

