08-24-2019, 08:45 PM
#1
Member
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64
smartd: "Currently unreadable (pending) sectors" errors
I have started to get these messages:
Code:
Aug 23 12:56:59 mypc smartd[6790]: Device: /dev/sdb [SAT], 4 Currently unreadable (pending) sectors
Aug 23 13:26:59 mypc smartd[6790]: Device: /dev/sdb [SAT], 4 Currently unreadable (pending) sectors
Aug 23 13:56:59 mypc smartd[6790]: Device: /dev/sdb [SAT], 4 Currently unreadable (pending) sectors
Aug 23 14:26:59 mypc smartd[6790]: Device: /dev/sdb [SAT], 4 Currently unreadable (pending) sectors
I ran a full scan
Code:
smartctl -t long /dev/sdb
and got this result:
Code:
# smartctl -a /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.66-gentoo] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Blue
Device Model: WDC WD20EZRZ-00Z5HB0
Serial Number: WD-WCC4M7US6Y4L
LU WWN Device Id: 5 0014ee 264432c61
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Aug 25 09:05:10 2019 AEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: (26460) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 267) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x7035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 94
3 Spin_Up_Time 0x0027 174 173 021 Pre-fail Always - 4300
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 36
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 077 077 000 Old_age Always - 16968
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 8
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 969829
194 Temperature_Celsius 0x0022 124 107 000 Old_age Always - 23
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 4
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 3
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 4
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 16954 66630408
# 2 Short offline Completed: read failure 90% 16954 66630409
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
My /etc/smartd.conf:
Code:
DEVICESCAN
/dev/sda -S on -o on -a -I 194 -m robert@mydomain.com.au
/dev/sdb -S on -o on -a -I 194 -m robert@mydomain.com.au
I've tried replacing the SATA cable, with the same result.
Should I trash this HD, or is there a safe workaround? I see that there is an article here which suggests a workaround, but I'd like to get some advice first. The drive doesn't contain any system files - it's full of documents which get backed up every day.
I realise that this might be a common question on these forums, but I'm not particularly experienced with these things.
08-24-2019, 10:04 PM
#2
Senior Member
Registered: Jan 2012
Distribution: Slackware
Posts: 3,348
If you look at SMART attribute 197 (Current_Pending_Sector), it has a raw value of 4. This means that at some point the drive was unable to read the data from 4 different sectors, and has therefore flagged these sectors for possible reallocation.
4 sectors isn't a lot, and since your drive is an "advanced format" drive with 4k sectors and 512-byte emulation, it's entirely possible that you really have just a single 4k sector with issues. That's not necessarily an indication that the drive is about to go bad.
If you scan through the drive manually with dd or badblocks or somesuch (note: a "long" SMART test is not guaranteed to find every error), you will find a total of at least 8 affected, adjacent 512-byte sectors that are all really the contents of the same 4k sector. If you find a lot more, consider replacing the drive.
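For what it's worth, the 512e arithmetic can be sketched in a few lines of shell. This is only an illustration: the function name physical_sector_range is made up, and it assumes 8 logical sectors per physical sector (per the "512 bytes logical, 4096 bytes physical" line in the smartctl output) with physical sectors aligned to LBA 0. Given an LBA from the self-test log, it prints the first and last of the 8 emulated sectors that share the same physical sector:

```shell
#!/bin/sh
# Sketch: print the first and last 512-byte LBA of the 4096-byte physical
# sector that contains a given LBA.
# Assumptions: 8 logical sectors per physical sector, aligned to LBA 0.
physical_sector_range() {
    lba=$1
    first=$(( lba / 8 * 8 ))   # round down to a multiple of 8
    last=$(( first + 7 ))
    echo "$first $last"
}

# The two LBAs from the self-test log in the first post
physical_sector_range 66630408   # prints "66630408 66630415"
physical_sector_range 66630409   # prints "66630408 66630415"
```

Both LBAs from the self-test log land in the same group of 8, which fits the picture of a single bad 4k sector.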
To fix the bad sector, I'd recommend you do the following:
1. Locate the bad sectors in question.
2. See if any files are currently using these sectors.
3. Make sure you have copies of any affected files.
4. Overwrite the affected sectors with zeroes using hdparm or dd.
5. Restore good copies of the affected files identified in step 2.
Here's a reasonably good wiki article detailing the entire process.
(I could post the exact procedure if you post your partition layout and which filesystem(s) you're using.)
08-25-2019, 05:22 AM
#3
Member
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64
Original Poster
Many thanks for your help. I'm using ext4 on an LVM2 partition. So far I've done this:
Code:
# badblocks -v -b 4096 /dev/lvm/partition
Checking blocks 0 to 131071999
Checking for bad blocks (read-only test): 8328330
8328375
8328801
<test aborted here - will run the rest overnight>
Then:
Code:
# debugfs
debugfs 1.45.2 (27-May-2019)
debugfs: open /dev/lvm/partition
debugfs: testb 8328375
Block 8328375 marked in use
debugfs: icheck 8328375
Block Inode number
8328375 8136080
debugfs: ncheck 8136080
Inode Pathname
8136080 /DIR/FILE1.EXT
debugfs: icheck 8328801
Block Inode number
8328801 14156556
debugfs: testb 8328801
Block 8328801 marked in use
debugfs: ncheck 14156556
Inode Pathname
14156556 /DIR/FILE2.EXT
debugfs: quit
When I try to copy the file I get "Input/output error", so I've deleted the files (not very important). The doc says I should run dd/hdparm next, but I'd like some advice on what parameter I should use for a "destructive" command.
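The interactive debugfs lookups above can also be scripted, which is handy once badblocks has produced a longer list. A rough sketch, assuming debugfs -R prints the same two-column tables as the interactive session above; the function names are made up for illustration, and the device path is the same placeholder used above:

```shell
#!/bin/sh
# Extract the inode number from "icheck" output read on stdin.
# Skips the header line and any row whose second column is not a number.
parse_icheck_inode() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Print the path(s) using a given filesystem block, or nothing if no inode
# is reported for it. debugfs opens the filesystem read-only unless -w is given.
# Usage:
#   for b in 8328330 8328375 8328801; do
#       file_for_block /dev/lvm/partition "$b"
#   done
file_for_block() {
    dev=$1
    block=$2
    inode=$(debugfs -R "icheck $block" "$dev" 2>/dev/null | parse_icheck_inode)
    [ -n "$inode" ] || return 0
    # Drop the header line, then strip the leading inode column from each row.
    debugfs -R "ncheck $inode" "$dev" 2>/dev/null |
        sed -e 1d -e 's/^[0-9]*[[:space:]]*//'
}
```

The block numbers are in the same 4096-byte units that badblocks was told to use with -b 4096, so this only makes sense if the filesystem block size is also 4096.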
Last edited by Robert S; 08-25-2019 at 03:55 PM.
08-26-2019, 12:42 PM
#4
Senior Member
Registered: Jan 2012
Distribution: Slackware
Posts: 3,348
The best way to overwrite a troublesome sector is to use hdparm --write-sector. This will fix the issue either by making the sector readable again or by forcing the drive to perform a reallocation (substituting one of the reserved spare sectors for the defective one). The result will be immediately visible in the SMART data: attribute 197 (Current_Pending_Sector) will show 0, and attribute 5 (Reallocated_Sector_Ct) will show the number of sectors that couldn't be fixed by a rewrite and had to be reallocated instead.
In order to use hdparm, one has to know the LBA (logical block address) of the sector in question. One sure-fire way to find this number is to attempt to read the affected file and see which sector number the controller driver reports in the kernel log.
Since you've deleted the file, this option is no longer available. You should still be able to find the previous error in the log; the entry should look something like this:
Code:
<date> <time> <hostname> kernel: end_request: I/O error, dev <device>, sector <number>
<number> will be the sector number you're looking for.
Running badblocks again will of course also cause the error message to reappear in the log.
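A quick way to pull those sector numbers out of the log is a sed one-liner. This is a sketch only: failed_sectors and print_write_commands are names invented here, the pattern assumes the "I/O error, dev <device>, sector <number>" wording shown above (whatever precedes "I/O error" is deliberately not matched, since it can differ between kernel versions), and the second function merely prints the hdparm commands so they can be reviewed before anything destructive is run:

```shell
#!/bin/sh
# Read kernel log lines on stdin and print the unique sector numbers
# reported for the given device name (e.g. "sdb"), in ascending order.
failed_sectors() {
    dev=$1
    sed -n "s/.*I\/O error, dev $dev, sector \([0-9][0-9]*\).*/\1/p" | sort -n -u
}

# Read sector numbers on stdin and PRINT (not run) one hdparm command each.
# hdparm refuses --write-sector without the long confirmation flag.
print_write_commands() {
    dev=$1
    while read -r lba; do
        echo "hdparm --write-sector $lba --yes-i-know-what-i-am-doing /dev/$dev"
    done
}

# Usage (where the kernel log lives depends on the distribution):
#   dmesg | failed_sectors sdb | print_write_commands sdb
```

Printing the commands first leaves a chance to check each LBA against the files identified earlier before overwriting anything.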
08-26-2019, 01:47 PM
#5
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803
Instructions for finding what files are affected and reallocating the affected sectors can be found in the Bad Block HOWTO. The procedure depends on what filesystem is involved. The document does not yet cover XFS.
08-27-2019, 05:03 AM
#6
Member
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64
Original Poster
Thanks. I've followed this and have overwritten three sectors with zeros, within one LV. Initially an extended offline test reported no errors, but subsequent tests have reported errors again:
Code:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 10% 17026 66630408
# 2 Extended offline Completed: read failure 90% 17017 66630408
# 3 Extended offline Completed without error 00% 17005 -
# 4 Short offline Completed: read failure 90% 16993 66630409
# 5 Short offline Completed: read failure 90% 16978 66630410
# 6 Extended offline Completed: read failure 90% 16969 66630736
# 7 Short offline Completed: read failure 90% 16969 66630413
# 8 Extended offline Completed: read failure 90% 16954 66630408
# 9 Short offline Completed: read failure 90% 16954 66630409
6 of 8 failed self-tests are outdated by newer successful extended offline self-test # 3
When I try to read the LBA of the first error with hdparm, it is able to read it:
Code:
# hdparm --read-sector 66630408 /dev/sdb
/dev/sdb:
reading sector 66630408: succeeded
e6dd 2dc4 39f2 ce32 18b1 3cf2 a0fb 7e35
<,etc.>
Furthermore, I'm still getting error messages in my syslog:
Code:
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], opened
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], WDC WD20EZRZ-00Z5HB0, S/N:WD-WCC4M7US6Y4L, WWN:5-0014ee-264432c61, FW:80.00A80, 2.00 TB
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], found in smartd database: Western Digital Blue
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdc [SAT], opened
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdc [SAT], ST1500LM003-9YH148, S/N:W110EGR9, WWN:5-000c50-049ef6fd8, FW:CC9F, 1.50 TB
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdc [SAT], not found in smartd database.
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdc [SAT], is SMART capable. Adding to "monitor" list.
Aug 27 18:41:55 mypc smartd[6685]: Monitoring 3 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], 3 Offline uncorrectable sectors
Aug 27 18:41:55 mypc smartd[6685]: Device: /dev/sdb [SAT], previous self-test completed with error (read test element)
I'm rather confused here - have I successfully reallocated the sector, and why have the selftest messages come back again?
08-27-2019, 08:33 AM
#7
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803
Try reading sectors that follow the reported LBA. The long test might be (and probably is) reading blocks larger than a single sector, and not refining the report to indicate the exact failing sector. Try "dd if=/dev/sdb skip=66630408 count=1024 of=/dev/null" and see where it fails.
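To narrow it down to individual sectors, the same read can be done one sector at a time. A minimal sketch (scan_sectors is a made-up name; any extra arguments are passed straight through to dd, and iflag=direct in the usage line assumes GNU dd) that prints each LBA whose read fails:

```shell
#!/bin/sh
# Read COUNT 512-byte sectors one at a time, starting at START, and print
# the LBA of every sector that dd fails to read. Read-only.
# Usage: scan_sectors DEVICE START COUNT [extra dd operands...]
#   e.g. scan_sectors /dev/sdb 66630408 1024 iflag=direct
scan_sectors() {
    dev=$1
    lba=$2
    end=$(( $2 + $3 ))
    shift 3
    while [ "$lba" -lt "$end" ]; do
        dd if="$dev" of=/dev/null bs=512 skip="$lba" count=1 "$@" 2>/dev/null ||
            echo "$lba"
        lba=$(( lba + 1 ))
    done
}
```

No output means every sector in the range could be read. Passing iflag=direct is meant to make the reads bypass the page cache, so the result reflects what the drive returns now rather than previously cached data.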
08-27-2019, 08:25 PM
#8
Member
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64
Original Poster
That seems to work fine:
Code:
# dd if=/dev/sdb skip=66630408 count=1024 of=/dev/null
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 7.0131 s, 74.8 kB/s
08-27-2019, 10:58 PM
#9
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803
It's hard to explain why the internal self-test might be failing, then.
08-30-2019, 02:01 AM
#10
Senior Member
Registered: Jan 2012
Distribution: Slackware
Posts: 3,348
Quote:
Originally Posted by Robert S
I'm rather confused here - have I successfully reallocated the sector, and why have the selftest messages come back again?
|
Remember, your drive has 4096-byte physical sectors that appear as 8 times as many 512-byte sectors via emulation. That means you can't actually have one bad 512-byte sector, because any bad sector will be 4096 bytes in size and will appear as 8 consecutive bad 512-byte sectors.
In other words, you will have to overwrite all 8 emulated sectors for the error message to permanently disappear.
08-31-2019, 12:50 AM
#11
Member
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64
Original Poster
I assume, then, that I need to do:
Code:
dd of=/dev/sdb skip=66630408 bs=512 count=8 if=/dev/null
to overwrite the entire sector. How do I ensure that this won't overwrite any files (I'm using ext4 on LVM2)? I've already deleted corrupted files in the preceding steps.
08-31-2019, 10:30 AM
#12
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803
Quote:
Originally Posted by Robert S
I assume, then, that I need to do:
Code:
dd of=/dev/sdb skip=66630408 bs=512 count=8 if=/dev/null
to overwrite the entire sector. How do I ensure that this won't overwrite any files (I'm using ext4 on LVM2)? I've already deleted corrupted files in the preceding steps.
|
First, that command is completely wrong and won't actually do anything (/dev/null returns immediate EOF when read, and "skip=..." affects the input stream, not the output).
Second, LVM makes the mapping more complicated, so be sure you've used the "LVM repairs" section of the HOWTO to determine which (if any) file is using that block. It's just one 4KB block, and so can't be used by more than one file (though that "file" might actually be a directory). Once you've determined that the block can be safely overwritten, then
Code:
dd of=/dev/sdb seek=66630408 bs=512 count=8 if=/dev/zero
               ^^^^                                 ^^^^
will overwrite the block, but there's no assurance the block will be reallocated since your previous step showed that the whole region of the disk can be read successfully.
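The seek/skip distinction is easy to try out safely on a scratch file before touching the disk. The sketch below is only a demonstration on a temporary file (the variable names are arbitrary): it fills 32 "sectors" with 0xFF bytes, zeroes sectors 8 to 15 using the same seek= and count= form as above, and then reads them back with skip=. conv=notrunc is there because the target is a regular file, which dd would otherwise truncate after the written range:

```shell
#!/bin/sh
# Demonstration on a scratch file only; nothing here touches a disk.
LC_ALL=C; export LC_ALL   # make tr operate on plain bytes

img=$(mktemp)

# Fill 32 sectors of 512 bytes with 0xFF so zeroed sectors stand out.
dd if=/dev/zero bs=512 count=32 2>/dev/null | tr '\000' '\377' > "$img"

# seek= positions the OUTPUT: overwrite sectors 8..15 with zeros.
dd if=/dev/zero of="$img" bs=512 seek=8 count=8 conv=notrunc 2>/dev/null

# skip= positions the INPUT: read sectors 8..15 back, count non-zero bytes.
nonzero_in_range=$(dd if="$img" bs=512 skip=8 count=8 2>/dev/null |
    tr -d '\000' | wc -c)

# Count zero bytes in the whole file; only the 8 sectors should be zero.
zero_total=$(tr -d '\377' < "$img" | wc -c)

file_size=$(wc -c < "$img")
rm -f "$img"

echo "non-zero bytes in sectors 8..15: $((nonzero_in_range))"
echo "zero bytes in whole file:        $((zero_total))"
echo "file size in bytes:              $((file_size))"
```

This should report 0, 4096 and 16384: exactly one 4k block zeroed at the seek offset, and nothing else in the file disturbed.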
Last edited by rknichols; 08-31-2019 at 10:33 AM.
09-01-2019, 01:20 AM
#13
Member
Registered: Oct 2006
Location: Canberra
Distribution: gentoo, debian
Posts: 64
Original Poster
Many thanks:
Code:
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 17141 -
# 2 Short offline Completed without error 00% 17136 -
05-19-2021, 08:21 AM
#14
LQ Newbie
Registered: May 2021
Location: Milan
Distribution: Debian OpenMediaVault
Posts: 2
Hi, I've read the whole thread and am trying to apply it to my 4TB data disk. My question is: I have a RAID1 system on OpenMediaVault in which only one disk of the mirror (/dev/sdb) has 4 unreadable pending sectors. Is the procedure also valid for a RAID1 system? Is the mirroring operation affected by relocating sectors?
Many thanks in advance.
05-19-2021, 09:33 AM
#15
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,803
Quote:
Originally Posted by Carl0
Hi, I've read the whole thread and am trying to apply it to my 4TB data disk. My question is: I have a RAID1 system on OpenMediaVault in which only one disk of the mirror (/dev/sdb) has 4 unreadable pending sectors. Is the procedure also valid for a RAID1 system? Is the mirroring operation affected by relocating sectors?
Many thanks in advance.
|
Using RAID can certainly affect the way that block addresses are mapped to the physical disk. I'm afraid I don't have the knowledge or experience needed to assist with unwinding that.
All times are GMT -5.