How can a disk reporting badblocks > 0 suddenly report badblocks == 0?
Hello!
A few months ago I started getting I/O errors during a full backup with "tar".
The errors were concentrated in specific files, so I ran "badblocks"
(in the default read-only mode, with progress indication) on that disk (which is
actually a pair of SAS disks, "TOSHIBA MK2001TRKB", bought in 2012, arranged as
a "hardware RAID 0" array, controlled by an Adaptec ICP5165BR controller and
accessed as a single logical volume) and got quite a lot of bad blocks.
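If I recall correctly, the invocation was something like this ("/dev/sdc" is
just how the logical volume appeared on my system; -s shows progress, -v
reports the error counts):

    badblocks -sv /dev/sdc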
That was a sign that the disk (at least one of the two) needed to be replaced,
so I bought a pair of new SAS disks of the same capacity, connected the new pair
to the connector the old pair had been using, and moved the old pair to another
connector on the RAID controller. After setting up the new array identically to
the old one, the system returned to normal operation.
At that point I got curious: which of the two old disks (or both) had developed the bad blocks?
I deleted the old array, configured each of the two old disks as a separate
volume, and created and formatted an ext4 partition on each.
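From memory, the per-disk setup was roughly the following (I don't remember the
exact partitioning invocation, and the device names are just how the volumes
appeared on my system):

    parted -s /dev/sdc mklabel gpt
    parted -s /dev/sdc mkpart primary ext4 0% 100%
    mkfs.ext4 /dev/sdc1

(The filesystem isn't actually needed for a read-only "badblocks" run, which
works on the raw block device, but I set it up anyway.)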
After that I ran "badblocks" (again in the default read-only mode) on one of
the disks and got zero bad blocks. So I thought "OK, all the bad blocks are on
the second disk", ran "badblocks" on the second disk, and was surprised to get
zero bad blocks on it too.
That looked strange: I remember clearly that when I was getting the I/O errors
during the backup, "badblocks" reported a nonzero (and quite high) number of
bad blocks. So, just to check whether connecting the disks as an array changes
anything,
I unmounted the two disks, set them up again as a "RAID 0" array, configured
the array as a logical volume, and ran "badblocks -sw /dev/sdc" on it (I no
longer need the data on these disks, so the destructive write mode is fine).
After 2-3 days of running I got the same result: zero bad blocks.
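For reference, "badblocks -w" makes four full passes, writing the patterns
0xaa, 0x55, 0xff and 0x00 and reading each one back for verification, which is
why the run took days on a volume of this size. The fuller form of what I ran
was roughly this (-o just saves the list of bad blocks to a file):

    badblocks -swv -o badblocks.txt /dev/sdc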
Why is that? I thought that if the magnetic surface deteriorates with time to
the point of data being written or read incorrectly, it should stay that way.
The other thing that changed is that the array is now attached to a different
connector on the RAID controller. But if there were a bad connection at the
connector, I would expect I/O errors at random locations, whereas mine were
always on the same specific files (i.e. the same blocks), which seems to rule
out bad contact at the connector.
So, what else might have changed that suddenly caused the bad blocks to become
"good"? Before re-running "badblocks" on the old disks I intended to throw them
away, but now I'm not sure: maybe I can still use them for a few more years?
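If it helps, I can also post the drives' SMART data (for SAS drives, smartctl
reports things like the grown defect list). I'd pull it with something along
these lines, though I'm not sure of the exact device syntax for drives behind
this Adaptec controller:

    smartctl -a /dev/sgN

(where /dev/sgN is the SCSI generic device for the drive; that it's exposed
this way is an assumption about my controller setup).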
TIA,
kaza.