Hi,
Our system has two 1TB HDDs (/dev/sda, /dev/sdb), on which a 150GB RAID-1 array (/dev/md3) mirrors the partitions "/dev/sda3" and "/dev/sdb3".
Recently we bought two 4TB HDDs to replace them.
The new disks were tested thoroughly with the vendor's utility before deployment.
They support SCT ERC, but the "Read" and "Write" values have to be set to "70" again after each reboot.
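For reference, a minimal sketch of how the two values could be re-applied at boot, assuming smartmontools is installed and `smartctl` is used instead of the vendor's utility. `smartctl -l scterc,READ,WRITE` takes both values in tenths of a second, so 70 means 7.0 seconds. The `set_erc` helper and the `DRY_RUN` switch are illustrative names, not part of any tool.

```shell
#!/bin/sh
# Re-apply SCT ERC timeouts on the given disks.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

set_erc() {
    # $1 = read timeout, $2 = write timeout (tenths of a second), rest = disks
    read_t="$1"; write_t="$2"; shift 2
    for disk in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "smartctl -l scterc,${read_t},${write_t} ${disk}"
        else
            smartctl -l "scterc,${read_t},${write_t}" "$disk"
        fi
    done
}

set_erc 70 70 /dev/sda /dev/sdb
```

Running it with `DRY_RUN=0` as root would issue the real commands; hooking it into a boot script or udev rule is left open here.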
On each new disk, a 500GB partition has been created so that the array can be extended.
The first stage of the replacement involved:
- "mdadm --fail /dev/md3 /dev/sdb3";
- "mdadm --remove /dev/md3 /dev/sdb3", then the array becomes "clean,degraded";
- power down the system and replace the 1TB /dev/sdb with one of the new 4TB disks;
- power up the system, the new disk is detected correctly as "/dev/sdb";
- "mdadm --add /dev/md3 /dev/sdb3", after which the rebuild starts.
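The steps above can be sketched as a small dry-run script. The `mdadm` invocations are the ones listed; the `run` wrapper, the function names, and the `DRY_RUN` switch are illustrative additions, and the `mdadm --detail` and `/proc/mdstat` checks are only one possible way to inspect the array state.

```shell
#!/bin/sh
# Dry-run sketch of the member replacement described above.
# Set DRY_RUN=0 to actually execute the commands (requires root).
DRY_RUN="${DRY_RUN:-1}"
ARRAY=/dev/md3
MEMBER=/dev/sdb3

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remove_old_member() {
    run mdadm --fail "$ARRAY" "$MEMBER"
    run mdadm --remove "$ARRAY" "$MEMBER"
    run mdadm --detail "$ARRAY"      # state should read "clean, degraded"
}

add_new_member() {
    run mdadm --add "$ARRAY" "$MEMBER"
    run cat /proc/mdstat             # shows the rebuild progress
}

remove_old_member
# Power down, swap the disk, power up and partition the new disk here.
add_new_member
```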
The rebuild ran smoothly at first, but aborted at about 98% with a series of I/O errors:
Code:
> ata2.00: exception Emask 0x0 SAct 0x1400 SErr 0x0 action 0x0
> ata2.00: irq_stat 0x40000008
> ata2.00: failed command: READ FPDMA QUEUED
> ata2.00: cmd 60/08:50:50:4b:7e/00:00:12:00:00/40 tag 10 ncq dma 4096 in
> res 41/40:00:50:4b:7e/00:00:12:00:00/40 Emask 0x409 (media error) <F>
> ata2.00: status: { DRDY ERR }
> ata2.00: error: { UNC }
> ata2.00: configured for UDMA/133
> sd 1:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
> sd 1:0:0:0: [sda] tag#10 Sense Key : Medium Error [current]
> sd 1:0:0:0: [sda] tag#10 Add. Sense: Unrecovered read error - auto reallocate failed
> sd 1:0:0:0: [sda] tag#10 CDB: Read(10) 28 00 12 7e 4b 50 00 00 08 00
> blk_update_request: I/O error, dev sda, sector 310266704 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> ata2: EH complete
> md/raid1:md3: sda: unrecoverable I/O read error for block 308167424
> md: md3: recovery interrupted.
> ata2.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x0
> ata2.00: irq_stat 0x40000008
> ata2.00: failed command: READ FPDMA QUEUED
> ata2.00: cmd 60/08:08:60:56:7e/00:00:12:00:00/40 tag 1 ncq dma 4096 in
> res 41/40:00:60:56:7e/00:00:12:00:00/40 Emask 0x409 (media error) <F>
> ata2.00: status: { DRDY ERR }
> ata2.00: error: { UNC }
> ata2.00: configured for UDMA/133
> sd 1:0:0:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
> sd 1:0:0:0: [sda] tag#1 Sense Key : Medium Error [current]
> sd 1:0:0:0: [sda] tag#1 Add. Sense: Unrecovered read error - auto reallocate failed
> sd 1:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 12 7e 56 60 00 00 08 00
> blk_update_request: I/O error, dev sda, sector 310269536 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> ata2: EH complete
> md/raid1:md3: sda: unrecoverable I/O read error for block 308170240
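The addresses in the log can be cross-checked with a little arithmetic. The sketch below decodes the LBA from bytes 2 to 5 of the first `Read(10)` CDB and compares it with the sector in the matching `blk_update_request` line. The last step estimates how far into the array the reported md block lies, under two assumptions that the log does not confirm: that md reports the block in 512-byte sectors, and that "150GB" means 150 GiB.

```shell
#!/bin/sh
# CDB: Read(10) 28 00 12 7e 4b 50 00 00 08 00
# Bytes 2..5 hold the big-endian LBA, bytes 7..8 the transfer length.
lba=$((0x127e4b50))
len=$((0x0008))
echo "lba=$lba len=$len"            # lba=310266704 len=8

# Sector reported by blk_update_request for the same failed read.
reported=310266704
[ "$lba" -eq "$reported" ] && echo "CDB and block layer agree"

# Assumption: 150 GiB array, md block counted in 512-byte sectors.
md_block=308167424
array_sectors=$((150 * 1024 * 1024 * 2))
position=$((md_block * 100 / array_sectors))
echo "position=${position}%"        # position=97% (integer division)
```

The decoded LBA matches the failing sector that the kernel reports for sda, and under the stated assumptions the estimated position is consistent with the rebuild stopping near 98%.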
We repeated the steps with different combinations of SATA cables and ports, so the problem is unlikely to be caused by the cabling or the ports.
The degraded array still works perfectly in every other respect.
In its 4 years of service, no errors had been logged at all.
We cannot think of a reason for the rebuild problem.
Should we copy all files to the new partition and re-create the array?
Please kindly advise.