LinuxQuestions.org


ian1 07-22-2011 04:21 AM

mdadm software raid1 failed disk detection too long
 
Hi,

I have SLES10-SP3 running on an Intel SR1600URHS board with 3 hot-swap SATA disks configured with mdadm as RAID1 plus a hot spare. If I pull one of the active disks, all file I/O stops for about 2.5 minutes, after which it resumes and the array is rebuilt onto the spare disk. Is there any way I can reduce this 2.5 minutes of inactivity?

I've tried setting /sys/block/sdX/device/timeout and /sys/block/sdX/device/retries to 1 for all disks, but this hasn't made any difference. The output from the messages log is:

12:11:56: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
12:11:56: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x1e data 0
12:11:56: res 40/00:03:00:00:20/00:00:00:00:00/b0 Emask 0x4 (timeout)
12:12:03: ata2: port is slow to respond, please be patient (Status 0xd0)
12:12:26: ata2: port failed to respond (30 secs, Status 0xd0)
12:12:26: ata2: soft resetting port
12:12:59: ata2.00: qc timeout (cmd 0xec)
12:12:59: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
12:12:59: ata2.00: revalidation failed (errno=-5)
12:12:59: ata2: failed to recover some devices, retrying in 5 secs
12:14:26: ata2: soft resetting port
12:14:27: ata2.00: qc timeout (cmd 0xec)
12:14:27: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
12:14:27: ata2.00: revalidation failed (errno=-5)
12:14:27: ata2: failed to recover some devices, retrying in 5 secs
12:14:27: ata2: soft resetting port
12:14:27: ata2.00: qc timeout (cmd 0xec)
12:14:27: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
12:14:27: ata2.00: revalidation failed (errno=-5)
12:14:27: ata2.00: disabled
12:14:27: ata2: failed to recover some devices, retrying in 5 secs
12:14:27: ata2.01: failed to IDENTIFY (I/O error, err_mask=0x40)
12:14:27: ata2.01: revalidation failed (errno=-5)
12:14:27: ata2: failed to recover some devices, retrying in 5 secs
12:14:27: ata2: soft resetting port
12:14:27: ata2.01: configured for UDMA/100
12:14:27: ata2: EH complete
12:14:27: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:27: end_request: I/O error, dev sdc, sector 4321150
12:14:27: raid1: Disk failure on sdc2, disabling device.
12:14:27: Operation continuing on 1 devices
12:14:27: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:27: end_request: I/O error, dev sdc, sector 213937414
12:14:27: Buffer I/O error on device sdc2, logical block 209728384
12:14:27: Buffer I/O error on device sdc2, logical block 209728385
12:14:27: Buffer I/O error on device sdc2, logical block 209728386
12:14:27: Buffer I/O error on device sdc2, logical block 209728387
12:14:27: Buffer I/O error on device sdc2, logical block 209728388
12:14:27: Buffer I/O error on device sdc2, logical block 209728389
12:14:27: Buffer I/O error on device sdc2, logical block 209728390
12:14:27: Buffer I/O error on device sdc2, logical block 209728391
12:14:27: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:27: end_request: I/O error, dev sdc, sector 213937414
12:14:27: Buffer I/O error on device sdc2, logical block 209728384
12:14:27: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:27: end_request: I/O error, dev sdc, sector 213937415
12:14:27: Buffer I/O error on device sdc2, logical block 209728385
12:14:27: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:27: end_request: I/O error, dev sdc, sector 5274110
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 5512710
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 5618574
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 6830614
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 29903550
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 29903574
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 29903694
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 29965262
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 49823870
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 50347422
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 53230734
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 54278982
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 54540942
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 55327126
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 55327510
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 55327854
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 56376126
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 56637998
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 191381046
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 191642006
12:14:28: sd 1:0:0:0: SCSI error: return code = 0x00040000
12:14:28: end_request: I/O error, dev sdc, sector 192953614
12:14:28: RAID1 conf printout:
12:14:28: --- wd:1 rd:2
12:14:28: disk 0, wo:1, o:0, dev:sdc2
12:14:28: disk 1, wo:0, o:1, dev:sdb2
12:14:28: RAID1 conf printout:
12:14:28: --- wd:1 rd:2
12:14:28: disk 1, wo:0, o:1, dev:sdb2
12:14:28: RAID1 conf printout:
12:14:28: --- wd:1 rd:2
12:14:28: disk 0, wo:1, o:1, dev:sda2
12:14:28: disk 1, wo:0, o:1, dev:sdb2
12:14:28: md: syncing RAID array md1
12:14:28: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
12:14:28: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
12:14:28: md: using 128k window, over a total of 104864192 blocks.

Thanks,
Ian
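
For anyone who wants to measure the stall from the log rather than estimate it, here is a minimal Python sketch. It takes the first ATA exception line and the raid1 "Disk failure" line from the output above and subtracts their timestamps. The function names and the choice of marker strings are assumptions made for illustration, and the sketch does not handle a log that crosses midnight.

```python
from datetime import datetime

# Two lines copied from the log above: the first ATA exception and the
# point where md finally marks the disk as failed.
LOG_EXCERPT = """\
12:11:56: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
12:14:27: raid1: Disk failure on sdc2, disabling device.
"""


def timestamp(line):
    """Parse the leading HH:MM:SS field of a log line."""
    return datetime.strptime(line[:8], "%H:%M:%S")


def stall_seconds(log_text, start_marker="exception Emask",
                  end_marker="Disk failure"):
    """Seconds between the first line containing start_marker and the
    first later line containing end_marker (both markers are assumptions)."""
    start = None
    for line in log_text.splitlines():
        if start is None and start_marker in line:
            start = timestamp(line)
        elif start is not None and end_marker in line:
            return int((timestamp(line) - start).total_seconds())
    raise ValueError("markers not found in log text")


if __name__ == "__main__":
    print(stall_seconds(LOG_EXCERPT))  # 151 seconds, i.e. about 2.5 minutes
```

Running the same function over the log after a configuration change gives a like-for-like number to compare against.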

ian1 07-27-2011 09:32 AM

I solved the issue by changing the BIOS SATA setting to AHCI and using the ahci driver. The timeout dropped from 2.5 minutes to 15 seconds.
Ian
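
To confirm which driver a disk's controller is actually using after a BIOS change like this, something along the following lines may help. It is only a sketch: it assumes the usual sysfs layout, where /sys/block/sdX/device resolves to the SCSI device and a parent directory higher up carries a "driver" symlink for the controller. The function name controller_driver and the sys_root parameter are made up for illustration.

```python
import os


def controller_driver(disk, sys_root="/sys"):
    """Return the name of the first kernel driver found above the disk's
    SCSI device in sysfs (for example "ahci"), or None if none is found."""
    root = os.path.realpath(sys_root)
    device = os.path.realpath(os.path.join(sys_root, "block", disk, "device"))
    # Start at the parent: the SCSI device itself is bound to the "sd"
    # driver, which is not the controller driver we are after.
    parent = os.path.dirname(device)
    while parent.startswith(root) and parent != root:
        link = os.path.join(parent, "driver")
        if os.path.islink(link):
            return os.path.basename(os.readlink(link))
        parent = os.path.dirname(parent)
    return None


if __name__ == "__main__":
    block_dir = "/sys/block"
    if os.path.isdir(block_dir):
        for name in sorted(os.listdir(block_dir)):
            if name.startswith("sd"):
                print(name, controller_driver(name))
```

If the disks report ahci here, the controller is running in AHCI mode as described above; any other name suggests the BIOS setting or driver selection did not take effect.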

