raid + nasty clicking drive (Fedora Core 4)
My AMD64 box with a DFI Lan Party board has been humming along for about a year now. I've got 2 drives set up as RAID1.
The kernel I'm running is: 2.6.14-1.1653_FC4smp

I just started hearing this NASTY clicking sound, pretty loud. I looked in /var/log/messages and there are TONS of errors. What can I do to fix this? Is it possible just to disable raid for now, get a new drive and let it re-sync? Of course I don't want to bring this server down if possible.

Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:33 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 21:50:33 fencechat kernel: Additional sense: No additional sense information
Jul 31 21:50:33 fencechat kernel: end_request: I/O error, dev sdb, sector 164858863
Jul 31 21:50:33 fencechat kernel: raid1: Disk failure on sdb10, disabling device.
Jul 31 21:50:33 fencechat kernel: Operation continuing on 1 devices
Jul 31 21:50:33 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:33 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:33 fencechat kernel: disk 0, wo:0, o:1, dev:sda2
Jul 31 21:50:33 fencechat kernel: disk 1, wo:1, o:0, dev:sdb10
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:33 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:33 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 21:50:33 fencechat kernel: Additional sense: No additional sense information
Jul 31 21:50:33 fencechat kernel: end_request: I/O error, dev sdb, sector 29703931
Jul 31 21:50:33 fencechat kernel: raid1: Disk failure on sdb6, disabling device.
Jul 31 21:50:33 fencechat kernel: Operation continuing on 1 devices
Jul 31 21:50:33 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:33 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:33 fencechat kernel: disk 0, wo:0, o:1, dev:sda6
Jul 31 21:50:33 fencechat kernel: disk 1, wo:1, o:0, dev:sdb6
Jul 31 21:50:33 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:33 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:33 fencechat kernel: disk 0, wo:0, o:1, dev:sda2
Jul 31 21:50:33 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:33 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:33 fencechat kernel: disk 0, wo:0, o:1, dev:sda6
Jul 31 21:50:40 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:40 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:40 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:40 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 21:50:40 fencechat kernel: Additional sense: No additional sense information
Jul 31 21:50:40 fencechat kernel: end_request: I/O error, dev sdb, sector 62460466
Jul 31 21:50:40 fencechat kernel: raid1: Disk failure on sdb9, disabling device.
Jul 31 21:50:40 fencechat kernel: Operation continuing on 1 devices
Jul 31 21:50:40 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:40 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:40 fencechat kernel: disk 0, wo:0, o:1, dev:sda3
Jul 31 21:50:40 fencechat kernel: disk 1, wo:1, o:0, dev:sdb9
Jul 31 21:50:40 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:40 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:40 fencechat kernel: disk 0, wo:0, o:1, dev:sda3
Jul 31 21:50:50 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:50 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:50 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:50 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 21:50:50 fencechat kernel: Additional sense: No additional sense information
Jul 31 21:50:50 fencechat kernel: end_request: I/O error, dev sdb, sector 5140609
Jul 31 21:50:50 fencechat kernel: raid1: Disk failure on sdb2, disabling device.
Jul 31 21:50:50 fencechat kernel: Operation continuing on 1 devices
Jul 31 21:50:50 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:50 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:50 fencechat kernel: disk 0, wo:0, o:1, dev:sda9
Jul 31 21:50:50 fencechat kernel: disk 1, wo:1, o:0, dev:sdb2
Jul 31 21:50:50 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:50 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:50 fencechat kernel: disk 0, wo:0, o:1, dev:sda9

Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 22:16:56 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 22:16:56 fencechat kernel: Additional sense: No additional sense information
Jul 31 22:16:56 fencechat kernel: end_request: I/O error, dev sdb, sector 16
Jul 31 22:16:56 fencechat kernel: Buffer I/O error on device sdb, logical block 2
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 22:16:56 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 22:16:56 fencechat kernel: Additional sense: No additional sense information
Jul 31 22:16:56 fencechat kernel: end_request: I/O error, dev sdb, sector 24
Jul 31 22:16:56 fencechat kernel: Buffer I/O error on device sdb, logical block 3
Jul 31 22:16:56 fencechat kernel: end_request: I/O error, dev sdb, sector 24
Jul 31 22:16:56 fencechat kernel: Buffer I/O error on device sdb, logical block 3
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 22:16:56 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 22:16:56 fencechat kernel: Additional sense: No additional sense information
Jul 31 22:16:56 fencechat kernel: end_request: I/O error, dev sdb, sector 32
Jul 31 22:16:56 fencechat kernel: Buffer I/O error on device sdb, logical block 4
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:57 fencechat kernel: ata4: error=0x04 { DriveStatusError }
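With this much noise in the log it is hard to see at a glance which members the kernel actually kicked out. A rough sketch of how the "raid1: Disk failure on ..." lines could be pulled out of the messages file; the function name failed_members is made up for illustration, and the pattern assumes the exact message wording shown in the log above:

```shell
#!/bin/sh
# Sketch only: list the md member partitions the kernel reported as failed.
# Reads syslog-style text on stdin and matches lines of the form
#   "... kernel: raid1: Disk failure on sdb10, disabling device."
# failed_members is a hypothetical helper name, not part of any tool.
failed_members() {
    sed -n 's/.*raid1: Disk failure on \([a-z0-9]*\), disabling device.*/\1/p' | sort -u
}

# Example with two lines copied from the log above.
printf '%s\n' \
    'Jul 31 21:50:33 fencechat kernel: raid1: Disk failure on sdb10, disabling device.' \
    'Jul 31 21:50:33 fencechat kernel: raid1: Disk failure on sdb6, disabling device.' \
    | failed_members

# Against the real log it would be something like:
#   failed_members < /var/log/messages
```

Everything it reports here is on sdb, which matches the ata4 errors all pointing at the same physical drive.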
I tried to disable RAID with dmraid, but the command failed:
dmraid -an
ERROR: hpt45x: reading /dev/sdb[Input/output error]
ERROR: isw: reading /dev/sdb[Input/output error]
ERROR: lsi: reading /dev/sdb[Input/output error]
ERROR: nvidia: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: sil: reading /dev/sdb[Input/output error]
ERROR: via: reading /dev/sdb[Input/output error]
RAID set "nvidia_bcfgcgjh" is not active

I've also tried marking each device faulty with mdadm:

mdadm --set-faulty /dev/md1 /dev/sdb2

cat /proc/mdstat shows this:

Personalities : [raid1]
md1 : active raid1 sdb2[2](F) sda9[0]
      2048192 blocks [2/1] [U_]
md8 : active raid1 sdb3[2](F) sda10[0]
      2048192 blocks [2/1] [U_]
md2 : active raid1 sdb5[2](F) sda5[0]
      5116544 blocks [2/1] [U_]
md4 : active raid1 sdb6[2](F) sda6[0]
      5116544 blocks [2/1] [U_]
md5 : active raid1 sdb7[2](F) sda7[0]
      5116544 blocks [2/1] [U_]
md6 : active raid1 sdb8[2](F) sda8[0]
      5116544 blocks [2/1] [U_]
md3 : active raid1 sdb9[2](F) sda3[0]
      6144704 blocks [2/1] [U_]
md7 : active raid1 sdb10[2](F) sda2[0]
      51199040 blocks [2/1] [U_]
md9 : active raid1 sdb11[2](F) sda11[0]
      2048192 blocks [2/1] [U_]
md0 : active raid1 sdb1[2](F) sda1[0]
      521984 blocks [2/1] [U_]
unused devices: <none>

Trying to stop an array with mdadm also fails:

mdadm -S /dev/md1
mdadm: fail to stop array /dev/md1: Device or resource busy

Would unplugging this device (on the fly) be a bad thing?
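Since every sdb member already shows (F) in mdstat, the next thing I'm considering is taking the failed members out of each array before touching the hardware. A hedged sketch that only prints the commands and runs nothing: print_remove_cmds is a made-up helper name, and whether removing the members is the right step on this box is my assumption. The --remove option itself is mdadm's normal manage-mode option.

```shell
#!/bin/sh
# Sketch only: print (do not run) one "mdadm --remove" command per array
# member that /proc/mdstat marks as failed with "(F)".
# Reads /proc/mdstat-style text on stdin.
# print_remove_cmds is a hypothetical helper name.
print_remove_cmds() {
    awk '$2 == ":" && $3 == "active" {
        # Fields 5..NF are the members, e.g. "sdb2[2](F)" or "sda9[0]".
        for (i = 5; i <= NF; i++) {
            if ($i ~ /\(F\)$/) {
                dev = $i
                sub(/\[[0-9]+\]\(F\)$/, "", dev)   # "sdb2[2](F)" becomes "sdb2"
                printf "mdadm /dev/%s --remove /dev/%s\n", $1, dev
            }
        }
    }'
}

# Example with one array line copied from the mdstat output above.
printf '%s\n' 'md1 : active raid1 sdb2[2](F) sda9[0]' | print_remove_cmds

# Against the live system it would be something like:
#   print_remove_cmds < /proc/mdstat
```

Printing instead of executing means the list can be checked by hand first, which seems safer on a server I'm trying not to take down.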