LinuxQuestions.org

conquest 07-31-2006 10:55 PM

raid + nasty clicking drive (Fedora Core 4)
 
So my AMD64 with a DFI Lan Party board has been humming along for about a year now. I've got two drives set up as RAID 1.

The kernel I'm running is: 2.6.14-1.1653_FC4smp

I just started hearing this NASTY clicking sound, pretty loud. I looked in /var/log/messages and there are TONS of errors. What can I do to fix this? Is it possible to just disable RAID for now, get a new drive, and let it re-sync? I'd rather not bring this server down if I can avoid it.


Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:33 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 21:50:33 fencechat kernel: Additional sense: No additional sense information
Jul 31 21:50:33 fencechat kernel: end_request: I/O error, dev sdb, sector 164858863
Jul 31 21:50:33 fencechat kernel: raid1: Disk failure on sdb10, disabling device.
Jul 31 21:50:33 fencechat kernel: Operation continuing on 1 devices
Jul 31 21:50:33 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:33 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:33 fencechat kernel: disk 0, wo:0, o:1, dev:sda2
Jul 31 21:50:33 fencechat kernel: disk 1, wo:1, o:0, dev:sdb10
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:33 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:33 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:33 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:33 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 21:50:33 fencechat kernel: Additional sense: No additional sense information
Jul 31 21:50:33 fencechat kernel: end_request: I/O error, dev sdb, sector 29703931
Jul 31 21:50:33 fencechat kernel: raid1: Disk failure on sdb6, disabling device.
Jul 31 21:50:33 fencechat kernel: Operation continuing on 1 devices
Jul 31 21:50:33 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:33 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:33 fencechat kernel: disk 0, wo:0, o:1, dev:sda6
Jul 31 21:50:33 fencechat kernel: disk 1, wo:1, o:0, dev:sdb6
Jul 31 21:50:33 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:33 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:33 fencechat kernel: disk 0, wo:0, o:1, dev:sda2
Jul 31 21:50:33 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:33 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:33 fencechat kernel: disk 0, wo:0, o:1, dev:sda6
Jul 31 21:50:40 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:40 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:40 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:40 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 21:50:40 fencechat kernel: Additional sense: No additional sense information
Jul 31 21:50:40 fencechat kernel: end_request: I/O error, dev sdb, sector 62460466
Jul 31 21:50:40 fencechat kernel: raid1: Disk failure on sdb9, disabling device.
Jul 31 21:50:40 fencechat kernel: Operation continuing on 1 devices
Jul 31 21:50:40 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:40 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:40 fencechat kernel: disk 0, wo:0, o:1, dev:sda3
Jul 31 21:50:40 fencechat kernel: disk 1, wo:1, o:0, dev:sdb9
Jul 31 21:50:40 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:40 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:40 fencechat kernel: disk 0, wo:0, o:1, dev:sda3
Jul 31 21:50:50 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 21:50:50 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 21:50:50 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 21:50:50 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 21:50:50 fencechat kernel: Additional sense: No additional sense information
Jul 31 21:50:50 fencechat kernel: end_request: I/O error, dev sdb, sector 5140609
Jul 31 21:50:50 fencechat kernel: raid1: Disk failure on sdb2, disabling device.
Jul 31 21:50:50 fencechat kernel: Operation continuing on 1 devices
Jul 31 21:50:50 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:50 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:50 fencechat kernel: disk 0, wo:0, o:1, dev:sda9
Jul 31 21:50:50 fencechat kernel: disk 1, wo:1, o:0, dev:sdb2
Jul 31 21:50:50 fencechat kernel: RAID1 conf printout:
Jul 31 21:50:50 fencechat kernel: --- wd:1 rd:2
Jul 31 21:50:50 fencechat kernel: disk 0, wo:0, o:1, dev:sda9
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 22:16:56 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 22:16:56 fencechat kernel: Additional sense: No additional sense information
Jul 31 22:16:56 fencechat kernel: end_request: I/O error, dev sdb, sector 16
Jul 31 22:16:56 fencechat kernel: Buffer I/O error on device sdb, logical block 2
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 22:16:56 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 22:16:56 fencechat kernel: Additional sense: No additional sense information
Jul 31 22:16:56 fencechat kernel: end_request: I/O error, dev sdb, sector 24
Jul 31 22:16:56 fencechat kernel: Buffer I/O error on device sdb, logical block 3
Jul 31 22:16:56 fencechat kernel: end_request: I/O error, dev sdb, sector 24
Jul 31 22:16:56 fencechat kernel: Buffer I/O error on device sdb, logical block 3
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jul 31 22:16:56 fencechat kernel: sdb: Current: sense key: Aborted Command
Jul 31 22:16:56 fencechat kernel: Additional sense: No additional sense information
Jul 31 22:16:56 fencechat kernel: end_request: I/O error, dev sdb, sector 32
Jul 31 22:16:56 fencechat kernel: Buffer I/O error on device sdb, logical block 4
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:56 fencechat kernel: ata4: error=0x04 { DriveStatusError }
Jul 31 22:16:56 fencechat kernel: ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jul 31 22:16:57 fencechat kernel: ata4: error=0x04 { DriveStatusError }
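
With this much noise it is hard to see which md members actually got kicked out. Here is a rough Python sketch that pulls that out of the log. The function name summarize and the SAMPLE excerpt are made up for illustration; the two regexes only match the message formats shown in the log above and may need adjusting on other kernels.

```python
import re
from collections import Counter

# md driver message seen above: "raid1: Disk failure on sdb10, disabling device."
FAILURE_RE = re.compile(r"raid1: Disk failure on (\w+), disabling device")
# Block-layer message seen above: "end_request: I/O error, dev sdb, sector N"
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize(log_text):
    """Return (failed md members in order of appearance, I/O error count per disk)."""
    failed = []
    io_errors = Counter()
    for line in log_text.splitlines():
        m = FAILURE_RE.search(line)
        if m and m.group(1) not in failed:
            failed.append(m.group(1))
        m = IO_ERROR_RE.search(line)
        if m:
            io_errors[m.group(1)] += 1
    return failed, dict(io_errors)


# Short excerpt copied from the log above, used only as sample input.
SAMPLE = """\
Jul 31 21:50:33 fencechat kernel: end_request: I/O error, dev sdb, sector 164858863
Jul 31 21:50:33 fencechat kernel: raid1: Disk failure on sdb10, disabling device.
Jul 31 21:50:33 fencechat kernel: end_request: I/O error, dev sdb, sector 29703931
Jul 31 21:50:33 fencechat kernel: raid1: Disk failure on sdb6, disabling device.
"""

if __name__ == "__main__":
    failed, io_errors = summarize(SAMPLE)
    print("failed members:", failed)
    print("I/O errors per disk:", io_errors)
```

To run it against the real log, read /var/log/messages into a string and pass that to summarize instead of SAMPLE.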

conquest 07-31-2006 10:56 PM

I tried to deactivate the RAID set with dmraid, but the command failed:

dmraid -an
ERROR: hpt45x: reading /dev/sdb[Input/output error]
ERROR: isw: reading /dev/sdb[Input/output error]
ERROR: lsi: reading /dev/sdb[Input/output error]
ERROR: nvidia: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: sil: reading /dev/sdb[Input/output error]
ERROR: via: reading /dev/sdb[Input/output error]
RAID set "nvidia_bcfgcgjh" is not active


I've also tried marking each device as faulty with mdadm, for example:

mdadm --set-faulty /dev/md1 /dev/sdb2


cat /proc/mdstat shows this:

Personalities : [raid1]
md1 : active raid1 sdb2[2](F) sda9[0]
2048192 blocks [2/1] [U_]

md8 : active raid1 sdb3[2](F) sda10[0]
2048192 blocks [2/1] [U_]

md2 : active raid1 sdb5[2](F) sda5[0]
5116544 blocks [2/1] [U_]

md4 : active raid1 sdb6[2](F) sda6[0]
5116544 blocks [2/1] [U_]

md5 : active raid1 sdb7[2](F) sda7[0]
5116544 blocks [2/1] [U_]

md6 : active raid1 sdb8[2](F) sda8[0]
5116544 blocks [2/1] [U_]

md3 : active raid1 sdb9[2](F) sda3[0]
6144704 blocks [2/1] [U_]

md7 : active raid1 sdb10[2](F) sda2[0]
51199040 blocks [2/1] [U_]

md9 : active raid1 sdb11[2](F) sda11[0]
2048192 blocks [2/1] [U_]

md0 : active raid1 sdb1[2](F) sda1[0]
521984 blocks [2/1] [U_]

unused devices: <none>
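
For reference, a small Python sketch that lists every member flagged (F) in output like the above and prints, as a dry run only, the mdadm manage-mode command that would remove each one from its array. The helper names failed_members and remove_commands are hypothetical, the parser assumes the simple "mdN : active raid1 ..." layout shown here, and whether removing the members is the right next step on this box is an assumption, not something confirmed in this thread. Nothing is executed.

```python
import re

# Array line as shown above: "md1 : active raid1 sdb2[2](F) sda9[0]"
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.+)$")
# Member entry: device name, slot number in brackets, optional "(F)" failure flag.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](\(F\))?")


def failed_members(mdstat_text):
    """Return a list of (array, member) pairs for members flagged (F)."""
    result = []
    for line in mdstat_text.splitlines():
        m = ARRAY_RE.match(line.strip())
        if not m:
            continue  # skip "Personalities", block counts, blank lines, etc.
        array, members = m.group(1), m.group(4)
        for name, _slot, flag in MEMBER_RE.findall(members):
            if flag:
                result.append((array, name))
    return result


def remove_commands(pairs):
    """Build (but do not run) one 'mdadm --remove' command per failed member."""
    return ["mdadm /dev/%s --remove /dev/%s" % (array, member)
            for array, member in pairs]


# Two arrays copied from the /proc/mdstat output above, used as sample input.
SAMPLE = """\
Personalities : [raid1]
md1 : active raid1 sdb2[2](F) sda9[0]
2048192 blocks [2/1] [U_]

md7 : active raid1 sdb10[2](F) sda2[0]
51199040 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    for cmd in remove_commands(failed_members(SAMPLE)):
        print(cmd)  # dry run: review each command before running it as root
```

On the live machine the input would come from reading /proc/mdstat rather than SAMPLE.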

Trying to stop an array with mdadm yields this:

mdadm -S /dev/md1
mdadm: fail to stop array /dev/md1: Device or resource busy


Would unplugging the failing drive on the fly (with the server still running) be a bad thing?

