Software RAID failure every day at the exact same time
Hi,
I have a Debian server which has always been a bit unstable. I suspect a design flaw in one of the hardware components, because another server with the exact same hardware config has the same problem. In the beginning it often had RAID crashes that made the filesystem go read-only, until at some point it was moved to a data center, where it ran fine for 1.5 years. I then had to add some memory, after which it started crashing again once every 1-2 weeks. Very annoying, but without alternatives there is nothing I can do about it.
In the past week, though, the RAID has crashed every single day, and always at exactly the same time: 6:28 in the morning, give or take half a minute. In most cases it is the same sector. It's a web server and there aren't many users at that time. No cron jobs or anything else are scheduled at that time either. Logs show nothing that helps, as always. If there's a bad sector, I just don't understand why it always dies at that same time. And since the other server with the same hardware config has the same kind of crashes (not sure if at the same time), I still doubt it's really caused by bad sectors. Resyncing after a failure always completes without any errors; shouldn't that fail as well if there are bad sectors?
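One way the "nothing scheduled at that time" check could be repeated mechanically is to scan a system crontab for entries within a few minutes of the crash time. The following is only a minimal sketch: the function name `jobs_near` and the sample crontab text are made up for illustration and are not taken from this server. It understands only plain numeric minute and hour fields and does not handle windows that wrap around midnight.

```python
def jobs_near(crontab_text, hour, minute, window=5):
    """Return (hh, mm, command) for entries within `window` minutes of hour:minute.

    Assumes system-crontab format: minute hour dom month dow user command.
    Lines with non-numeric minute/hour fields (wildcards, ranges, comments,
    environment settings) are skipped.
    """
    target = hour * 60 + minute
    hits = []
    for line in crontab_text.splitlines():
        fields = line.split(None, 6)
        if len(fields) < 7 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        mm, hh = int(fields[0]), int(fields[1])
        if abs(hh * 60 + mm - target) <= window:
            hits.append((hh, mm, fields[6]))
    return hits


# Illustrative sample only, not the contents of this server's /etc/crontab.
sample = """\
SHELL=/bin/sh
# m h dom mon dow user command
17 * * * * root run-parts --report /etc/cron.hourly
25 6 * * * root run-parts --report /etc/cron.daily
"""

for hh, mm, command in jobs_near(sample, 6, 28):
    print(f"{hh:02d}:{mm:02d}  {command}")
```

On a real system the text would come from reading `/etc/crontab` and the files under `/etc/cron.d`, plus any per-user crontabs, which use a slightly different format without the user column.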
Here's a part of the syslog:
Oct 10 06:28:21 kernel: scsi0: ERROR on channel 0, id 2, lun 0, CDB: Read (10) 00 01 a1 c5 b7 00 00 60 00
Oct 10 06:28:21 kernel: Info fld=0x1a1c5d7, Current sda: sense key Medium Error
Oct 10 06:28:21 kernel: Additional sense: Unrecovered read error
Oct 10 06:28:21 kernel: end_request: I/O error, dev sda, sector 27379127
Oct 10 06:28:21 kernel: raid1: Disk failure on sda2, disabling device.
Oct 10 06:28:21 kernel: ^IOperation continuing on 1 devices
Oct 10 06:28:21 kernel: raid1: sda2: rescheduling sector 27379064
Oct 10 06:28:21 kernel: raid1: sdb2: redirecting sector 27379064 to another mirror
The reschedule and redirect messages go on forever. When I remove and re-add the failing disk, the server dies and needs an APC reboot.
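To make the numbers in the log excerpt easier to compare, here is a minimal sketch that decodes them. It assumes the kernel prints the READ(10) command without its opcode byte, so the nine bytes shown are flags, a 4-byte big-endian start LBA, a group byte, a 2-byte transfer length and a control byte. The function name `decode_read10` is only illustrative.

```python
def decode_read10(cdb_tail_hex):
    """Decode the bytes printed after 'Read (10)' in the log line.

    Assumed layout (READ(10) without the opcode byte):
    flags(1) | LBA(4, big-endian) | group(1) | transfer length(2) | control(1)
    """
    raw = bytes.fromhex(cdb_tail_hex.replace(" ", ""))
    if len(raw) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(raw[1:5], "big")      # first sector of the request
    length = int.from_bytes(raw[6:8], "big")   # number of sectors requested
    return lba, length


start, count = decode_read10("00 01 a1 c5 b7 00 00 60 00")
failing = int("0x1a1c5d7", 16)                 # "Info fld" from the sense data

print("request start :", start)
print("request length:", count)
print("reported LBA  :", failing)
print("offset in request:", failing - start)
```

Under that assumption, the request starts at sector 27379127 (the number in the end_request line) and covers 96 sectors, while the Info field points at sector 27379159, which is 32 sectors into that request.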
Any suggestions what could cause this? Thanks!