HDD errors, DriveStatusError, CentOS 5, DL360
Hey,
I have a DL360 G5 server, about 1 year old. It has 4x SAS 15k 72GB drives in RAID 1+0, which gives me one logical drive of ~144GB. It runs CentOS 5 Final, kernel 2.6.18-92.1.10.el5xen. Until now everything ran smoothly and the server had 200+ days of uptime, but a few hours ago it hung. After the reboot, odd messages started to pop up that had never appeared before, not even once:
Code:
Aug 30 17:16:28 fun-files kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
The RAID controller shows all disks are fine and all warning lights on the server are off, which makes it all the weirder. I work with Linux servers a lot and have never come across anything like this. I searched for it and found reports that it has something to do with a kernel bug, but it never happened before, so I think I can safely rule that out. I also doubt it's only a warning, since it never happened before with the same kernel version and everything else unchanged. When I run, for example:
fdisk -l
it floods the log until I break it with CTRL+C.
LOGS:
Regards, Oleg G. |
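For anyone reading a line like the one quoted above, the braces are just the set bits of the ATA status byte spelled out. Below is a minimal Python sketch that decodes the byte and pulls the device name out of such a log line. The bit labels are the ones the legacy Linux IDE driver prints; the helper names `decode_status` and `parse_line` and the regular expression are illustrative assumptions, not part of any existing tool.

```python
import re

# ATA status register bits, highest bit first, labelled the way the
# legacy Linux IDE driver prints them inside the braces.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Hypothetical pattern for lines shaped like the one in the post:
# "... kernel: <device>: <function>: status=0x<hex> { ... }"
LINE_RE = re.compile(r"kernel: (\w+): (\w+): status=0x([0-9a-fA-F]+)")


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def parse_line(line):
    """Return (device, function, flag names), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    device, function, hex_status = match.groups()
    return device, function, decode_status(int(hex_status, 16))


SAMPLE = ("Aug 30 17:16:28 fun-files kernel: hda: task_in_intr: "
          "status=0x51 { DriveReady SeekComplete Error }")

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Running it over the log would show which device node each message refers to (here `hda`) and that 0x51 is 0x40 + 0x10 + 0x01, which can help when matching the messages against what the controller reports.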
Your drives are not OK and/or the controller is not OK.
You really need to formulate a check-up routine... by changing BIOS settings to start with |
You are experiencing a drive or a controller failure. Back up immediately and commence diagnosis.
|
The system is backed up by an automatic service, so I'm not worried about that.
Any bright ideas on how I can flush out the faulty drive in a 4-drive array? On my other servers, when a drive dies its light just turns red, I replace it and that's it. Thanks! |
If you have a hardware RAID controller, it should provide the tools to do that. I'm betting on the controller itself being bad, though, based on the symptoms presented.
|
The server is still covered by warranty; I just need to find the faulty part and minimize downtime on a production server.
I managed to get a few spare SAS drives. I'll try swapping all 4 current HDDs one by one, letting the controller rebuild each of them, and see if the problem goes away. I hope you're wrong and it's not the controller: a controller would mean downtime and a warranty call, and I can't allow that at this point, hrmf. HP's tool seems to run only under Windows; I can't find a copy of it for Linux to get more info. |