LinuxQuestions.org


inaki 03-23-2005 09:25 PM

hardware scsi error
 
How do I interpret this error message from kern.log?

Mar 21 09:16:12 kap kernel: Current sd08:03: sense key Hardware Error
Mar 21 09:16:12 kap kernel: Additional sense indicates Defect list error
Mar 21 09:16:12 kap kernel: I/O error: dev 08:03, sector 64243976
Mar 21 09:16:13 kap kernel: Info fld=0x3e4f77c, Current sd08:03: sense key Recovered Error
Mar 22 17:58:04 kap kernel: Info fld=0x24cab94 (nonstd), Current sd08:03: sense key None
Mar 22 17:58:04 kap kernel: I/O error: dev 08:03, sector 37486672
Mar 22 17:58:59 kap kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 8000002
Mar 22 17:58:59 kap kernel: Info fld=0x24cab94 (nonstd), Current sd08:03: sense key None


How can I fix this error?

rnturn 03-24-2005 01:57 PM

It looks as though the disk has developed (or is developing) a bad block. From the messages, it appears to be somewhere in the sda3 partition (sd08:03). The bit about the defect list error leads me to believe that the block in question might not have been on the drive's list of known defects, i.e., it's a newly developed bad block or one that's becoming questionable. It does look like the drive eventually recovered from the first error ("sense key Recovered Error"). This is definitely something to keep an eye on. Caveat: I'm not getting this from reading the SCSI device drivers to see where the messages come from; I'm just trying to interpret the messages.
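To make that reading a bit more concrete: as far as I know, kernels of this era print device numbers as hex major:minor pairs. Major 8 is the SCSI disk major and each disk gets 16 minors, so 08:03 is partition 3 of the first disk, i.e. sda3. The "Info fld" value is the Information field from the SCSI sense data, printed in hex; for disk errors it usually holds the LBA of the offending block, counted from the start of the whole disk. Values tagged "(nonstd)" are, I believe, ones the drive did not flag as valid, so treat those with caution. Below is a minimal Python sketch (the helper names are hypothetical, not from any existing tool) that pulls these numbers out of kern.log lines and converts them to decimal. It only handles major 8 and makes no attempt to reconcile the "sector" in the I/O error line with the Info fld LBA, since they need not be the same quantity.

```python
import re

# Hypothetical helper (not part of any kernel tool): turn a 2.4-style
# device number such as "08:03" (major:minor, printed in hex) into a name.
# Only SCSI disk major 8 is handled: 16 minors per disk, so sda..sdp;
# minor % 16 == 0 is the whole disk, 1..15 are its partitions.
def scsi_dev_name(dev):
    major_s, minor_s = dev.split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled")
    disk = "sd" + chr(ord("a") + minor // 16)
    part = minor % 16
    return disk if part == 0 else f"{disk}{part}"

IO_ERR_RE = re.compile(r"dev ([0-9a-fA-F]{2}:[0-9a-fA-F]{2}), sector (\d+)")
INFO_RE = re.compile(r"Info fld=0x([0-9a-fA-F]+)")

def summarize(lines):
    """Yield (kind, device, decimal value) for I/O error and Info fld lines."""
    for line in lines:
        m = IO_ERR_RE.search(line)
        if m:
            # sector exactly as printed; not necessarily comparable to Info fld
            yield ("io_error", scsi_dev_name(m.group(1)), int(m.group(2)))
        m = INFO_RE.search(line)
        if m:
            # Info fld is hex; for disk errors usually an LBA on the whole disk
            yield ("info_fld", None, int(m.group(1), 16))

if __name__ == "__main__":
    log = [
        "Mar 21 09:16:12 kap kernel: I/O error: dev 08:03, sector 64243976",
        "Mar 21 09:16:13 kap kernel: Info fld=0x3e4f77c, Current sd08:03: sense key Recovered Error",
    ]
    for entry in summarize(log):
        print(entry)
```

On those two lines from the posted log it prints ('io_error', 'sda3', 64243976) and ('info_fld', None, 65337212); the Info fld of the later errors, 0x24cab94, works out to 38579092.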

If the problem persists, you might need to back everything up and run "badblocks" on the disk after booting from a rescue CD. Space permitting, copying everything to another disk could serve as the backup (making a large tar archive of each filesystem on sdb, for example). Otherwise, tape is probably your best bet.

To update the bad block table and take the block that caused this error out of use, you could either run "badblocks" followed by "mke2fs" on each partition or, if the SCSI HBA allows it, run a media check on the entire drive from the HBA's firmware (assuming your HBA even supports this). Both of these put your data at risk. I've done the firmware-based checks on Adaptec boards quite a lot and know that it's a destructive operation; the firmware makes darn sure you understand this as well.

After you've found all the bad blocks and updated the bad block table on the disk, recreate the partitions (mandatory if the HBA did the bad block checking) and restore from your backup. Your boot loader might be toast after this (it definitely will be if the firmware check was done), so you might need to reinstall it while you're booted off the rescue CD. This is a messy process and one to avoid if this kind of error is infrequent. But if you're seeing it more and more often and/or files are turning up corrupted, the disk may be going seriously bad and should be replaced before you lose everything on it.
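The badblocks-then-mke2fs route can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name badblock_plan, the list-file path /root/badblocks.txt and the 4096-byte block size are hypothetical choices, and the script only builds and prints the command lines rather than running them, because mke2fs wipes the existing filesystem. The flags themselves are standard e2fsprogs options (badblocks -s, -v, -b, -o, -w; mke2fs -b, -l). The badblocks man page warns that the block size given to badblocks must match the one the filesystem will use, otherwise the block numbers in the list are wrong.

```python
import shlex

# Hypothetical helper: builds (but does NOT run) the two e2fsprogs commands.
# block_size and list_file are illustrative assumptions; block_size must be
# the same value mke2fs will use, or the listed block numbers will be wrong.
def badblock_plan(partition, block_size=4096,
                  list_file="/root/badblocks.txt", destructive=False):
    scan = ["badblocks", "-s", "-v",          # show progress, verbose output
            "-b", str(block_size),            # block size used for numbering
            "-o", list_file]                  # write the bad block list here
    if destructive:
        scan.append("-w")                     # write-mode test: ERASES the data
    scan.append(partition)
    # mke2fs -l reads the bad block list; this recreates (wipes) the filesystem
    mkfs = ["mke2fs", "-b", str(block_size), "-l", list_file, partition]
    return [scan, mkfs]

if __name__ == "__main__":
    for argv in badblock_plan("/dev/sda3"):
        print(shlex.join(argv))   # review these, then run them by hand
```

For /dev/sda3 this prints "badblocks -s -v -b 4096 -o /root/badblocks.txt /dev/sda3" followed by "mke2fs -b 4096 -l /root/badblocks.txt /dev/sda3". Printing instead of calling subprocess is deliberate: each command should be checked by hand and only run from the rescue CD once the backup is done.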

If all this sounds pretty gory, it's because, well, it is. But if you're careful, you shouldn't have to lose any files or even go through a complete reinstallation of the OS and your applications. (BTW, I've got a similar sort of operation planned in the near future, not because of bad blocks but because of a spin-up problem. Can't say I'm looking forward to it.)

Good luck...

inaki 03-24-2005 08:46 PM

Thank you very much... I appreciate the information you gave... Thanks, buddy!


All times are GMT -5.