LinuxQuestions.org


alpha01 01-23-2013 01:48 PM

Solaris 11 machine crashed, possible hardware issue?
 
Hello,

I have a Solaris 11 machine that randomly crashed this morning. After physically restarting the machine, I noticed that all of the drives were marked with a "Sense Key: Soft_Error" both in dmesg and in /var/adm/messages.

Since all the drives on the machine were tagged with the same Soft Error, does this mean that the HBA is faulty?

Code:

root@solaris-machine:/var/log# iostat -E
sd0      Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product:      Revision: SN02 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 1
Illegal Request: 12 Predictive Failure Analysis: 0
sd2      Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product:      Revision: 0004 Serial No: 
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 1
Illegal Request: 0 Predictive Failure Analysis: 0
sd4      Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product:      Revision: 0004 Serial No: 
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 1
Illegal Request: 0 Predictive Failure Analysis: 0
sd5      Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product:      Revision: 0004 Serial No:
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 1
Illegal Request: 0 Predictive Failure Analysis: 0
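One way to check whether every drive really reports the same counters is to parse the per-device summary lines. The following is a minimal sketch, not part of the original post: the function names `parse_iostat_errors` and `same_error_profile` are hypothetical, the sample text is copied from the first two devices above, and it only looks at the Soft/Hard/Transport counters on the first line of each device entry.

```python
import re

# Sample copied from the `iostat -E` output above (first two devices only).
SAMPLE = """\
sd0      Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product:      Revision: SN02 Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 1
Illegal Request: 12 Predictive Failure Analysis: 0
sd2      Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product:      Revision: 0004 Serial No:
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 1
Illegal Request: 0 Predictive Failure Analysis: 0
"""

# Matches the first line of each device entry, e.g. "sd0  Soft Errors: 1 ..."
HEADER = re.compile(
    r"^(\S+)\s+Soft Errors: (\d+) Hard Errors: (\d+) Transport Errors: (\d+)"
)


def parse_iostat_errors(text):
    """Return {device: {"soft": n, "hard": n, "transport": n}}."""
    counts = {}
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            dev, soft, hard, transport = match.groups()
            counts[dev] = {
                "soft": int(soft),
                "hard": int(hard),
                "transport": int(transport),
            }
    return counts


def same_error_profile(counts):
    """True if every device reports identical soft/hard/transport counters."""
    profiles = {tuple(sorted(c.items())) for c in counts.values()}
    return len(profiles) <= 1


if __name__ == "__main__":
    counts = parse_iostat_errors(SAMPLE)
    for dev, c in counts.items():
        print(dev, c)
    print("identical across devices:", same_error_profile(counts))
```

Identical counters on every device only show that the drives were affected in the same way; on their own they do not say whether the HBA, the drives, or something else is responsible.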

Code:

Jan 23 10:45:02 solaris-machine scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c5004dfae642 (sd4):
Jan 23 10:45:02 solaris-machine      Error for Command: <undecoded cmd 0xa1>    Error Level: Recovered
Jan 23 10:45:02 solaris-machine scsi: [ID 107833 kern.notice]        Requested Block: 0                        Error Block: 0
Jan 23 10:45:02 solaris-machine scsi: [ID 107833 kern.notice]        Vendor: ATA                                Serial Number:       
Jan 23 10:45:02 solaris-machine scsi: [ID 107833 kern.notice]        Sense Key: Soft_Error
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c5004dfc8db2 (sd2):
Jan 23 10:45:04 solaris-machine      Error for Command: <undecoded cmd 0xa1>    Error Level: Recovered
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        Requested Block: 0                        Error Block: 0
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        Vendor: ATA                                Serial Number:       
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        Sense Key: Soft_Error
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        ASC: 0x0 (<vendor unique code 0x0>), ASCQ: 0x1d, FRU: 0x0
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c5004dfd4ce3 (sd5):
Jan 23 10:45:04 solaris-machine      Error for Command: <undecoded cmd 0xa1>    Error Level: Recovered
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        Requested Block: 0                        Error Block: 0
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        Vendor: ATA                                Serial Number:
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        Sense Key: Soft_Error
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        ASC: 0x0 (<vendor unique code 0x0>), ASCQ: 0x1d, FRU: 0x0
Jan 23 10:45:07 solaris-machine scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,664@1f,2/disk@0,0 (sd0):
Jan 23 10:45:07 solaris-machine      Error for Command: <undecoded cmd 0xa1>    Error Level: Recovered
Jan 23 10:45:07 solaris-machine scsi: [ID 107833 kern.notice]        Requested Block: 0                        Error Block: 0
Jan 23 10:45:07 solaris-machine scsi: [ID 107833 kern.notice]        Vendor: ATA                                Serial Number:
Jan 23 10:45:07 solaris-machine scsi: [ID 107833 kern.notice]        Sense Key: Soft_Error
Jan 23 10:45:07 solaris-machine scsi: [ID 107833 kern.notice]        ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
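To compare the entries per drive, the warnings can be grouped by device. This is a rough sketch, assuming the log layout shown above; the function name `parse_scsi_warnings` is hypothetical, and the sample lines are copied from the sd2 and sd0 entries in the excerpt.

```python
import re

# Sample lines copied from the /var/adm/messages excerpt above (sd2 and sd0).
SAMPLE_LOG = """\
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c5004dfc8db2 (sd2):
Jan 23 10:45:04 solaris-machine      Error for Command: <undecoded cmd 0xa1>    Error Level: Recovered
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        Sense Key: Soft_Error
Jan 23 10:45:04 solaris-machine scsi: [ID 107833 kern.notice]        ASC: 0x0 (<vendor unique code 0x0>), ASCQ: 0x1d, FRU: 0x0
Jan 23 10:45:07 solaris-machine scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,664@1f,2/disk@0,0 (sd0):
Jan 23 10:45:07 solaris-machine      Error for Command: <undecoded cmd 0xa1>    Error Level: Recovered
Jan 23 10:45:07 solaris-machine scsi: [ID 107833 kern.notice]        Sense Key: Soft_Error
Jan 23 10:45:07 solaris-machine scsi: [ID 107833 kern.notice]        ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
"""

WARNING = re.compile(r"WARNING: (\S+) \((\w+)\):")
LEVEL = re.compile(r"Error for Command: (.+?)\s+Error Level: (\w+)")
SENSE = re.compile(r"Sense Key: (\S+)")
ASC = re.compile(r"ASC: (0x[0-9a-fA-F]+).*ASCQ: (0x[0-9a-fA-F]+)")


def parse_scsi_warnings(text):
    """Return one dict per WARNING entry, in the order they appear."""
    records = []
    current = None
    for line in text.splitlines():
        match = WARNING.search(line)
        if match:
            # A WARNING line starts a new entry for one device.
            current = {"path": match.group(1), "device": match.group(2)}
            records.append(current)
            continue
        if current is None:
            continue  # detail line with no preceding WARNING; skip it
        match = LEVEL.search(line)
        if match:
            current["command"], current["level"] = match.groups()
            continue
        match = SENSE.search(line)
        if match:
            current["sense_key"] = match.group(1)
            continue
        match = ASC.search(line)
        if match:
            current["asc"], current["ascq"] = match.groups()
    return records


if __name__ == "__main__":
    for rec in parse_scsi_warnings(SAMPLE_LOG):
        print(rec["device"], rec.get("command"), rec.get("level"),
              rec.get("sense_key"), rec.get("asc"), rec.get("ascq"))
```

In the excerpt above, every entry carries the same command opcode (0xa1) and the same "Recovered" error level, so a per-device listing like this makes it easy to see that the drives logged the same kind of event within a few seconds of each other.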


DukeNuke2 01-24-2013 01:52 AM

What hardware are you using? I wouldn't suspect an HBA fault right away. The more interesting question is why the machine crashed in the first place... Also, did you check the zpool/zfs status of the drives?

alpha01 01-24-2013 03:52 PM

I'm using standard x86 hardware.
Code:

ID    SIZE TYPE
1    113  SMB_TYPE_SYSTEM (system information)

  Manufacturer: Supermicro
  Product: X9DRH-7TF/7F/iTF/iF
  Version: 1234567890

I forgot to mention in my original post: I checked all zfs pools after the reboot and they all appeared to be in optimal condition. All drives, on the other hand, were tagged with the recoverable Soft Error.

DukeNuke2 01-25-2013 07:59 AM

I wouldn't read too much into the errors in the iostat output... but again, what was the root cause of the crash?


All times are GMT -5.