Connection to RAID being lost
Hello
Problem: About every two weeks our system seems to lose its connection to its RAID. Attempting to do an ls hangs. The system is responsive except for access to the RAID.

I am running RHEL 5 on an SGI 450 IA64. I have two FC connections to a Brocade Silkworm 200E. The Brocade is plugged into an SGI TP9500 RAID. The FC cards are LSIFC949X. Both the RAID and the Brocade report they are OK.

Once this error occurs I cannot reboot gracefully; I have to power down and power up. Once the system is powered back up, all seems OK for the next week or two. I'm guessing there is some kind of hiccup in the FC connection to the RAID, but it does not recover.

When we lose connectivity I see the following messages in the log files:

kernel: mptscsih:ioc2:attempting task abort! (sc=00006011878100)
kernel: sd 2:0:6:4:
kernel: command: Write(10): 2a 00 00 02 b7 b2 00 00 08 00
kernel: mptbase: Initiating ioc2 recovery
kernel: rport 2:0-0: blocked FC remote port time out: saving binding
kernel: rport 1:0-0: blocked FC remote port time out: saving binding
kernel: rport 2:0-1: blocked FC remote port time out: saving binding
kernel: rport 2:0-2: blocked FC remote port time out: saving binding
kernel: rport 2:0-3: blocked FC remote port time out: saving binding
kernel: rport 2:0-4: blocked FC remote port time out: saving binding
kernel: rport 2:0-5: blocked FC remote port time out: saving binding
kernel: rport 2:0-6: blocked FC remote port time out: saving binding
kernel: rport 1:0-1: blocked FC remote port time out: saving binding
kernel: rport 1:0-2: blocked FC remote port time out: saving binding
kernel: rport 1:0-3: blocked FC remote port time out: saving binding
kernel: rport 1:0-4: blocked FC remote port time out: saving binding
kernel: rport 1:0-5: blocked FC remote port time out: saving binding
kernel: rport 1:0-6: blocked FC remote port time out: saving binding
sd 1:0:5:5: SCSI error: return code = 0x00010000
end request I/O error dev sdp sector 167988652
Buffer I/O error, dev sdx4, logical block 0
lost page write due to I/O error on sdx4
...
...
lots more errors like the above on sdp and sdx
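To see at a glance which FC hosts and remote ports timed out, the rport lines above could be summarized with a small script. This is only a rough sketch: the function name summarize_rport_timeouts is made up for illustration, and it assumes the "rport H:C-R" prefix reads as host, channel and remote port number, which is my reading of the messages rather than something confirmed here.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   kernel: rport 2:0-3: blocked FC remote port time out: saving binding
# Assumption: the three numbers are host, channel and remote port.
RPORT_TIMEOUT = re.compile(
    r"rport (\d+):(\d+)-(\d+): blocked FC remote port time out"
)


def summarize_rport_timeouts(lines):
    """Return {host: sorted remote port numbers} for every timeout line found."""
    by_host = defaultdict(set)
    for line in lines:
        match = RPORT_TIMEOUT.search(line)
        if match:
            host, _channel, rport = (int(group) for group in match.groups())
            by_host[host].add(rport)
    return {host: sorted(ports) for host, ports in by_host.items()}


# A few lines copied from the log excerpt in this post.
sample = [
    "kernel: mptbase: Initiating ioc2 recovery",
    "kernel: rport 2:0-0: blocked FC remote port time out: saving binding",
    "kernel: rport 1:0-0: blocked FC remote port time out: saving binding",
    "kernel: rport 2:0-3: blocked FC remote port time out: saving binding",
    "sd 1:0:5:5: SCSI error: return code = 0x00010000",
]

summary = summarize_rport_timeouts(sample)
for host in sorted(summary):
    print("host", host, "timed-out rports:", summary[host])
```

To run it against a real log, pass an open file instead of the sample list, for example the system log where these kernel messages were captured. If both hosts lose all their remote ports at the same moment, as in the excerpt, that would point toward something shared by both paths rather than a single card or cable.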
That was my first thought. Unfortunately, no. It's not on a completely timed basis. Sometimes it's 10 days, sometimes 18 days, and everywhere in between. It does happen at times of heavy writes. From what I've read, what I think is happening is that the FC connection is in heavy use, gets reset but does not come back quite completely, and my mounted filesystems get hosed.
I don't know. I will check it out.