Linux - Server
This forum is for the discussion of Linux Software used in a server related context.
Problem: About every two weeks our system seems to lose its connection to its RAID. Attempting to do an ls hangs. The system is responsive except for access to the RAID.
I am running RHEL 5 on an SGI 450 IA64. I have two FC connections to a Silkworm 200E Brocade. The Brocade is plugged into an SGI TP9500 RAID. The FC cards are LSIFC949X.
Both the RAID and the Brocade report they are OK.
When we lose connectivity I see the following messages in the log files. Once this error occurs I cannot reboot gracefully; I have to power down and power up. Once the system is powered back up, all seems OK for the next week or two. I'm guessing there is some kind of hiccup in the FC connection to the RAID, but it does not recover.
kernel: mptscsih:ioc2:attempting task abort! (sc=00006011878100)
kernel: sd 2:0:6:4:
kernel: command: Write(10): 2a 00 00 02 b7 b2 00 00 08 00
kernel: mptbase: Initiating ioc2 recovery
kernel: rport 2:0-0: blocked FC remote port time out: saving binding
kernel: rport 1:0-0: blocked FC remote port time out: saving binding
kernel: rport 2:0-1: blocked FC remote port time out: saving binding
kernel: rport 2:0-2: blocked FC remote port time out: saving binding
kernel: rport 2:0-3: blocked FC remote port time out: saving binding
kernel: rport 2:0-4: blocked FC remote port time out: saving binding
kernel: rport 2:0-5: blocked FC remote port time out: saving binding
kernel: rport 2:0-6: blocked FC remote port time out: saving binding
kernel: rport 1:0-1: blocked FC remote port time out: saving binding
kernel: rport 1:0-2: blocked FC remote port time out: saving binding
kernel: rport 1:0-3: blocked FC remote port time out: saving binding
kernel: rport 1:0-4: blocked FC remote port time out: saving binding
kernel: rport 1:0-5: blocked FC remote port time out: saving binding
kernel: rport 1:0-6: blocked FC remote port time out: saving binding
sd 1:0:5:5: SCSI error: return code = 0x00010000
end request I/O error dev sdp sector 167988652
Buffer I/O error, dev sdx4, logical block 0
lost page write due to I/O error on sdx4
...
...
lots more errors like the above on sdp and sdx
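For anyone trying to see at a glance which HBA was affected, here is a minimal sketch that counts the "blocked FC remote port time out" messages per SCSI host. It assumes the log lines look exactly like the ones posted above; the function name and the sample list are only for illustration, and in practice you would feed it the lines of your own log file.

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel: rport 2:0-6: blocked FC remote port time out: saving binding
# Group 1 is the SCSI host number (the first number after "rport").
RPORT_TIMEOUT = re.compile(
    r"rport (\d+):(\d+)-(\d+): blocked FC remote port time out"
)

def count_rport_timeouts(lines):
    """Return a Counter mapping SCSI host number -> number of rport timeouts."""
    per_host = Counter()
    for line in lines:
        match = RPORT_TIMEOUT.search(line)
        if match:
            per_host[int(match.group(1))] += 1
    return per_host

# A few lines copied from the log excerpt above (illustrative sample only).
sample = [
    "kernel: mptbase: Initiating ioc2 recovery",
    "kernel: rport 2:0-0: blocked FC remote port time out: saving binding",
    "kernel: rport 1:0-0: blocked FC remote port time out: saving binding",
    "kernel: rport 2:0-1: blocked FC remote port time out: saving binding",
]

summary = count_rport_timeouts(sample)
for host, count in sorted(summary.items()):
    print(f"host{host}: {count} remote port timeout(s)")
```

If both hosts show timeouts at the same moment, as they do in the excerpt, that points at something shared by both paths rather than at a single card or cable, though that is only a hint and not a diagnosis.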
I've seen this happen before when my SAN guys are doing 'behind-the-scenes' things, and have had flaky things happen. Don't know if that's the case here, though. Are there any copy/mirror jobs, like doing a BCV snapshot, that occur with some frequency?
That was my first thought. Unfortunately, no. It's not on a completely timed basis. Sometimes it's 10 days, sometimes 18 days, and everywhere in between. It does happen at times of heavy writes. From what I've read, what I think is happening is that the FC connection is in heavy use, gets reset, but does not come back completely, and my mounted filesystems get hosed.
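One thing that might be worth checking while the system is still healthy is how long the FC transport waits for a remote port to come back before giving up. The "blocked FC remote port time out" message is logged when that wait (dev_loss_tmo) expires. Below is a rough sketch that reads the per-port state and timeout from sysfs. It assumes the kernel exposes /sys/class/fc_remote_ports with port_state and dev_loss_tmo attributes; the function name and the root parameter are made up for the example, and the root is a parameter only so the function can be pointed at a test directory.

```python
from pathlib import Path

def read_rport_settings(root="/sys/class/fc_remote_ports"):
    """Return {rport name: {"port_state": ..., "dev_loss_tmo": ...}}.

    Values are the raw strings from sysfs, or None if an attribute
    cannot be read. An empty dict is returned if no rports are found.
    """
    settings = {}
    for rport in sorted(Path(root).glob("rport-*")):
        entry = {}
        for attr in ("port_state", "dev_loss_tmo"):
            try:
                entry[attr] = (rport / attr).read_text().strip()
            except OSError:
                # Attribute missing or unreadable on this kernel.
                entry[attr] = None
        settings[rport.name] = entry
    return settings

if __name__ == "__main__":
    for name, entry in read_rport_settings().items():
        print(name, entry["port_state"], entry["dev_loss_tmo"])
```

This only reports the current values. Whether a longer timeout would actually help with the hang described here is an open question.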
Perhaps the firmware on the Brocade needs to be updated...are you on the latest release?