LinuxQuestions.org
Old 02-23-2010, 09:48 AM   #1
thllgo
Member
 
Registered: Sep 2003
Location: Laurel MD
Posts: 296

Rep: Reputation: 32
Connection to RAID being lost


Hello

Problem: About every two weeks our system seems to lose its connection to its RAID. Attempting to do an ls hangs. The system is responsive except for access to the RAID.

I am running RHEL 5 on an SGI 450 IA64. I have two FC connections to a Silkworm 200E Brocade. The Brocade is plugged into an SGI TP9500 RAID. The FC cards are LSIFC949X.

Both the RAID and the Brocade report they are OK.

When we lose connectivity I see the following messages in the log files. Once this error occurs I cannot reboot gracefully; I have to power down and power up. Once the system is powered back up, all seems OK for the next week or two. I'm guessing there is some kind of hiccup in the FC connection to the RAID, but it does not recover.

kernel: mptscsih:ioc2:attempting task abort! (sc=00006011878100)
kernel: sd 2:0:6:4:
kernel: command: Write(10): 2a 00 00 02 b7 b2 00 00 08 00
kernel: mptbase: Initiating ioc2 recovery
kernel: rport 2:0-0: blocked FC remote port time out: saving binding
kernel: rport 1:0-0: blocked FC remote port time out: saving binding
kernel: rport 2:0-1: blocked FC remote port time out: saving binding
kernel: rport 2:0-2: blocked FC remote port time out: saving binding
kernel: rport 2:0-3: blocked FC remote port time out: saving binding
kernel: rport 2:0-4: blocked FC remote port time out: saving binding
kernel: rport 2:0-5: blocked FC remote port time out: saving binding
kernel: rport 2:0-6: blocked FC remote port time out: saving binding
kernel: rport 1:0-1: blocked FC remote port time out: saving binding
kernel: rport 1:0-2: blocked FC remote port time out: saving binding
kernel: rport 1:0-3: blocked FC remote port time out: saving binding
kernel: rport 1:0-4: blocked FC remote port time out: saving binding
kernel: rport 1:0-5: blocked FC remote port time out: saving binding
kernel: rport 1:0-6: blocked FC remote port time out: saving binding
sd 1:0:5:5: SCSI error: return code = 0x00010000
end request I/O error dev sdp sector 167988652
Buffer I/O error, dev sdx4, logical block 0
lost page write due to I/O error on sdx4
...
...
lots more errors like the above on sdp and sdx
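
When there are many lines like these, it can help to boil the excerpt down to which remote ports timed out on each FC host and which block devices reported I/O errors. The following is only a rough sketch in Python 3: the function name `summarize` and the two regular expressions are assumptions based on the message formats shown above, not an official log format, and the sample text is a subset of the lines from this post.

```python
import re
from collections import defaultdict

# A few lines copied from the log excerpt in this post.
SAMPLE_LOG = """\
kernel: mptbase: Initiating ioc2 recovery
kernel: rport 2:0-0: blocked FC remote port time out: saving binding
kernel: rport 1:0-0: blocked FC remote port time out: saving binding
kernel: rport 2:0-1: blocked FC remote port time out: saving binding
sd 1:0:5:5: SCSI error: return code = 0x00010000
end request I/O error dev sdp sector 167988652
Buffer I/O error, dev sdx4, logical block 0
lost page write due to I/O error on sdx4
"""

# Assumed patterns, derived only from the lines shown above.
RPORT_RE = re.compile(r"rport[- ](\d+):(\d+)-(\d+): blocked FC remote port time out")
IO_ERR_RE = re.compile(r"I/O error,? (?:dev|on) (\w+)")


def summarize(log_text):
    """Return (timed-out rport numbers per host, devices with I/O errors)."""
    rports = defaultdict(set)
    devices = set()
    for line in log_text.splitlines():
        match = RPORT_RE.search(line)
        if match:
            host, _channel, rport = match.groups()
            rports[int(host)].add(int(rport))
            continue
        match = IO_ERR_RE.search(line)
        if match:
            devices.add(match.group(1))
    return {host: sorted(ids) for host, ids in rports.items()}, sorted(devices)


if __name__ == "__main__":
    hosts, devices = summarize(SAMPLE_LOG)
    print("timed-out rports per host:", hosts)
    print("devices with I/O errors:", devices)
```

If every remote port on both hosts shows up in the summary, as it appears to in the full excerpt, that would point at something common to both paths rather than a single LUN, though the log alone does not prove it.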
 
Old 02-23-2010, 11:21 AM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,613

Rep: Reputation: 7962
I've seen this happen before when my SAN guys are doing 'behind-the-scenes' things, and have had flaky things happen. Don't know if that's the case here, though. Are there any copy/mirror jobs, like doing a BCV snapshot, that occur with some frequency?
 
Old 02-23-2010, 12:07 PM   #3
thllgo
Member
 
Registered: Sep 2003
Location: Laurel MD
Posts: 296

Original Poster
Rep: Reputation: 32
That was my first thought. Unfortunately, no. It's not on a strictly timed basis: sometimes it's 10 days, sometimes 18 days, and everywhere in between. It does happen at times of heavy writes. From what I've read, I think what is happening is that the FC connection is in heavy use, gets reset, but does not come back completely, and my mounted filesystems get hosed.
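
One way to check whether the remote ports actually come back after a reset is to look at the FC transport class entries in sysfs. The sketch below is an assumption-laden example, not a tested procedure for this system: it assumes Python 3 is available (it is not the default on RHEL 5), that the kernel exposes `/sys/class/fc_remote_ports`, and that the `port_state`, `dev_loss_tmo` and `fast_io_fail_tmo` attributes are present (the last one may be missing on older kernels, which the code tolerates). The function names are made up for illustration.

```python
from pathlib import Path

# Attributes to read for each remote port; some may not exist on older kernels.
ATTRS = ("port_state", "dev_loss_tmo", "fast_io_fail_tmo")


def read_rport_states(sysfs_root="/sys/class/fc_remote_ports"):
    """Return {rport_name: {attribute: value or None}} for each FC remote port."""
    root = Path(sysfs_root)
    result = {}
    if not root.is_dir():
        return result  # no FC transport class directory on this machine
    for rport in sorted(root.iterdir()):
        attrs = {}
        for name in ATTRS:
            try:
                attrs[name] = (rport / name).read_text().strip()
            except OSError:
                attrs[name] = None  # attribute missing or unreadable
        result[rport.name] = attrs
    return result


def not_online(states):
    """Return the names of remote ports whose port_state is not 'Online'."""
    return sorted(name for name, attrs in states.items()
                  if attrs["port_state"] != "Online")


if __name__ == "__main__":
    states = read_rport_states()
    for name, attrs in states.items():
        print(name, attrs)
    print("not online:", not_online(states))
```

Running this right after a hang and comparing it with the output from a healthy system would show whether the ports stay in a state such as Blocked or Not Present, and what timeout the "blocked FC remote port time out" message corresponds to.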
 
Old 02-23-2010, 03:10 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,613

Rep: Reputation: 7962
Perhaps the firmware on the Brocade needs to be updated...are you on the latest release?
 
Old 02-23-2010, 04:14 PM   #5
thllgo
Member
 
Registered: Sep 2003
Location: Laurel MD
Posts: 296

Original Poster
Rep: Reputation: 32
I don't know. I will check it out.
 
  

