LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
04-03-2015, 10:26 AM   #1
MQMan
Member
 
Registered: Jan 2004
Location: Los Angeles
Distribution: Slack64 14.1
Posts: 581

Rep: Reputation: 38
IBM M1015 (LSI 9211-8i) Drops and Re-Allocates LUNs


I just had a SuperMicro X9SCL server board die on me, and it was replaced by an Asus P8B-X server board. Since then I've been having strange issues with my IBM M1015 controller. The controller has been cross-flashed with the LSI 9211-8i firmware, running in IT mode. In the SuperMicro board it had been running for a couple of years with no issues at all.

Basically, the controller will throw a DID_NO_CONNECT error on one or more LUNs. This forces the LUN to be dropped, and shortly afterwards the drive is re-attached under a different device name, so in effect a drive previously allocated to /dev/sde suddenly becomes /dev/sdj.
Code:
Apr  2 22:49:39 zentyal kernel: [110351.190226] sd 4:0:3:0: [sde] Synchronizing SCSI cache
Apr  2 22:49:39 zentyal kernel: [110351.190253] sd 4:0:3:0: [sde]
Apr  2 22:49:39 zentyal kernel: [110351.190255] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Apr  2 22:49:39 zentyal kernel: [110351.190990] mpt2sas0: removing handle(0x000c), sas_addr(0x4433221105000000)
Apr  2 22:49:39 zentyal kernel: [110351.488933] sd 4:0:5:0: [sdg] Synchronizing SCSI cache
Apr  2 22:49:39 zentyal kernel: [110351.488959] sd 4:0:5:0: [sdg]
Apr  2 22:49:39 zentyal kernel: [110351.488961] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Apr  2 22:49:39 zentyal kernel: [110351.489066] mpt2sas0: removing handle(0x000e), sas_addr(0x4433221106000000)
Apr  2 22:49:42 zentyal kernel: [110354.194294] sd 4:0:6:0: [sdh] Synchronizing SCSI cache
Apr  2 22:49:42 zentyal kernel: [110354.194322] sd 4:0:6:0: [sdh]
Apr  2 22:49:42 zentyal kernel: [110354.194323] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Apr  2 22:49:42 zentyal kernel: [110354.195048] mpt2sas0: removing handle(0x000f), sas_addr(0x4433221107000000)
Apr  2 22:50:14 zentyal kernel: [110386.383123] scsi 4:0:7:0: Direct-Access     ATA      ST2000DL003-9VT1 CC32 PQ: 0 ANSI: 6
Apr  2 22:50:14 zentyal kernel: [110386.383136] scsi 4:0:7:0: SATA: handle(0x000f), sas_addr(0x4433221107000000), phy(7), device_name(0x0000000000000000)
Apr  2 22:50:14 zentyal kernel: [110386.383139] scsi 4:0:7:0: SATA: enclosure_logical_id(0x500605b0047955a0), slot(4)
Apr  2 22:50:14 zentyal kernel: [110386.383350] scsi 4:0:7:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Apr  2 22:50:14 zentyal kernel: [110386.383358] scsi 4:0:7:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Apr  2 22:50:14 zentyal kernel: [110386.383551] sd 4:0:7:0: Attached scsi generic sg4 type 0
Apr  2 22:50:14 zentyal kernel: [110386.384518] sd 4:0:7:0: [sdi] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Apr  2 22:50:14 zentyal kernel: [110386.421347] sd 4:0:7:0: [sdi] Write Protect is off
Apr  2 22:50:14 zentyal kernel: [110386.421353] sd 4:0:7:0: [sdi] Mode Sense: 7f 00 10 08
Apr  2 22:50:14 zentyal kernel: [110386.433602] sd 4:0:7:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr  2 22:50:14 zentyal kernel: [110386.510942]  sdi: sdi1
Apr  2 22:50:14 zentyal kernel: [110386.609074] sd 4:0:7:0: [sdi] Attached SCSI disk
Apr  2 22:50:17 zentyal kernel: [110389.881697] scsi 4:0:8:0: Direct-Access     ATA      ST2000DL003-9VT1 CC32 PQ: 0 ANSI: 6
Apr  2 22:50:17 zentyal kernel: [110389.881709] scsi 4:0:8:0: SATA: handle(0x000c), sas_addr(0x4433221105000000), phy(5), device_name(0x0000000000000000)
Apr  2 22:50:17 zentyal kernel: [110389.881712] scsi 4:0:8:0: SATA: enclosure_logical_id(0x500605b0047955a0), slot(6)
Apr  2 22:50:17 zentyal kernel: [110389.881908] scsi 4:0:8:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Apr  2 22:50:17 zentyal kernel: [110389.881914] scsi 4:0:8:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Apr  2 22:50:17 zentyal kernel: [110389.882066] sd 4:0:8:0: Attached scsi generic sg6 type 0
Apr  2 22:50:17 zentyal kernel: [110389.884018] sd 4:0:8:0: [sdj] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Apr  2 22:50:17 zentyal kernel: [110389.914465] sd 4:0:8:0: [sdj] Write Protect is off
Apr  2 22:50:17 zentyal kernel: [110389.914471] sd 4:0:8:0: [sdj] Mode Sense: 7f 00 10 08
Apr  2 22:50:17 zentyal kernel: [110389.926766] sd 4:0:8:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr  2 22:50:18 zentyal kernel: [110390.007076]  sdj: sdj1
Apr  2 22:50:18 zentyal kernel: [110390.113746] sd 4:0:8:0: [sdj] Attached SCSI disk
Apr  2 22:50:19 zentyal kernel: [110391.631101] scsi 4:0:9:0: Direct-Access     ATA      ST2000DL003-9VT1 CC32 PQ: 0 ANSI: 6
Apr  2 22:50:19 zentyal kernel: [110391.631119] scsi 4:0:9:0: SATA: handle(0x000e), sas_addr(0x4433221106000000), phy(6), device_name(0x0000000000000000)
Apr  2 22:50:19 zentyal kernel: [110391.631121] scsi 4:0:9:0: SATA: enclosure_logical_id(0x500605b0047955a0), slot(5)
Apr  2 22:50:19 zentyal kernel: [110391.631260] scsi 4:0:9:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Apr  2 22:50:19 zentyal kernel: [110391.631267] scsi 4:0:9:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Apr  2 22:50:19 zentyal kernel: [110391.631423] sd 4:0:9:0: Attached scsi generic sg7 type 0
Apr  2 22:50:19 zentyal kernel: [110391.632100] sd 4:0:9:0: [sdk] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Apr  2 22:50:19 zentyal kernel: [110391.661327] sd 4:0:9:0: [sdk] Write Protect is off
Apr  2 22:50:19 zentyal kernel: [110391.661333] sd 4:0:9:0: [sdk] Mode Sense: 7f 00 10 08
Apr  2 22:50:19 zentyal kernel: [110391.673534] sd 4:0:9:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr  2 22:50:19 zentyal kernel: [110391.752466]  sdk: sdk1
Apr  2 22:50:19 zentyal kernel: [110391.850195] sd 4:0:9:0: [sdk] Attached SCSI disk
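To see which physical drive came back under which name, the removal and re-attach messages can be paired by their sas_addr. A minimal sketch, assuming the mpt2sas and sd message formats stay exactly as in the log above; `track_renames` is a hypothetical helper, not an existing tool.

```python
import re

# Patterns modelled on the kernel messages in the log above (assumed formats).
SYNC_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\] Synchronizing SCSI cache")
REMOVE_RE = re.compile(r"removing handle\((0x[0-9a-f]+)\), sas_addr\((0x[0-9a-f]+)\)")
ADD_RE = re.compile(r"scsi (\d+:\d+:\d+:\d+): SATA: handle\((0x[0-9a-f]+)\), sas_addr\((0x[0-9a-f]+)\)")
NAME_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\] \d+ \d+-byte logical blocks")


def track_renames(lines):
    """Return {sas_addr: (old_name, new_name)} for drives that were dropped and re-attached."""
    last_synced = None  # name from the most recent "Synchronizing SCSI cache" line
    old_name = {}       # sas_addr -> device name before removal
    addr_of_hctl = {}   # host:channel:target:lun of a re-attached device -> sas_addr
    new_name = {}       # sas_addr -> device name after re-attach
    for line in lines:
        if m := SYNC_RE.search(line):
            last_synced = m.group(2)
        elif m := REMOVE_RE.search(line):
            # The removal message follows the cache sync of the disk being dropped.
            if last_synced is not None:
                old_name[m.group(2)] = last_synced
                last_synced = None
        elif m := ADD_RE.search(line):
            addr_of_hctl[m.group(1)] = m.group(3)
        elif m := NAME_RE.search(line):
            addr = addr_of_hctl.get(m.group(1))
            if addr is not None:
                new_name[addr] = m.group(2)
    # Keep only drives seen both leaving and coming back.
    return {a: (old_name[a], new_name[a]) for a in old_name if a in new_name}
```

Run over the excerpt above (for example `track_renames(open("/var/log/kern.log"))` on an Ubuntu box), this pairs sde with sdj, sdg with sdk and sdh with sdi.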
Obviously this then causes havoc as the system still tries to communicate with /dev/sde.
Code:
Apr  2 22:59:09 zentyal kernel: [110920.899112] XFS (sde1): metadata I/O error: block 0x1ee2f70 ("xfs_trans_read_buf_map") error 19 numblks 16
Apr  2 22:59:09 zentyal kernel: [110920.899128] XFS (sde1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 19.
Apr  2 22:59:09 zentyal kernel: [110920.899970] XFS (sde1): metadata I/O error: block 0xeda8c0 ("xfs_trans_read_buf_map") error 19 numblks 16
Apr  2 22:59:09 zentyal kernel: [110920.899975] XFS (sde1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 19.
Apr  2 22:59:09 zentyal kernel: [110920.899996] XFS (sde1): metadata I/O error: block 0x1ee2f80 ("xfs_trans_read_buf_map") error 19 numblks 16
Apr  2 22:59:09 zentyal kernel: [110920.899999] XFS (sde1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 19.
Apr  2 22:59:09 zentyal kernel: [110920.900017] XFS (sde1): metadata I/O error: block 0x1ee2f70 ("xfs_trans_read_buf_map") error 19 numblks 16
Apr  2 22:59:09 zentyal kernel: [110920.900020] XFS (sde1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 19.
Apr  2 22:59:09 zentyal kernel: [110920.900054] XFS (sde1): metadata I/O error: block 0x1ee2f70 ("xfs_trans_read_buf_map") error 19 numblks 16
Apr  2 22:59:09 zentyal kernel: [110920.900057] XFS (sde1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 19.
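The "error 19" in those XFS messages is a Linux errno value. Decoding it shows the filesystem is being told the device no longer exists, which fits the disk having been re-attached under a new name. A quick check, assuming it is run on Linux (errno numbers are platform-specific):

```python
import errno
import os

# "error 19" as printed in the XFS messages above
code = 19
print(errno.errorcode[code], "-", os.strerror(code))
# On Linux this prints: ENODEV - No such device
```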
It's not always the same LUNs/disks that this happens to, and SMART doesn't report any issues with any of the 7 drives attached to the controller.

This is Zentyal Server 3.5, which is essentially Ubuntu Server 14.04 LTS.

Any thoughts on this? Could a motherboard swap cause this, or is it just coincidence?

Cheers.

Last edited by MQMan; 04-15-2015 at 12:29 PM.
 
04-03-2015, 12:27 PM   #2
zuikway
LQ Newbie
 
Registered: Oct 2009
Posts: 9

Rep: Reputation: 2
A couple of questions, as I have had problems myself with the 9211. Which firmware version are you running? The latest is 20: http://www.lsi.com/products/host-bus....aspx#tab/tab4

Also, were the previous board and this board both giving the card the full x8 PCIe link?

I have had motherboard compatibility issues with some LSI controllers, depending on the firmware version.
 
04-03-2015, 01:10 PM   #3
MQMan
Member
 
Registered: Jan 2004
Location: Los Angeles
Distribution: Slack64 14.1
Posts: 581

Original Poster
Rep: Reputation: 38
When I switched motherboards, I did flash to the latest (20) firmware. However, I then started getting all sorts of weird I/O errors (not the ones reported here, others). After some Google-fu I found references saying that the firmware version should match the driver version. As the driver version is 16:
Code:
mpt2sas version 16.100.00.00 loaded
I reflashed back to firmware 16. This resolved the previous errors, but then I started to see this issue occasionally. Because even 16 was newer than what I had been using before, I went back to the firmware that had been on the card for the previous 2 years without issue:
Code:
mpt2sas0: LSISAS2008: FWVersion(14.00.01.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
For a while I thought that had fixed it, as I didn't see any other occurrences for around 10 days, whereas previously the longest I'd gone was 2, maybe 3, days. But it happened again last night, hence this post.
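The "firmware should match the driver" rule of thumb can be checked straight from the two dmesg lines quoted above. A minimal sketch, assuming only the leading major number matters (16 vs 14 here) and that the messages keep the format shown; `version_mismatch` is a hypothetical name.

```python
import re

# Formats taken from the two dmesg lines quoted above (assumed stable).
DRIVER_RE = re.compile(r"mpt2sas version (\d+)\.[\d.]+ loaded")
FW_RE = re.compile(r"FWVersion\((\d+)\.[\d.]+\)")


def version_mismatch(dmesg_text):
    """Return (driver_major, firmware_major, mismatch) or None if either line is missing."""
    drv = DRIVER_RE.search(dmesg_text)
    fw = FW_RE.search(dmesg_text)
    if drv is None or fw is None:
        return None
    d, f = int(drv.group(1)), int(fw.group(1))
    return d, f, d != f
```

With the driver at 16.100.00.00 and the firmware back at 14.00.01.00, this reports (16, 14, True), i.e. the combination that had run for two years was itself a mismatch by this rule.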

The previous board had a true x8 link in an x8 slot. On this board, the card was originally in an x8 slot with an x4 link; now it's in an x16 slot with an x16 link, and I've had the issue in both. There isn't an x8 link on this board.
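For a sense of scale on the x4 vs x8 question: PCIe 2.0 signals at 5 GT/s per lane with 8b/10b line coding, so roughly 500 MB/s of data per lane per direction before protocol overhead. The sketch below only does that arithmetic; it says nothing about whether link width plays any part in the dropouts, which the thread did not measure.

```python
# PCIe 2.0: 5 GT/s per lane, 8b/10b coding (8 data bits per 10 transferred).
TRANSFERS_PER_S_PER_LANE = 5_000_000_000


def pcie2_mb_per_s(lanes: int) -> int:
    """Raw one-direction data rate in MB/s (10**6 bytes), ignoring packet overhead."""
    data_bits_per_s = TRANSFERS_PER_S_PER_LANE * 8 // 10 * lanes
    return data_bits_per_s // 8 // 1_000_000


for width in (4, 8, 16):
    print(f"x{width}: {pcie2_mb_per_s(width)} MB/s")
# prints x4: 2000 MB/s, x8: 4000 MB/s, x16: 8000 MB/s (one per line)
```

Note that the 9211-8i/M1015 is an x8 card, so in the x16 slot it can negotiate at most x8.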

Cheers.
 
04-03-2015, 01:36 PM   #4
zuikway
LQ Newbie
 
Registered: Oct 2009
Posts: 9

Rep: Reputation: 2
Your info may help me too. I have had to RMA this particular controller several times; it would run for 3-4 months and then quit. I have a fan blowing directly on the heat sink, and I also removed the heat sink and replaced the sticky thermal pad with heat sink compound. This chip gets hot. If it has ever taken a static discharge it may appear fine, but ESD-damaged parts can become leaky, and more so when hot.

One other question: this card is PCIe 2.0, and it may have problems in PCIe 3.0 slots. I could not find which PCIe version the P8B-X has. Perhaps try a different slot, even an x1 slot. Your board may also be failing or have hardware issues, but that is just a guess.

I have had much better luck with the 9207-8i, which is PCIe 3.0, in 3.0 PCIe slots running Debian Jessie.
 
04-03-2015, 02:28 PM   #5
MQMan
Member
 
Registered: Jan 2004
Location: Los Angeles
Distribution: Slack64 14.1
Posts: 581

Original Poster
Rep: Reputation: 38
Both boards are PCIe 2.0, and in fact both use the same chipset, an Intel C202. The card is also PCIe 2.0.

Before going into this server, the motherboard was used as an ESXi host with an LSI 9260-8i MegaRAID card in the x8 slot (x4 link), with no issues.

I can't use the x1 slot, as it's a closed-end x1 slot, so the card physically won't fit.

Cheers.
 
04-03-2015, 07:31 PM   #6
zuikway
LQ Newbie
 
Registered: Oct 2009
Posts: 9

Rep: Reputation: 2
I wish I could be of more help. Please advise anything you find out.

One more question, when you updated the flash, did you do step 13 in this link:
https://forums.freenas.org/index.php...s9240-8i.8632/

this requires the number off the green tag:
namely sas2flsh -o -sasadd 500605b*****
Not sure what this does myself.
 
04-04-2015, 01:23 PM   #7
MQMan
Member
 
Registered: Jan 2004
Location: Los Angeles
Distribution: Slack64 14.1
Posts: 581

Original Poster
Rep: Reputation: 38
Quote:
Originally Posted by zuikway View Post
I wish I could be of more help. Please advise anything you find out.

One more question, when you updated the flash, did you do step 13 in this link:
https://forums.freenas.org/index.php...s9240-8i.8632/

this requires the number off the green tag:
namely sas2flsh -o -sasadd 500605b*****
Not sure what this does myself.
If you're cross-flashing the card with another controller's firmware, then flashing empty.bin wipes everything out, including the embedded SAS address. That command just restores it. If you're only flashing between firmware versions, the address isn't touched, so that command isn't necessary.
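Since that address is typed in by hand from the sticker, a quick format check before running sas2flsh can catch typos. A minimal sketch, assuming the address is written as 16 hex digits (a 64-bit SAS address) and using the 500605b prefix from the command quoted above; `check_sas_address` is a hypothetical helper, not part of sas2flsh.

```python
import re

# A SAS address is 64 bits, i.e. 16 hex digits; an optional 0x prefix is tolerated.
SAS_ADDR_RE = re.compile(r"^(?:0x)?([0-9a-fA-F]{16})$")


def check_sas_address(text: str, expected_prefix: str = "500605b") -> str:
    """Return the address as 16 lowercase hex digits, or raise ValueError."""
    m = SAS_ADDR_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a 16-hex-digit SAS address: {text!r}")
    addr = m.group(1).lower()
    # Prefix taken from the sas2flsh command quoted in this thread (assumption).
    if not addr.startswith(expected_prefix.lower()):
        raise ValueError(f"{addr} does not start with {expected_prefix}")
    return addr
```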

Cheers.
 
04-15-2015, 12:27 PM   #8
MQMan
Member
 
Registered: Jan 2004
Location: Los Angeles
Distribution: Slack64 14.1
Posts: 581

Original Poster
Rep: Reputation: 38
OK, I think I've gotten to the bottom of this and it was (kinda) the motherboard swap that caused it.

I noticed that even though it wasn't always the same LUN/disk that triggered this, it was always one of the same four drives, never any of the other three. This led me to swap the 2 breakout cables on the card to see if the problem moved to other drives.

Since doing this I haven't seen any issues in almost 2 weeks. So, I'm guessing that either the card or one of the breakout cables wasn't fully seated.
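The "always one of the same four drives" pattern is what you would expect from a single bad cable or connector, since each of the card's two internal connectors carries four phys. A rough sketch, assuming phys 0-3 sit on one connector and 4-7 on the other (check the card's documentation for which is which); it groups the phy numbers from the mpt2sas re-attach messages. `phys_by_connector` is a hypothetical name.

```python
import re
from collections import defaultdict

# Matches the "SATA: handle(...), sas_addr(...), phy(N)" re-attach lines in the log.
PHY_RE = re.compile(r"SATA: handle\(0x[0-9a-f]+\), sas_addr\(0x[0-9a-f]+\), phy\((\d+)\)")


def phys_by_connector(log_lines, phys_per_connector=4):
    """Return {connector_index: sorted phys} for drives that were re-attached."""
    groups = defaultdict(set)
    for line in log_lines:
        m = PHY_RE.search(line)
        if m:
            phy = int(m.group(1))
            groups[phy // phys_per_connector].add(phy)
    return {conn: sorted(phys) for conn, phys in sorted(groups.items())}
```

On the log in the first post, the re-attached drives were on phys 5, 6 and 7, so all of them would land in the same group.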

One other thing that also kinda confirms this is the SMART data from the drives: they all have a high value for attribute 199, UDMA_CRC_Error_Count.
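UDMA_CRC_Error_Count counts CRC errors on the SATA link between the drive and the controller, which is why it tends to point at cabling or connectors rather than the disk itself. To compare drives, a small sketch that pulls the raw value out of `smartctl -A /dev/sdX` output; it assumes smartmontools' usual 10-column attribute table, and `udma_crc_errors` is a hypothetical name.

```python
def udma_crc_errors(smartctl_output: str):
    """Return the raw value of SMART attribute 199 from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None
```

The counter is cumulative, so the useful signal after reseating the cables is whether it keeps rising, not its absolute value.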

Cheers.
 
  



All times are GMT -5.