LinuxQuestions.org
Old 06-08-2015, 08:30 AM   #1
TronCarter
Member
 
Registered: Oct 2009
Posts: 36

Rep: Reputation: 0
Qlogic Infiniband Problems - Group Rate of 6 is too high


I have a 10 node cluster currently operating on a Qlogic SilverStorm 9024. Everything works fine on the existing nodes. I am attempting to add three new nodes to the mix and am having some issues getting the infiniband cards to work on the switch.

For what it's worth, the existing nodes are running Solaris 10 and using Mellanox Technologies MT26428 or MT25418 cards.

The new nodes are running CentOS 7.1 and Mellanox Technologies MT25204 cards.

When cabled up and running, I get occasional lights on the card and a blue light on the switch. I set an IP for the card and can ping it, but can't ping any other infiniband cards on the subnet. Everything appears to be working properly on the node end with the exception of the link being down in 'ip link'.

Logging into the switch CLI, I get this error:
Code:
ESM: Embedded SM Error: sa_McMemberRecord_Set: Group Rate of 6 is too high for requested rate of 3, rate selector of 2, and port rate of 3 for group 0xff12401bffff0000:0000000000000001 in request from SOL11 mthca0, Port 0x0002c902002931cd, LID 0x000C, returning status 0x0200 : 0
I'm not sure where to go from here. I contacted Qlogic, but they referred me to Intel, which now supports InfiniBand. I haven't heard back from Intel.

Anyone have any ideas?
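
The numeric codes in the ESM message above look like the rate and rate-selector encodings from the InfiniBand specification (the Linux kernel's enum ib_rate uses the same rate values). A minimal sketch for decoding them, assuming that encoding applies to this switch's messages; the function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Decode InfiniBand rate codes (assumed to follow the IB spec's Rate encoding).
ib_rate_name() {
    case "$1" in
        2) echo "2.5 Gb/s" ;;
        3) echo "10 Gb/s" ;;
        4) echo "30 Gb/s" ;;
        5) echo "5 Gb/s" ;;
        6) echo "20 Gb/s" ;;
        7) echo "40 Gb/s" ;;
        *) echo "unknown rate code $1" ;;
    esac
}

# Decode the rate selector (how the requested rate is compared).
ib_rate_selector_name() {
    case "$1" in
        0) echo "greater than" ;;
        1) echo "less than" ;;
        2) echo "exactly" ;;
        3) echo "largest available" ;;
        *) echo "unknown selector $1" ;;
    esac
}

# Values taken from the error message in this post.
echo "group rate 6     : $(ib_rate_name 6)"
echo "requested rate 3 : $(ib_rate_name 3)"
echo "rate selector 2  : $(ib_rate_selector_name 2)"
echo "port rate 3      : $(ib_rate_name 3)"
```

Read this way, the multicast group already exists at 20 Gb/s while the joining port asks for exactly 10 Gb/s, so the subnet manager rejects the join. That is one plausible reading of the message, not a confirmed diagnosis; if it holds, the multicast group rate configured on the subnet manager would be the setting to look at.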
 
Old 06-08-2015, 02:27 PM   #2
nini09
Senior Member
 
Registered: Apr 2009
Posts: 1,860

Rep: Reputation: 162
What is the output of ibstat, ibhosts, and ibswitches?
 
Old 06-09-2015, 08:35 AM   #3
TronCarter
Member
 
Registered: Oct 2009
Posts: 36

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by nini09
What is the output of ibstat, ibhosts, and ibswitches?
Those aren't installed by default with the CentOS support for infiniband. Here is what it looks like with the CentOS driver installed (extra ifconfig/ip link info for ethernet ports removed):

Code:
root@SOL11:/# ifconfig

ib0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 2044
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
        infiniband 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@SOL11:/# ibstat
bash: ibstat: command not found...
root@SOL11:/# ibhosts
bash: ibhosts: command not found...
root@SOL11:/# ibswitches
bash: ibswitches: command not found...

root@SOL11:/# ifconfig ib0 192.168.169.111
root@SOL11:/# ifconfig ib0 up
root@SOL11:/# ifconfig

ib0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 2044
        inet 192.168.169.111  netmask 255.255.255.0  broadcast 192.168.169.255
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
        infiniband 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@SOL11:/# ip link
6: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state DOWN mode DEFAULT qlen 256
    link/infiniband 80:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:02:00:29:31:cd brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
root@SOL11:/# ping 192.168.169.111
PING 192.168.169.111 (192.168.169.111) 56(84) bytes of data.
64 bytes from 192.168.169.111: icmp_seq=1 ttl=64 time=0.046 ms
64 bytes from 192.168.169.111: icmp_seq=2 ttl=64 time=0.030 ms
64 bytes from 192.168.169.111: icmp_seq=3 ttl=64 time=0.031 ms
64 bytes from 192.168.169.111: icmp_seq=4 ttl=64 time=0.031 ms
^C
--- 192.168.169.111 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.030/0.034/0.046/0.008 ms
root@SOL11:/# ping 192.168.169.110
PING 192.168.169.110 (192.168.169.110) 56(84) bytes of data.
From 192.168.169.111 icmp_seq=1 Destination Host Unreachable
From 192.168.169.111 icmp_seq=2 Destination Host Unreachable
From 192.168.169.111 icmp_seq=3 Destination Host Unreachable
From 192.168.169.111 icmp_seq=4 Destination Host Unreachable
^C
So, I can ping the adapter's address, but not any other adapters on the subnet.

If I install the Mellanox driver, it removes the CentOS driver and then the card doesn't show up under ifconfig, but the ib commands work:

Code:
root@SOL11:/# ibstat
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.2.0
        Hardware version: a0
        Node GUID: 0x0002c902002931cc
        System image GUID: 0x0002c902002931cf
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0002c902002931cd
                Link layer: InfiniBand
root@SOL11:/# ibhosts
Ca      : 0x0002c902002931cc ports 1 "SOL11 HCA-1"
root@SOL11:/# ibswitches
root@SOL11:/#
I'm not sure I'm assigning the IP address correctly in the first code block, and I don't know how to assign one to an adapter that doesn't exist in ifconfig in the second code block.
 
Old 06-09-2015, 02:51 PM   #4
nini09
Senior Member
 
Registered: Apr 2009
Posts: 1,860

Rep: Reputation: 162
How did you install the InfiniBand driver on CentOS?
After a fresh CentOS install, did you run sudo yum groupinstall "Infiniband Support"?
 
Old 06-09-2015, 03:04 PM   #5
TronCarter
Member
 
Registered: Oct 2009
Posts: 36

Original Poster
Rep: Reputation: 0
I checked the box "Infiniband Support" during installation.
 
Old 06-10-2015, 03:00 PM   #6
nini09
Senior Member
 
Registered: Apr 2009
Posts: 1,860

Rep: Reputation: 162
Run the following commands to install ibstat and the related tools:
yum install infiniband-diags
yum install perftest
 
Old 06-11-2015, 07:33 AM   #7
TronCarter
Member
 
Registered: Oct 2009
Posts: 36

Original Poster
Rep: Reputation: 0
Code:
root@SOL11:/# ibstat
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.2.0
        Hardware version: a0
        Node GUID: 0x0002c902002931cc
        System image GUID: 0x0002c902002931cf
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02590a68
                Port GUID: 0x0002c902002931cd
                Link layer: InfiniBand
root@SOL11:/# ibhosts
Ca      : 0x0002c902002931cc ports 1 "SOL11 mthca0"
root@SOL11:/# ibswitches
root@SOL11:/#
 
Old 06-11-2015, 10:41 AM   #8
TronCarter
Member
 
Registered: Oct 2009
Posts: 36

Original Poster
Rep: Reputation: 0
I seem to be able to configure an IP by using:

Code:
ifconfig ib0 192.168.169.111/24
But it doesn't persist across reboots. I have tried several examples of creating an /etc/sysconfig/network-scripts/ifcfg-ib0 file, but can't seem to get one that works.

Code:
root@SOL11:/etc/sysconfig/network-scripts# systemctl status -l network.service
network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network)
   Active: failed (Result: exit-code) since Thu 2015-06-11 10:11:36 EDT; 2s ago
  Process: 3213 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)

Jun 11 10:11:36 SOL11 network[3213]: RTNETLINK answers: File exists
Jun 11 10:11:36 SOL11 network[3213]: RTNETLINK answers: File exists
Jun 11 10:11:36 SOL11 network[3213]: RTNETLINK answers: File exists
Jun 11 10:11:36 SOL11 network[3213]: RTNETLINK answers: File exists
Jun 11 10:11:36 SOL11 network[3213]: RTNETLINK answers: File exists
Jun 11 10:11:36 SOL11 network[3213]: RTNETLINK answers: File exists
Jun 11 10:11:36 SOL11 network[3213]: RTNETLINK answers: File exists
Jun 11 10:11:36 SOL11 systemd[1]: network.service: control process exited, code=exited status=1
Jun 11 10:11:36 SOL11 systemd[1]: Failed to start LSB: Bring up/down networking.
Jun 11 10:11:36 SOL11 systemd[1]: Unit network.service entered failed state.
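
For reference, a static IPoIB configuration on CentOS 7 usually looks something like the sketch below. The key names are the standard initscripts ones; the values are assumptions based on the address used in this thread, and NM_CONTROLLED=no and CONNECTED_MODE=no are guesses about the desired setup rather than requirements:

```shell
# /etc/sysconfig/network-scripts/ifcfg-ib0 (sketch; adjust values to the site)
DEVICE=ib0
TYPE=InfiniBand
BOOTPROTO=none
IPADDR=192.168.169.111
PREFIX=24
ONBOOT=yes
NM_CONTROLLED=no      # assumption: interface managed by the network service
CONNECTED_MODE=no     # assumption: datagram mode, matching the mtu of 2044
```

"RTNETLINK answers: File exists" generally means an address or route being added is already present, which can happen when the address was set by hand with ifconfig before the service was restarted. Clearing it first with ip addr flush dev ib0 may be worth a try, though the service failure could also come from a different interface's config file.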
 
Old 06-11-2015, 02:23 PM   #9
nini09
Senior Member
 
Registered: Apr 2009
Posts: 1,860

Rep: Reputation: 162
The InfiniBand link isn't up. Check the cable and the LEDs.
Make sure that it is connected properly by checking that the cable connectors are fully inserted into the ports.
Green LEDs, when lit, indicate that the physical link is up. Yellow LEDs, when lit, indicate that the logical link is up, and their blinking rate indicates link activity.
If a green LED is not lit, swap the cable with a cable that is known to work properly and test again. If it still fails, try the cable on another switch port. If it works, then the former switch port may be out of order or disabled by the system administrator.
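
The same check can be done from the host side by reading the State, Physical state, and SM lid fields that ibstat prints. A rough sketch, assuming the output layout shown earlier in the thread; the function name and the wording of the messages are invented for illustration:

```shell
#!/usr/bin/env bash
# Summarize port health from ibstat-style text read on stdin.
ib_port_summary() {
    awk -F': *' '
        /^[[:space:]]*State:/          { state = $2 }
        /^[[:space:]]*Physical state:/ { phys = $2 }
        /^[[:space:]]*SM lid:/         { smlid = $2 }
        END {
            if (state == "Active" && phys == "LinkUp")
                print "link up"
            else if (phys == "Polling")
                print "no physical link (port is polling for a peer): check cable and switch port"
            else if (smlid == "0")
                print "physical link present but no subnet manager has configured the port"
            else
                print "state=" state " physical=" phys
        }'
}

# Only run against the real tool when it is installed.
if command -v ibstat >/dev/null 2>&1; then
    ibstat | ib_port_summary
fi
```

With the output posted above (State: Down, Physical state: Polling) this reports that there is no physical link, which points at the cable or switch port before any IP configuration.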
 
  

