LinuxQuestions.org


your_shadow03 05-07-2013 01:31 PM

Issue installing ib0 for lustre
 
Hi,

I am trying to set up a Lustre environment but am running into an issue with the InfiniBand setup.

While trying to bring up ib0, I ran:

[root@slave3 ~]# /etc/rc.d/init.d/rdma restart
Unloading OpenIB kernel modules:
Found opensm running.
Please stop all RDMA applications before downing the stack.
[FAILED]
Loading OpenIB kernel modules:FATAL: Error inserting ib_addr (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_addr.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Failed to load module WARNING: Error inserting ib_core (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_core.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting ib_mad (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_mad.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting ib_sa (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_sa.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting iw_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/iw_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting ib_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
FATAL: Error inserting rdma_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/rdma_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Failed to load module WARNING: Error inserting iw_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/iw_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting ib_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting rdma_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/rdma_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
FATAL: Error inserting rdma_ucm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/rdma_ucm.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Failed to load module FATAL: Error inserting ib_ipoib (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Failed to load module [FAILED]
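Several modules (iw_cm, ib_cm, rdma_cm) are reported more than once in this output. To get a de-duplicated list of the modules that failed to insert, in first-seen order, a small filter like the one below can help. It is only a sketch: `failed_modules` is a hypothetical helper name, and it assumes the `Error inserting <module>` wording shown above.

```shell
# Hypothetical helper: list each module the rdma init script failed to
# insert, once, in the order it first appears. Reads the output on stdin.
failed_modules() {
    # Every failure line contains "Error inserting <module> (<path>)":
    # print the word after "inserting", skipping modules already seen.
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "inserting" && !seen[$(i + 1)]++)
                print $(i + 1)
    }'
}
```

For example: `/etc/rc.d/init.d/rdma restart 2>&1 | failed_modules`. The list only says which modules failed; the reason (the unresolved symbols) is logged by the kernel, so dmesg is the next place to look.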

FYI, ibstat and ibstatus both appear to be working fine:

[root@slave3 ~]# ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.9.1000
        Hardware version: b0
        Node GUID: 0x0002c903000be516
        System image GUID: 0x0002c903000be519
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 7
                LMC: 0
                SM lid: 2
                Capability mask: 0x0251086a
                Port GUID: 0x0002c903000be517
                Link layer: InfiniBand
[root@slave3 ~]#

[root@slave3 ~]# ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid: fe80:0000:0000:0000:0002:c903:000b:e517
        base lid: 0x7
        sm lid: 0x2
        state: 4: ACTIVE
        phys state: 5: LinkUp
        rate: 40 Gb/sec (4X QDR)
        link_layer: InfiniBand
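The ibstat/ibstatus output shows port 1 of mlx4_0 as ACTIVE / LinkUp, which suggests the HCA and the link are healthy and the failure lies in the upper-layer modules that the rdma script loads. As a sketch, the same port state can be read straight from sysfs. This assumes the usual `/sys/class/infiniband/<device>/ports/<n>/state` layout, which exists once the IB core and the HCA driver have registered the device; `ib_port_states` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<device> port <n>: <state>" for every
# InfiniBand port found under a sysfs root (default /sys/class/infiniband).
ib_port_states() {
    root=${1:-/sys/class/infiniband}
    for state_file in "$root"/*/ports/*/state; do
        # If the glob matched nothing, the literal pattern is unreadable: skip.
        [ -r "$state_file" ] || continue
        port_dir=${state_file%/state}      # .../<device>/ports/<n>
        port=${port_dir##*/}               # <n>
        dev_dir=${port_dir%/ports/*}       # .../<device>
        dev=${dev_dir##*/}                 # <device>
        printf '%s port %s: %s\n' "$dev" "$port" "$(cat "$state_file")"
    done
}
```

Run with no arguments on the node itself; the optional argument only exists so the function can be pointed at a copy of the sysfs tree.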

smallpond 05-07-2013 03:11 PM

Did you run dmesg and see what the error is?
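When modprobe reports `Unknown symbol in module`, the kernel log names the symbols the module could not resolve. As a sketch, the filter below pulls those out of the log as de-duplicated `<module>: <symbol>` pairs. `unknown_symbols` is a hypothetical helper, and it assumes the kernel's `<module>: Unknown symbol <name>` message format.

```shell
# Hypothetical helper: summarise "Unknown symbol" errors from kernel log
# text on stdin as sorted, unique "<module>: <symbol>" lines.
unknown_symbols() {
    # Look for "... <module>: Unknown symbol <name> ..." anywhere on the line
    # (tolerates a timestamp prefix); the module field must be a bare name
    # ending in ":", which rules out modprobe's "(/path/x.ko):" lines.
    awk '{
        for (i = 2; i + 2 <= NF; i++)
            if ($i == "Unknown" && $(i + 1) == "symbol" && $(i - 1) ~ /^[A-Za-z0-9_]+:$/) {
                mod = $(i - 1)
                sub(/:$/, "", mod)
                print mod ": " $(i + 2)
                break
            }
    }' | LC_ALL=C sort -u
}
```

For example: `dmesg | unknown_symbols`.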

your_shadow03 05-08-2013 04:28 AM

After I rebooted the machine and it came back up, dmesg showed:
Code:

alloc irq_2_iommu on node -1
ioatdma 0000:00:04.1: irq 79 for MSI/MSI-X
ioatdma 0000:00:04.2: PCI INT C -> GSI 31 (level, low) -> IRQ 31
ioatdma 0000:00:04.2: setting latency timer to 64
  alloc irq_desc for 80 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
ioatdma 0000:00:04.2: irq 80 for MSI/MSI-X
ioatdma 0000:00:04.3: PCI INT D -> GSI 39 (level, low) -> IRQ 39
ioatdma 0000:00:04.3: setting latency timer to 64
  alloc irq_desc for 81 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
ioatdma 0000:00:04.3: irq 81 for MSI/MSI-X
ioatdma 0000:00:04.4: PCI INT A -> GSI 31 (level, low) -> IRQ 31
ioatdma 0000:00:04.4: setting latency timer to 64
  alloc irq_desc for 82 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
ioatdma 0000:00:04.4: irq 82 for MSI/MSI-X
ioatdma 0000:00:04.5: PCI INT B -> GSI 39 (level, low) -> IRQ 39
ioatdma 0000:00:04.5: setting latency timer to 64
  alloc irq_desc for 83 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
ioatdma 0000:00:04.5: irq 83 for MSI/MSI-X
ioatdma 0000:00:04.6: PCI INT C -> GSI 31 (level, low) -> IRQ 31
ioatdma 0000:00:04.6: setting latency timer to 64
  alloc irq_desc for 84 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
ioatdma 0000:00:04.6: irq 84 for MSI/MSI-X
ioatdma 0000:00:04.7: PCI INT D -> GSI 39 (level, low) -> IRQ 39
ioatdma 0000:00:04.7: setting latency timer to 64
  alloc irq_desc for 85 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
ioatdma 0000:00:04.7: irq 85 for MSI/MSI-X
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing 0000:05:00.0
mlx4_core 0000:05:00.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
mlx4_core 0000:05:00.0: setting latency timer to 64
udev: renamed network interface eth0 to em1
  alloc irq_desc for 86 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 86 for MSI/MSI-X
  alloc irq_desc for 87 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 87 for MSI/MSI-X
  alloc irq_desc for 88 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 88 for MSI/MSI-X
  alloc irq_desc for 89 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 89 for MSI/MSI-X
  alloc irq_desc for 90 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 90 for MSI/MSI-X
  alloc irq_desc for 91 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 91 for MSI/MSI-X
  alloc irq_desc for 92 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 92 for MSI/MSI-X
  alloc irq_desc for 93 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 93 for MSI/MSI-X
  alloc irq_desc for 94 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 94 for MSI/MSI-X
  alloc irq_desc for 95 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 95 for MSI/MSI-X
  alloc irq_desc for 96 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 96 for MSI/MSI-X
  alloc irq_desc for 97 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 97 for MSI/MSI-X
  alloc irq_desc for 98 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
mlx4_core 0000:05:00.0: irq 98 for MSI/MSI-X
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
iTCO_vendor_support: vendor-support=0
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.07rh
iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS
i801_smbus 0000:00:1f.3: PCI INT C -> GSI 19 (level, low) -> IRQ 19
ACPI: resource 0000:00:1f.3 [io  0x4000-0x401f] conflicts with ACPI region SMBI [io 0x4000-0x400f]
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
EDAC MC: Ver: 2.1.0 Dec 14 2012
EDAC sbridge: Seeking for: dev 0e.0 PCI ID 8086:3ca0
EDAC sbridge: Seeking for: dev 0e.0 PCI ID 8086:3ca0
EDAC sbridge: Seeking for: dev 0f.0 PCI ID 8086:3ca8
EDAC sbridge: Seeking for: dev 0f.0 PCI ID 8086:3ca8
EDAC sbridge: Seeking for: dev 0f.1 PCI ID 8086:3c71
EDAC sbridge: Seeking for: dev 0f.1 PCI ID 8086:3c71
EDAC sbridge: Seeking for: dev 0f.2 PCI ID 8086:3caa
EDAC sbridge: Seeking for: dev 0f.2 PCI ID 8086:3caa
EDAC sbridge: Seeking for: dev 0f.3 PCI ID 8086:3cab
EDAC sbridge: Seeking for: dev 0f.3 PCI ID 8086:3cab
EDAC sbridge: Seeking for: dev 0f.4 PCI ID 8086:3cac
EDAC sbridge: Seeking for: dev 0f.4 PCI ID 8086:3cac
EDAC sbridge: Seeking for: dev 0f.5 PCI ID 8086:3cad
EDAC sbridge: Seeking for: dev 0f.5 PCI ID 8086:3cad
EDAC sbridge: Seeking for: dev 11.0 PCI ID 8086:3cb8
EDAC sbridge: Seeking for: dev 11.0 PCI ID 8086:3cb8
EDAC sbridge: Seeking for: dev 0c.6 PCI ID 8086:3cf4
EDAC sbridge: Seeking for: dev 0c.6 PCI ID 8086:3cf4
EDAC sbridge: Seeking for: dev 0c.7 PCI ID 8086:3cf6
EDAC sbridge: Seeking for: dev 0c.7 PCI ID 8086:3cf6
EDAC sbridge: Seeking for: dev 0d.6 PCI ID 8086:3cf5
EDAC sbridge: Seeking for: dev 0d.6 PCI ID 8086:3cf5
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:ff:0e.0
EDAC sbridge: Driver loaded.
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 6:0:0:0: Attached scsi CD-ROM sr0
microcode: CPU0 sig=0x206d5, pf=0x1, revision=0x513
platform microcode: firmware: requesting intel-ucode/06-2d-05
microcode: CPU1 sig=0x206d5, pf=0x1, revision=0x513
platform microcode: firmware: requesting intel-ucode/06-2d-05
microcode: CPU2 sig=0x206d5, pf=0x1, revision=0x513
platform microcode: firmware: requesting intel-ucode/06-2d-05
microcode: CPU3 sig=0x206d5, pf=0x1, revision=0x513
platform microcode: firmware: requesting intel-ucode/06-2d-05
microcode: CPU4 sig=0x206d5, pf=0x1, revision=0x513
platform microcode: firmware: requesting intel-ucode/06-2d-05
microcode: CPU5 sig=0x206d5, pf=0x1, revision=0x513
platform microcode: firmware: requesting intel-ucode/06-2d-05
microcode: CPU6 sig=0x206d5, pf=0x1, revision=0x513
platform microcode: firmware: requesting intel-ucode/06-2d-05
microcode: CPU7 sig=0x206d5, pf=0x1, revision=0x513
platform microcode: firmware: requesting intel-ucode/06-2d-05
Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.2)
sd 1:0:0:0: Attached scsi generic sg0 type 0
sr 6:0:0:0: Attached scsi generic sg1 type 5
EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts:
EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
Adding 4030456k swap on /dev/mapper/VolGroup-lv_swap.  Priority:-1 extents:1 across:4030456k
igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
All bugs added by David S. Miller <davem@redhat.com>
8021q: adding VLAN 0 to HW filter on device eth1
cnic: Unknown symbol ip6_route_output
8021q: adding VLAN 0 to HW filter on device em1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
type=1305 audit(1368108959.718:48954): auid=4294967295 ses=4294967295 op="remove rule" key=(null) list=4 res=1
type=1305 audit(1368108959.718:48955): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
readahead-collector: sorting
readahead-collector: finished

When I then run /etc/rc.d/init.d/rdma restart:
Code:

[root@slave3 ~]# /etc/rc.d/init.d/rdma restart
Unloading OpenIB kernel modules:                          [  OK  ]
Loading OpenIB kernel modules:FATAL: Error inserting ib_addr (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_addr.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Failed to load module WARNING: Error inserting ib_core (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_core.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting ib_mad (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_mad.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting ib_sa (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_sa.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting iw_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/iw_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting ib_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
FATAL: Error inserting rdma_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/rdma_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Failed to load module WARNING: Error inserting iw_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/iw_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting ib_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/ib_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting rdma_cm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/rdma_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
FATAL: Error inserting rdma_ucm (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/core/rdma_ucm.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Failed to load module FATAL: Error inserting ib_ipoib (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Failed to load module                                      [FAILED]
[root@slave3 ~]#

dmesg then shows:
Code:

dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.2)
sd 1:0:0:0: Attached scsi generic sg0 type 0
sr 6:0:0:0: Attached scsi generic sg1 type 5
EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts:
EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
Adding 4030456k swap on /dev/mapper/VolGroup-lv_swap.  Priority:-1 extents:1 across:4030456k
igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
All bugs added by David S. Miller <davem@redhat.com>
8021q: adding VLAN 0 to HW filter on device eth1
cnic: Unknown symbol ip6_route_output
8021q: adding VLAN 0 to HW filter on device em1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
type=1305 audit(1368108959.718:48954): auid=4294967295 ses=4294967295 op="remove rule" key=(null) list=4 res=1
type=1305 audit(1368108959.718:48955): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
readahead-collector: sorting
readahead-collector: finished
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib_addr: Unknown symbol ipv6_dev_get_saddr
ib_addr: Unknown symbol ip6_route_output
ib_addr: Unknown symbol ipv6_chk_addr
ib_addr: Unknown symbol ipv6_dev_get_saddr
ib_addr: Unknown symbol ip6_route_output
ib_addr: Unknown symbol ipv6_chk_addr
ib_addr: Unknown symbol ipv6_dev_get_saddr
ib_addr: Unknown symbol ip6_route_output
ib_addr: Unknown symbol ipv6_chk_addr
ib_ipoib: Unknown symbol icmpv6_send


smallpond 05-08-2013 08:08 AM

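That's pretty clear: ib_addr and ib_ipoib cannot resolve IPv6 symbols (ipv6_dev_get_saddr, ip6_route_output, ipv6_chk_addr, icmpv6_send), so your kernel seems not to have IPv6 configured. Either recompile the IB drivers with the same configuration as your kernel, or get a new kernel.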
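All four unresolved symbols belong to the kernel's IPv6 stack. One way to confirm they are missing from the running kernel is to look them up in its symbol table. This is a sketch under stated assumptions: `missing_ksyms` is a hypothetical helper, it takes a kallsyms-format file (normally `/proc/kallsyms`) as its first argument, and it treats a name being present there as good enough, even though kallsyms also lists symbols that are not exported.

```shell
# Hypothetical helper: print each given symbol name that does NOT appear in
# a kallsyms-format file. Usage: missing_ksyms <kallsyms_file> <symbol>...
missing_ksyms() {
    ksyms=$1
    shift
    for sym in "$@"; do
        # kallsyms lines look like "<address> <type> <name> [module]";
        # compare the third field with the symbol name exactly.
        if ! awk -v s="$sym" '$3 == s { found = 1; exit }
                               END { exit(found ? 0 : 1) }' "$ksyms"; then
            echo "$sym"
        fi
    done
}
```

For example: `missing_ksyms /proc/kallsyms ipv6_dev_get_saddr ip6_route_output ipv6_chk_addr icmpv6_send`. If all four are printed, `lsmod | grep ipv6` and `grep CONFIG_IPV6= /boot/config-$(uname -r)` show whether IPv6 is built as a module (`=m`) that simply is not loaded.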

your_shadow03 05-08-2013 09:31 AM

smallpond,

How do I enable IPv6 at the kernel level?

smallpond 05-08-2013 09:50 AM

I think I was wrong. Try just running:
Code:

modprobe ipv6
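If `modprobe ipv6` does not help, one possible cause (an assumption; the thread does not confirm it) is a modprobe rule that disables IPv6. On RHEL 6, an `install ipv6 /bin/true` line turns `modprobe ipv6` into a no-op, so the IPv6 symbols never appear, whereas `options ipv6 disable=1` still loads the module (and its symbols) while leaving IPv6 networking off. The sketch below lists such rules; `ipv6_modprobe_rules` is a hypothetical helper, and it only looks at `*.conf` files in the given directory (default `/etc/modprobe.d`).

```shell
# Hypothetical helper: show modprobe configuration lines that affect IPv6.
# Usage: ipv6_modprobe_rules [config_dir]   (default /etc/modprobe.d)
ipv6_modprobe_rules() {
    dir=${1:-/etc/modprobe.d}
    # Match install/blacklist/options/alias directives that mention ipv6
    # (or the old net-pf-10 alias); commented-out lines are not matched.
    grep -H -E '^[[:space:]]*(install|blacklist|options|alias)[[:space:]].*(ipv6|net-pf-10)' \
        "$dir"/*.conf 2>/dev/null
}
```

If it shows an `install ipv6 /bin/true` line, replacing it with `options ipv6 disable=1` (if IPv6 must stay off), then running `modprobe ipv6` and `/etc/rc.d/init.d/rdma restart` again, would be a reasonable next thing to try.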

