Regarding a kernel issue when pinging an ECMP-reachable destination using namespaces
We are facing a kernel issue on Linux when using network namespaces together with ECMP routes.
The attached topology (namespace_topology.jpg) shows that each destination is reachable via two ECMP routes.
When a ping to 14.0.0.1 is started from namespace NS 3, only 50% of the pings succeed. The Echo Requests reach the destination interface sim4, but only one Echo Reply is sent from sim4 for every two Echo Requests.
The neighbor table stays stable throughout this scenario. When the ECMP route is reduced to a single route by shutting down sim5, the ping succeeds 100% of the time with no packet loss.
Below are the packet capture on sim4 and the ping output for the failing case:
[root@localhost LR]# tethereal -i sim4
Capturing on sim4
0.000000 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
0.999978 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
1.000020 14.0.0.1 -> 15.0.0.2 ICMP Echo (ping) reply
1.999970 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
2.999970 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
3.000006 14.0.0.1 -> 15.0.0.2 ICMP Echo (ping) reply
3.999966 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
4.999976 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
5.000018 14.0.0.1 -> 15.0.0.2 ICMP Echo (ping) reply
5.999972 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
6.999967 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
7.000000 14.0.0.1 -> 15.0.0.2 ICMP Echo (ping) reply
7.999970 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
8.999971 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
9.000004 14.0.0.1 -> 15.0.0.2 ICMP Echo (ping) reply
9.999969 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
10.999968 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
11.000002 14.0.0.1 -> 15.0.0.2 ICMP Echo (ping) reply
11.999971 15.0.0.2 -> 14.0.0.1 ICMP Echo (ping) request
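To tally a capture like the one above, the hypothetical helper `count_icmp` below counts Echo Requests and Echo Replies in tethereal's one-line-per-packet text output. It is a minimal sketch that assumes exactly the summary format shown here (`ICMP Echo (ping) request` / `ICMP Echo (ping) reply`); other tethereal/tshark versions may print lines differently.

```shell
# count_icmp (hypothetical helper): reads tethereal summary lines on stdin
# and prints "requests=N replies=M". Assumes the line format shown above;
# lines that match neither pattern (prompt, "Capturing on ...") are ignored.
count_icmp() {
    awk '
        /ICMP Echo \(ping\) request/ { req++ }
        /ICMP Echo \(ping\) reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}
```

Saving the capture above to a file and running `count_icmp < file` should report 13 requests and 6 replies for the 19 packet lines shown.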
[root@localhost LR]# ping 14.1
PING 14.1 (14.0.0.1) 56(84) bytes of data.
64 bytes from 14.0.0.1: icmp_seq=1 ttl=63 time=0.078 ms
64 bytes from 14.0.0.1: icmp_seq=3 ttl=63 time=0.054 ms
64 bytes from 14.0.0.1: icmp_seq=5 ttl=63 time=0.076 ms
64 bytes from 14.0.0.1: icmp_seq=7 ttl=63 time=0.060 ms
64 bytes from 14.0.0.1: icmp_seq=9 ttl=63 time=0.066 ms
64 bytes from 14.0.0.1: icmp_seq=11 ttl=63 time=0.058 ms
64 bytes from 14.0.0.1: icmp_seq=13 ttl=63 time=0.063 ms
64 bytes from 14.0.0.1: icmp_seq=15 ttl=63 time=0.052 ms
64 bytes from 14.0.0.1: icmp_seq=17 ttl=63 time=0.052 ms
64 bytes from 14.0.0.1: icmp_seq=19 ttl=63 time=0.054 ms
64 bytes from 14.0.0.1: icmp_seq=21 ttl=63 time=0.058 ms
64 bytes from 14.0.0.1: icmp_seq=23 ttl=63 time=0.040 ms
64 bytes from 14.0.0.1: icmp_seq=25 ttl=63 time=0.040 ms
64 bytes from 14.0.0.1: icmp_seq=27 ttl=63 time=0.038 ms
64 bytes from 14.0.0.1: icmp_seq=29 ttl=63 time=0.038 ms
64 bytes from 14.0.0.1: icmp_seq=31 ttl=63 time=0.036 ms
64 bytes from 14.0.0.1: icmp_seq=33 ttl=63 time=0.042 ms
64 bytes from 14.0.0.1: icmp_seq=35 ttl=63 time=0.038 ms
^C
--- 14.1 ping statistics ---
35 packets transmitted, 18 received, 48% packet loss, time 33999ms
rtt min/avg/max/mdev = 0.036/0.052/0.078/0.014 ms
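To see which sequence numbers go unanswered, the hypothetical helper `missing_seqs` below lists every `icmp_seq` between 1 and the highest one received that has no reply. It is a sketch that assumes the iputils `icmp_seq=N` output format shown above; requests lost after the last received reply cannot be detected this way.

```shell
# missing_seqs (hypothetical helper): reads ping output on stdin and prints,
# one per line, each icmp_seq in 1..max(received) that never got a reply.
missing_seqs() {
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters; the rest of the match is the number
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            seen[seq] = 1
            if (seq > max) max = seq
        }
        END {
            for (i = 1; i <= max; i++)
                if (!(i in seen)) printf "%d\n", i
        }
    '
}
```

For the ping output above it would print 2, 4, ..., 34, i.e. every even sequence number, which matches the "one reply per two requests" pattern in the capture.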
The system is Fedora 18 running Linux kernel 3.10.0. /var/log/messages shows no sign of the kernel dropping packets.
Could you please help us resolve this issue as soon as possible?
Thanks.
With Regards,
N.Raghu Raman.
Below are the commands to reproduce the issue.
They build a similar topology, but the interface numbering is slightly different.
Wired Connectivity:
-------------------
cd /home/SIM_DEV_3.10/wired1
insmod wired1.ko if1=sim5 if2=sim6
cd ../wired2/
insmod wired2.ko if1=sim7 if2=sim8
cd ../wired3/
insmod wired3.ko if1=sim9 if2=sim10
cd ../wired4/
insmod wired4.ko if1=sim11 if2=sim12
cd ../wired5/
insmod wired5.ko if1=sim13 if2=sim14
cd ../wired6/
insmod wired6.ko if1=sim15 if2=sim16
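The six module loads above follow a fixed pattern (wiredN gets sim(2N+3) and sim(2N+4)), so they can be sketched as a loop. The helpers `run` and `load_wired_modules` are hypothetical, and `BASE` defaults to the directory used above; set `DRY_RUN=1` to print the `insmod` commands instead of executing them (a real run needs root and the wired modules built in place).

```shell
# run (hypothetical dry-run wrapper): prints the command when DRY_RUN is
# set, otherwise executes it.
run() { if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi; }

# load_wired_modules (hypothetical helper): loads wired1.ko..wired6.ko with
# the same sim interface pairs as the commands above.
load_wired_modules() {
    base=${BASE:-/home/SIM_DEV_3.10}
    n=1
    while [ "$n" -le 6 ]; do
        run insmod "$base/wired$n/wired$n.ko" \
            "if1=sim$((2 * n + 3))" "if2=sim$((2 * n + 4))" || return 1
        n=$((n + 1))
    done
}
```

Passing the full module path to `insmod` is assumed to be equivalent to the `cd` followed by `insmod wiredN.ko` used above.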
ip netns add vrf1
ip netns add vrf2
ip netns add vrf3
ifconfig sim5 13.0.0.1/8 up
ip link set netns vrf1 dev sim6
ip netns exec vrf1 ifconfig sim6 13.0.0.2/8 up
ip link set netns vrf1 dev sim7
ip netns exec vrf1 ifconfig sim7 14.0.0.1/8 up
ip link set netns vrf2 dev sim8
ip netns exec vrf2 ifconfig sim8 14.0.0.2/8 up
ip link set netns vrf2 dev sim9
ip netns exec vrf2 ifconfig sim9 15.0.0.1/8 up
ip link set netns vrf3 dev sim10
ip netns exec vrf3 ifconfig sim10 15.0.0.2/8 up
ifconfig sim11 16.0.0.1/8 up
ip link set netns vrf1 dev sim12
ip netns exec vrf1 ifconfig sim12 16.0.0.2/8 up
ip link set netns vrf1 dev sim13
ip netns exec vrf1 ifconfig sim13 17.0.0.1/8 up
ip link set netns vrf2 dev sim14
ip netns exec vrf2 ifconfig sim14 17.0.0.2/8 up
ip link set netns vrf2 dev sim15
ip netns exec vrf2 ifconfig sim15 18.0.0.1/8 up
ip link set netns vrf3 dev sim16
ip netns exec vrf3 ifconfig sim16 18.0.0.2/8 up
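The namespace and address assignments above can also be driven from a small table, which makes the per-namespace layout easier to compare with the topology. This is a hypothetical sketch (the helpers `run`, `ns_addr`, and `setup_addresses` are illustrative names) that swaps `ifconfig` for iproute2's `ip addr add ... brd +` and `ip link set ... up`, which we assume are equivalent for these /8 addresses. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
# run (hypothetical dry-run wrapper): prints the command when DRY_RUN is
# set, otherwise executes it.
run() { if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi; }

# ns_addr NS DEV CIDR (hypothetical helper): moves DEV into NS unless NS is
# "default", then assigns CIDR (with derived broadcast) and brings DEV up.
ns_addr() {
    if [ "$1" = default ]; then
        run ip addr add "$3" brd + dev "$2" || return 1
        run ip link set dev "$2" up
    else
        run ip link set dev "$2" netns "$1" || return 1
        run ip netns exec "$1" ip addr add "$3" brd + dev "$2" || return 1
        run ip netns exec "$1" ip link set dev "$2" up
    fi
}

# setup_addresses (hypothetical): same namespaces, interfaces and addresses
# as the commands above, one row per interface.
setup_addresses() {
    for ns in vrf1 vrf2 vrf3; do
        run ip netns add "$ns" || return 1
    done
    while read -r ns dev cidr; do
        ns_addr "$ns" "$dev" "$cidr" || return 1
    done <<EOF
default sim5  13.0.0.1/8
vrf1    sim6  13.0.0.2/8
vrf1    sim7  14.0.0.1/8
vrf2    sim8  14.0.0.2/8
vrf2    sim9  15.0.0.1/8
vrf3    sim10 15.0.0.2/8
default sim11 16.0.0.1/8
vrf1    sim12 16.0.0.2/8
vrf1    sim13 17.0.0.1/8
vrf2    sim14 17.0.0.2/8
vrf2    sim15 18.0.0.1/8
vrf3    sim16 18.0.0.2/8
EOF
}
```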
#ECMP routes#
#vrf3
ip netns exec vrf3 ip route add 13.0.0.0/8 proto 20 metric 1 nexthop via 15.0.0.1 weight 1 nexthop via 18.0.0.1 weight 1
ip netns exec vrf3 ip route add 14.0.0.0/8 proto 20 metric 1 nexthop via 15.0.0.1 weight 1 nexthop via 18.0.0.1 weight 1
ip netns exec vrf3 ip route add 16.0.0.0/8 proto 20 metric 1 nexthop via 15.0.0.1 weight 1 nexthop via 18.0.0.1 weight 1
ip netns exec vrf3 ip route add 17.0.0.0/8 proto 20 metric 1 nexthop via 15.0.0.1 weight 1 nexthop via 18.0.0.1 weight 1
#vrf2
ip netns exec vrf2 ip route add 13.0.0.0/8 proto 20 metric 1 nexthop via 14.0.0.1 weight 1 nexthop via 17.0.0.1 weight 1
ip netns exec vrf2 ip route add 16.0.0.0/8 proto 20 metric 1 nexthop via 14.0.0.1 weight 1 nexthop via 17.0.0.1 weight 1
#vrf1
ip netns exec vrf1 ip route add 15.0.0.0/8 proto 20 metric 1 nexthop via 14.0.0.2 weight 1 nexthop via 17.0.0.2 weight 1
ip netns exec vrf1 ip route add 18.0.0.0/8 proto 20 metric 1 nexthop via 14.0.0.2 weight 1 nexthop via 17.0.0.2 weight 1
#default
ip route add 15.0.0.0/8 proto 20 metric 1 nexthop via 13.0.0.2 weight 1 nexthop via 16.0.0.2 weight 1
ip route add 14.0.0.0/8 proto 20 metric 1 nexthop via 13.0.0.2 weight 1 nexthop via 16.0.0.2 weight 1
ip route add 17.0.0.0/8 proto 20 metric 1 nexthop via 13.0.0.2 weight 1 nexthop via 16.0.0.2 weight 1
ip route add 18.0.0.0/8 proto 20 metric 1 nexthop via 13.0.0.2 weight 1 nexthop via 16.0.0.2 weight 1
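All of the ECMP routes above share one shape: a prefix with two equal-weight nexthops, `proto 20` and `metric 1`. The hypothetical sketch below generates them from a four-row table so each namespace's gateways can be checked at a glance; `run`, `ecmp_route`, and `setup_ecmp_routes` are illustrative names, and `DRY_RUN=1` prints the commands instead of running them.

```shell
# run (hypothetical dry-run wrapper): prints the command when DRY_RUN is
# set, otherwise executes it.
run() { if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi; }

# ecmp_route NS PREFIX GW1 GW2 (hypothetical helper): adds PREFIX with two
# equal-weight nexthops; NS "default" means the initial namespace.
ecmp_route() {
    route_ns=$1
    set -- ip route add "$2" proto 20 metric 1 \
        nexthop via "$3" weight 1 nexthop via "$4" weight 1
    if [ "$route_ns" != default ]; then
        set -- ip netns exec "$route_ns" "$@"
    fi
    run "$@"
}

# setup_ecmp_routes (hypothetical): one row per namespace with its two
# gateways followed by the prefixes routed through them, as above.
setup_ecmp_routes() {
    while read -r ns gw1 gw2 prefixes; do
        for p in $prefixes; do
            ecmp_route "$ns" "$p" "$gw1" "$gw2" || return 1
        done
    done <<EOF
vrf3    15.0.0.1 18.0.0.1 13.0.0.0/8 14.0.0.0/8 16.0.0.0/8 17.0.0.0/8
vrf2    14.0.0.1 17.0.0.1 13.0.0.0/8 16.0.0.0/8
vrf1    14.0.0.2 17.0.0.2 15.0.0.0/8 18.0.0.0/8
default 13.0.0.2 16.0.0.2 15.0.0.0/8 14.0.0.0/8 17.0.0.0/8 18.0.0.0/8
EOF
}
```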
From the default namespace (i.e. not inside vrf1, vrf2, or vrf3), the ping below succeeds only 50% of the time:
[root@localhost raghu]# ping 15.0.0.2
PING 15.0.0.2 (15.0.0.2) 56(84) bytes of data.
64 bytes from 15.0.0.2: icmp_seq=1 ttl=62 time=0.066 ms
64 bytes from 15.0.0.2: icmp_seq=3 ttl=62 time=0.027 ms
64 bytes from 15.0.0.2: icmp_seq=5 ttl=62 time=0.026 ms
64 bytes from 15.0.0.2: icmp_seq=7 ttl=62 time=0.027 ms
^C
--- 15.0.0.2 ping statistics ---
8 packets transmitted, 4 received, 50% packet loss, time 6999ms
rtt min/avg/max/mdev = 0.026/0.036/0.066/0.018 ms
[root@localhost raghu]#