I am trying to set up a local network between 10 (web) servers (openSUSE 11.2). Each server is connected to the internet via eth0, which works fine on all servers.
A second NIC, eth1 (1 Gbit, RTL8169), on each server is connected to a switch and should function as a LAN. I installed and configured the second NIC with YaST, and then added a route for the local network (192.168.20.0) via eth1. So far everything works (ssh, for example), but I have a packet loss of 10%-60% (ping) on the local network, and I can't find the reason for it. I have already installed Debian Lenny on 2 servers (just to test), but I have the same problem on Debian.
No firewall or any other application is in the way. With tcpdump I could see that the packets are sent but never show up on the destination server.
I have put some more information about how I configured the LAN below. This is not my first time doing this, and in my experience, if something is wrong with the network configuration (wrong routing, a firewall in the way, etc.), this usually leads to a packet loss of 100% or the destination is simply not reachable. Hopefully someone has an idea what I did wrong. Thanks in advance.
The second NIC is configured either with YaST on SUSE or by editing /etc/network/interfaces on Debian. The kernel module r8169 is loaded.
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
x.x.x.0         static.1.126.40 255.255.255.192 UG    0      0        0 eth0
x.x.x.0         *               255.255.255.192 U     0      0        0 eth0
192.168.20.0    *               255.255.255.0   U     0      0        0 eth1
192.168.20.0    *               255.255.255.0   U     0      0        0 eth1
default         static.1.126.40 0.0.0.0         UG    0      0        0 eth0
Output of ping:
Debian-50-lenny-64-minimal:~# ping 192.168.20.9
PING 192.168.20.9 (192.168.20.9) 56(84) bytes of data.
64 bytes from 192.168.20.9: icmp_seq=1 ttl=64 time=0.153 ms
64 bytes from 192.168.20.9: icmp_seq=2 ttl=64 time=0.182 ms
64 bytes from 192.168.20.9: icmp_seq=3 ttl=64 time=0.127 ms
64 bytes from 192.168.20.9: icmp_seq=7 ttl=64 time=0.131 ms
64 bytes from 192.168.20.9: icmp_seq=8 ttl=64 time=0.179 ms
--- 192.168.20.9 ping statistics ---
9 packets transmitted, 5 received, 44% packet loss, time 7999ms
rtt min/avg/max/mdev = 0.127/0.154/0.182/0.025 ms
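The gaps in icmp_seq show which probes were lost, not just how many. A small Python sketch (added for illustration, not part of the original post; the function names are hypothetical) that lists the missing sequence numbers from the ping output above:

```python
import re

# Reply lines copied from the ping run against 192.168.20.9 above.
PING_OUTPUT = """\
64 bytes from 192.168.20.9: icmp_seq=1 ttl=64 time=0.153 ms
64 bytes from 192.168.20.9: icmp_seq=2 ttl=64 time=0.182 ms
64 bytes from 192.168.20.9: icmp_seq=3 ttl=64 time=0.127 ms
64 bytes from 192.168.20.9: icmp_seq=7 ttl=64 time=0.131 ms
64 bytes from 192.168.20.9: icmp_seq=8 ttl=64 time=0.179 ms
"""


def missing_sequences(ping_output, transmitted):
    """Return the icmp_seq numbers in 1..transmitted that got no reply."""
    seen = {int(seq) for seq in re.findall(r"icmp_seq=(\d+)", ping_output)}
    return [seq for seq in range(1, transmitted + 1) if seq not in seen]


def loss_percent(transmitted, received):
    """Packet loss as a percentage of the packets transmitted."""
    return 100.0 * (transmitted - received) / transmitted


print(missing_sequences(PING_OUTPUT, 9))
print(int(loss_percent(9, 5)))
```

Here 4, 5, 6 and 9 are missing, so three of the four losses are consecutive. That may point to something that drops traffic in bursts rather than to evenly spread bit errors, but a run of nine packets is far too short to be sure.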
OK, well two places to start. 1) Is the ICMP echo-request itself lost, or is the packet hitting the machine at the other end and the echo-reply never makes it back? Is this consistent? 2) Are your switch interfaces configured correctly? What is the switch? If you're only using a 10/100 switch, then you could well have issues with ethernet negotiation. ethtool should show you what speeds the interfaces are running at. Be watchful for any running at 100/half for example.
The ICMP request is lost; it never reaches the target. I have put the output of tcpdump below.
The hardware (second NICs, switch, cables) was installed by our provider. I don't have physical access to it, so I can't try out different hardware (switch/cable). But the switch is supposed to be a 1 Gbit switch.
Even so, I noticed that tcpdump tells me that eth1 is a 10MB link.
I'm not sure about the output of ethtool (I have not worked with it before). I ran it as %> ethtool eth1, but I am not sure whether it shows the modes currently in use or only the supported modes. I have put the output below as well.
I pinged 192.168.20.10 from 192.168.20.9 and ran tcpdump on both:
Ping (from 192.168.20.9):
Debian-50-lenny-64-minimal:~# ping 192.168.20.10
PING 192.168.20.10 (192.168.20.10) 56(84) bytes of data.
64 bytes from 192.168.20.10: icmp_seq=3 ttl=64 time=0.142 ms
64 bytes from 192.168.20.10: icmp_seq=5 ttl=64 time=0.159 ms
64 bytes from 192.168.20.10: icmp_seq=7 ttl=64 time=0.159 ms
64 bytes from 192.168.20.10: icmp_seq=11 ttl=64 time=0.152 ms
64 bytes from 192.168.20.10: icmp_seq=12 ttl=64 time=0.227 ms
--- 192.168.20.10 ping statistics ---
14 packets transmitted, 5 received, 64% packet loss, time 13028ms
rtt min/avg/max/mdev = 0.142/0.167/0.227/0.034 ms
output of tcpdump on 192.168.20.9 :
tcpdump -i eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
15:51:49.799977 IP 192.168.20.12.ntp > 192.168.20.11.ntp: NTPv4, Client, length 48
15:51:49.802374 arp who-has 192.168.20.12 tell 192.168.20.11
15:52:06.173328 arp who-has 192.168.20.10 tell 192.168.20.9
15:52:06.173517 arp reply 192.168.20.10 is-at 00:e0:52:19:03:08 (oui Unknown)
15:52:06.173525 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 1, length 64
15:52:07.183549 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 2, length 64
15:52:08.183600 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 3, length 64
15:52:08.183737 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 3, length 64
15:52:09.183511 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 4, length 64
15:52:10.183600 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 5, length 64
15:52:10.183754 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 5, length 64
15:52:11.183591 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 6, length 64
15:52:12.183600 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 7, length 64
15:52:12.183754 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 7, length 64
15:52:13.181365 arp who-has 192.168.20.9 tell 192.168.20.10
15:52:13.181372 arp reply 192.168.20.9 is-at 00:e0:52:76:51:80 (oui Unknown)
15:52:13.184808 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 8, length 64
15:52:14.181387 arp who-has 192.168.20.9 tell 192.168.20.10
15:52:14.181396 arp reply 192.168.20.9 is-at 00:e0:52:76:51:80 (oui Unknown)
15:52:14.184849 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 9, length 64
15:52:15.181359 arp who-has 192.168.20.9 tell 192.168.20.10
15:52:15.181364 arp reply 192.168.20.9 is-at 00:e0:52:76:51:80 (oui Unknown)
15:52:15.184838 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 10, length 64
15:52:16.199591 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 11, length 64
15:52:16.199737 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 11, length 64
15:52:17.199576 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 12, length 64
15:52:17.199797 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 12, length 64
15:52:18.199603 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 13, length 64
15:52:19.199588 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 14, length 64
29 packets captured
29 packets received by filter
0 packets dropped by kernel
output of tcpdump on 192.168.20.10 :
tcpdump -i eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
15:51:49.635376 IP 192.168.20.12.ntp > 192.168.20.11.ntp: NTPv4, Client, length 48
15:51:49.637776 arp who-has 192.168.20.12 tell 192.168.20.11
15:52:06.008644 arp who-has 192.168.20.10 tell 192.168.20.9
15:52:06.008661 arp reply 192.168.20.10 is-at 00:e0:52:19:03:08 (oui Unknown)
15:52:08.018860 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 3, length 64
15:52:08.018878 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 3, length 64
15:52:10.018869 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 5, length 64
15:52:10.018879 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 5, length 64
15:52:12.018846 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 7, length 64
15:52:12.018856 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 7, length 64
15:52:13.016394 arp who-has 192.168.20.9 tell 192.168.20.10
15:52:14.016392 arp who-has 192.168.20.9 tell 192.168.20.10
15:52:15.016394 arp who-has 192.168.20.9 tell 192.168.20.10
15:52:15.016475 arp reply 192.168.20.9 is-at 00:e0:52:76:51:80 (oui Unknown)
15:52:16.034769 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 11, length 64
15:52:16.034778 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 11, length 64
15:52:17.034764 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 12, length 64
15:52:17.034773 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 12, length 64
18 packets captured
18 packets received by filter
0 packets dropped by kernel
output of ethtool (same on .9 .10):
Debian-50-lenny-64-minimal:~# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
        Link detected: yes
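On the question of which lines show the current mode: in ethtool output the "Supported" and "Advertised" lines list capabilities, while "Speed", "Duplex", "Auto-negotiation" and "Link detected" describe the link as it is right now. The "EN10MB" in the tcpdump banner is only the name of the Ethernet link-layer header type and says nothing about the link speed. A minimal Python sketch (not part of the original thread; the function name is made up for illustration) that pulls the current settings out of a trimmed copy of the output above:

```python
# Trimmed copy of the ethtool output posted above.
ETHTOOL_OUTPUT = """\
Settings for eth1:
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Link detected: yes
"""

# Keys that describe the link as currently negotiated, not its capabilities.
CURRENT_KEYS = ("Speed", "Duplex", "Auto-negotiation", "Link detected")


def current_link_state(ethtool_text):
    """Return only the current-state fields from `ethtool <iface>` output."""
    state = {}
    for line in ethtool_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in CURRENT_KEYS:
            state[key] = value.strip()
    return state


print(current_link_state(ETHTOOL_OUTPUT))
```

For the output posted here this gives 1000Mb/s, full duplex, auto-negotiation on and link detected, so both NICs report a negotiated gigabit link.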
OK, so it seems that during the drop nothing is entering the .10 server. It's ARPing for the source server's IP, but never hears back the response, which is apparently sent successfully. Is this a single switch? There could be some trunking issues or something like that in between multiple switches. Nothing else sexy happening with the physical connectivity you've not mentioned? The NICs seem to have negotiated gigabit just fine. Not really sure what else to suggest without being able to access the switch and see if, for example, the ARP replies are seen within the switch itself, rather than the .9 machine just telling you it replied. It would be useful to prove that it did from somewhere outside of its control.
Are there other machines on the system? Can you ping from multiple sources concurrently? Failing that running some tcp services between the boxes may also be pretty handy, e.g. a netcat link to just pass data in both directions, and see what happens during the ICMP loss then. I could also be tempted to put a static arp entry into each server and see what the ICMP traffic looks like then, when arp traffic is not required.
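The netcat idea could also be approximated with a few lines of Python if netcat is not installed. This is only a sketch: the function names, the message format and the port handling are assumptions, not something from the thread. One side runs a tiny echo server, the other sends numbered lines and times each round trip. Since TCP retransmits lost segments, drops on the link should show up as occasional round trips that take far longer than the sub-millisecond times ping reports.

```python
import socket
import threading
import time


def serve_echo(listener):
    """Accept a single connection and echo back whatever arrives."""
    conn, _addr = listener.accept()
    listener.close()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def start_echo_server(bind_addr="0.0.0.0", port=0):
    """Start the echo server in a background thread and return its port.

    port=0 lets the kernel pick a free port.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((bind_addr, port))
    listener.listen(1)
    chosen_port = listener.getsockname()[1]
    threading.Thread(target=serve_echo, args=(listener,), daemon=True).start()
    return chosen_port


def probe(host, port, count=20, interval=0.0):
    """Send `count` numbered lines and return each round-trip time in seconds."""
    rtts = []
    with socket.create_connection((host, port), timeout=30) as sock:
        # Send each small message immediately instead of batching.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for seq in range(count):
            msg = ("probe %d\n" % seq).encode("ascii")
            start = time.monotonic()
            sock.sendall(msg)
            echoed = b""
            while len(echoed) < len(msg):
                chunk = sock.recv(len(msg) - len(echoed))
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                echoed += chunk
            rtts.append(time.monotonic() - start)
            time.sleep(interval)
    return rtts
```

As a usage example, one could call start_echo_server(port=5001) on .10 (5001 is an arbitrary choice, and the process has to stay alive while the test runs) and then probe("192.168.20.10", 5001, count=100, interval=0.1) on .9 while the ping is running, and compare the slow round trips with the lost ICMP sequence numbers.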
why? That may well cause even more problems if the switch side port isn't set the same. It might change something, sure, so could be worth doing for a test, but it's certainly not a solution.
I am not talking about a solution. I am trying to find out what caused the 64% packet loss.
To OP: if you look at your tcpdump output you will see:
192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 1, length 64
192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 2, length 64
192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 3, length 64
But on other side:
192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 3, length 64
192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 3, length 64
The first 2 sequences did not even reach the destination.
The problem could be with the switch (overloaded), the speed (1000), or the Ethernet cables.
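The same comparison can be done mechanically for a whole capture. Below is a rough Python sketch (the function names are hypothetical, and the two strings are only an excerpt of the captures posted earlier, covering seq 1 to 5) that reports which echo requests left .9 but never arrived on .10, and does the same for the replies in the other direction:

```python
import re

# Excerpt of the capture taken on 192.168.20.9 (the sender).
SENDER_CAPTURE = """\
15:52:06.173525 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 1, length 64
15:52:07.183549 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 2, length 64
15:52:08.183600 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 3, length 64
15:52:08.183737 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 3, length 64
15:52:09.183511 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 4, length 64
15:52:10.183600 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 5, length 64
15:52:10.183754 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 5, length 64
"""

# Excerpt of the capture taken on 192.168.20.10 (the receiver).
RECEIVER_CAPTURE = """\
15:52:08.018860 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 3, length 64
15:52:08.018878 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 3, length 64
15:52:10.018869 IP 192.168.20.9 > 192.168.20.10: ICMP echo request, id 15632, seq 5, length 64
15:52:10.018879 IP 192.168.20.10 > 192.168.20.9: ICMP echo reply, id 15632, seq 5, length 64
"""


def icmp_seqs(capture, kind):
    """Collect the seq numbers of all ICMP echo `kind` lines ('request' or 'reply')."""
    pattern = re.compile(r"ICMP echo " + kind + r", id \d+, seq (\d+)")
    return {int(seq) for seq in pattern.findall(capture)}


def lost_in_transit(origin_capture, destination_capture, kind):
    """Seq numbers seen leaving the origin but never seen at the destination."""
    sent = icmp_seqs(origin_capture, kind)
    arrived = icmp_seqs(destination_capture, kind)
    return sorted(sent - arrived)


print(lost_in_transit(SENDER_CAPTURE, RECEIVER_CAPTURE, "request"))
print(lost_in_transit(RECEIVER_CAPTURE, SENDER_CAPTURE, "reply"))
```

In this excerpt requests 1, 2 and 4 are lost on the way from .9 to .10, while every reply that .10 sent made it back. That matches the ARP replies from .9 that .10 never saw, so at least in this capture the drops are all in the .9 to .10 direction.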
I tried it with static ARP entries on 192.168.20.9/10: same result, just the broadcasts don't show up anymore (as expected).
What I have noticed is that some connections seem to be more stable than others.
For example, a ping between 192.168.20.7 and .10 has a packet loss of 0%-11% in both directions, while a ping between .7 and .9 has a loss of 50%-60% in both directions, and these values seem to stay stable. (Can this possibly be a configuration problem?)
I can't say anything about the switch itself. It should be a 1 Gbit hardware switch, and only our 10 servers should be connected to it, nothing else.
Well, we ordered the 10 servers and additional hardware (1 Gbit Ethernet cards for the servers plus a 1 Gbit switch). The hardware installation was done completely by the provider, so I can't say for sure whether they used a switch with 10 or more ports or just plugged together 2 smaller ones. I don't think they did, because we have to pay for the slot the switch is in, and in our experience they use quality hardware.
I know that they have built a new data centre because they were running short on space. Maybe they haven't put the switch into a slot and now it's not properly grounded, or something like that is causing interference on the cables directly (but that is just guessing).
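One way to read per-pair figures like these is to average them per host and see whether one machine stands out. The sketch below is only an illustration in Python: the function name is hypothetical, and it uses the worst-case loss reported in this thread for each pair (11% for .7/.10, 60% for .7/.9, 64% for .9/.10).

```python
from collections import defaultdict

# Worst-case loss (percent) reported in this thread for each host pair.
PAIR_LOSS = {
    ("192.168.20.7", "192.168.20.10"): 11.0,
    ("192.168.20.7", "192.168.20.9"): 60.0,
    ("192.168.20.9", "192.168.20.10"): 64.0,
}


def mean_loss_per_host(pair_loss):
    """Average the loss of every pair a host takes part in, worst host first."""
    per_host = defaultdict(list)
    for (host_a, host_b), loss in pair_loss.items():
        per_host[host_a].append(loss)
        per_host[host_b].append(loss)
    ranking = [(host, sum(values) / len(values)) for host, values in per_host.items()]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


for host, loss in mean_loss_per_host(PAIR_LOSS):
    print(host, loss)
```

With these numbers .9 comes out worst (62% on average), which would fit a bad cable or switch port on that one machine, but three pairs are a hint at best. Pinging all pairs among the 10 servers would give a much clearer picture.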
I have to finish for today, and thanks for your help acid_kewpie.
Well, I will try setting it down to 100, one last try after 12 hours.
I configured the network cards on .9 and .10 to use speed 100, duplex full.
%> ethtool -s eth1 speed 100 duplex full
(ethtool wouldn't set the speed to 100, so I used mii-tool)
%> mii-tool 100baseTx-FD eth1
which set the speed down to 100, and now the pings between .9 and .10 don't have packet losses any more (a 100 Mbit network wouldn't be perfect, but it would still be usable).
I tried the same on the other servers running SUSE (which still have packet losses). My problem now is that ethtool won't set the speed down, while mii-tool doesn't work and tells me
%> mii-tool -F 100baseTx-FD eth1
SIOCGMIIPHY on 'eth1' failed: Operation not supported.
P.S.: I don't have access to the hardware (no cable exchange etc. possible).
I still couldn't set the speed down to 100 with ethtool,
but if I set the port to tp (twisted pair) I get rid of the packet losses.
But with scp, for example, the connection still stalls (it doesn't do that on the servers which have been set to 100baseTx).
Thanks for your help, I think I will contact the Provider on Monday.
If the port supports TP, try changing to TP and 1000BaseT; maybe it helps.
But it is only a setting for the NIC port.
Who knows what kind of cables they use and what their quality is.