Linux - Networking
This forum is for any issue related to networks or networking.
Routing, network cards, OSI, etc. Anything is fair game.
I have the following configuration: 4 PCs (say A, B, C and D), running Ubuntu or Debian, interconnected by a gigabit switch, which is connected to the Internet. Two of the machines (say A and B) also have a direct private connection between them (provided by another pair of NICs).
Now, when I test the connection performance with iperf, the results vary. The private connection between A and B performs well: about 930 Mbps in iperf's UDP test. Between C and D it is about 800 Mbps, which I find tolerable. Packet loss in these tests is negligible. However, when I run iperf between any of {A,B} and {C,D}, performance drops significantly and a huge number of packets is lost. For example, here is the result of a test between A and C:
[ 3] local xxx.xxx.xxx.xxx port 34702 connected with xxx.xxx.xxx.xxx port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 834 MBytes 700 Mbits/sec
[ 3] Sent 594940 datagrams
[ 3] Server Report:
[ 3] 0.0-10.2 sec 179 MBytes 147 Mbits/sec 12.645 ms 467089/594938 (79%)
[ 3] 0.0-10.2 sec 1 datagrams received out-of-order
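A quick sanity check on that server report: the percentage iperf prints is simply lost/total datagrams. In Python, using the numbers above:

```python
# iperf's reported loss percentage is just lost/total datagrams.
lost, total = 467089, 594938        # from the server report above
loss_pct = 100.0 * lost / total
print(f"{loss_pct:.1f}% loss")      # -> 78.5% loss, which iperf rounds to 79%
```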
Why are so many packets generated but then lost somewhere?
The A<->B private link works fine, so the system-level parameters on both A and B are correct. Furthermore, C<->D works fine, so I guess I shouldn't blame the switch.
Is there a per-NIC configuration I should check, or does this smell like a hardware problem? The problematic NICs on both A and B are the same model: Allied Telesyn AT2916T.
At first blush this looks like a configuration issue (most likely a duplex mismatch). You don't say how your systems are configured: are they set to auto-negotiate or forced to a specific speed/duplex? What is the wire distance?
Auto-negotiation is great when it works and a nightmare when it doesn't.
Another possibility is buffer overflow (the NIC buffer fills before the system can empty it).
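If you suspect the receive buffer, one quick check (a generic sketch, not specific to this setup) is to request a bigger UDP receive buffer and see what the kernel actually grants; on Linux the granted value is capped by net.core.rmem_max and reported back doubled:

```python
import socket

# Ask the kernel for a larger UDP receive buffer and see what we actually get.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
requested = 4 * 1024 * 1024                      # 4 MiB (example value)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {requested}, granted {granted}")
# A grant far below the request points at the net.core.rmem_max sysctl limit.
sock.close()
```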
If you are still having trouble, post back with your configs, and don't mask every machine with the same xxx IP address. Be more precise with the IPs, e.g. xxx.xxx.xxx.aaa, xxx.xxx.xxx.bbb, xxx.xxx.yyy.ccc, xxx.xxx.yyy.ddd, so we can tell the systems apart.
Thanks for your suggestions! I am now a bit closer to the source of the trouble. Namely, I forgot to mention that A and B are not running an ordinary kernel: they are both Xen Dom0s. When I reboot them with the same kernel but without the Xen hypervisor, the huge packet loss disappears. The performance is not great though: I get about 575 Mbps uplink and 690 Mbps downlink (everything is configured by auto-negotiation, I don't specify anything explicitly). Still, this bandwidth is perfectly fine for me; I just want to get rid of the packet-loss problem.
Furthermore, I have discovered that the problem occurs only when A or B act as receivers. Here is the score (A=iperf server=receiver, C=iperf client=sender):
Quote:
[ 3] local xxx.xxx.xxx.aaa port 5001 connected with xxx.xxx.xxx.bbb port 38590
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-10.2 sec 324 MBytes 266 Mbits/sec 12.213 ms 356145/587534 (61%)
[ 3] 0.0-10.2 sec 1 datagrams received out-of-order
ifconfig reports no problems, but here is the output of netstat -su:
Quote:
IcmpMsg:
InType3: 19
OutType3: 27
OutType8: 14
Udp:
1181505 packets received
108486 packets to unknown port received.
2348066 packet receive errors
1481319 packets sent
RcvbufErrors: 2348066
UdpLite:
IpExt:
InMcastPkts: 17
InBcastPkts: 1926
InOctets: 1154882666
OutOctets: -2076309331
InMcastOctets: 476
InBcastOctets: 386075
(This is after several tests. After each one, not surprisingly, RcvbufErrors increases by exactly the number of lost packets reported by iperf.)
Any other suggestions? How can I determine precisely where the packets get dropped? Judging from all this it is Xen's fault, so I'll go explore their mailing lists...
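Since RcvbufErrors grows in lockstep with iperf's lost-packet count, the drops are happening in the socket receive buffer rather than on the wire. Independent of the Xen question, raising the kernel's receive-buffer limits may help absorb bursts. A sketch (the values are illustrative, not a recommendation):

```shell
# Allow applications to request larger UDP receive buffers (example values):
sysctl -w net.core.rmem_max=8388608       # cap on what SO_RCVBUF may request (8 MiB)
sysctl -w net.core.rmem_default=262144    # default for sockets that don't ask
# iperf can then request a large buffer explicitly, e.g.: iperf -s -u -w 4M
```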
Have you tried testing this from a virtual machine (a DomU) rather than from Dom0?
Or raise the specs of your Dom0 (especially RAM) for testing purposes.
The rates you are seeing are close to the transfer rate of a single hard drive (is Dom0 swapping the received or sent data?).
It took me a while to figure out why on earth bonding did not work.
When I rebooted my server I realized it was using the Xen kernel instead of the regular one.
After I changed it, everything worked well. Before that I was losing every second ping packet!
So the Xen kernel can cause many network issues.
Well, I have found the bottleneck. The problem is actually the CPU. Under a "normal" kernel, iperf processing burns 100% of one core and about 50% of the other. In the Xen configuration I originally had only one core dedicated to Dom0, which is by far insufficient for this kind of processing! Even with both cores active, all the cycles get consumed (because networking requires more "thinking" under Xen) and the problem persists.
Now I guess the solution is to buy a faster processor.
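Before buying hardware, it may be worth giving Dom0 more vCPUs at boot. A sketch assuming a GRUB-booted Xen host (dom0_max_vcpus and dom0_vcpus_pin are standard Xen command-line options; adjust names and paths to your distro and Xen version):

```shell
# Add to the Xen hypervisor command line (e.g. GRUB_CMDLINE_XEN in /etc/default/grub):
#   dom0_max_vcpus=2 dom0_vcpus_pin
# After a reboot, verify how many vCPUs Dom0 has:
xl vcpu-list Domain-0   # (or 'xm vcpu-list Domain-0' on older toolstacks)
```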