LinuxQuestions.org (/questions/)
-   Linux - Networking (https://www.linuxquestions.org/questions/linux-networking-3/)
-   -   Testing bonding performance with scp (https://www.linuxquestions.org/questions/linux-networking-3/testing-bonding-performance-with-scp-595540/)

bcg121 10-29-2007 01:31 PM

Testing bonding performance with scp
 
I have two PCs. Each is running Fedora Core 5. Each PC has an Intel PRO/1000 PT dual port NIC. eth0 of PC#1 is directly connected to eth0 of PC#2, and eth1 of PC#1 is directly connected to eth1 of PC#2. I.e., there is no switch in the mix. eth0 and eth1 are bonded together on both PCs.

Bonding appears to be working, as the # of TX packets on eth0 + # of TX packets on eth1 = # of TX packets on bond0. Same for RX packets. They are pretty much evenly split between the two interfaces.
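
For reference, the kind of configuration I am describing looks roughly like the following (file names follow the Fedora/RHEL convention; the miimon value and netmask are just typical examples, not necessarily exactly what I have):

# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=balance-rr miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0 (on PC#1)
DEVICE=bond0
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is the same but with DEVICE=eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none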

However, I'm trying to use scp and the time command to see if performance has improved.

bond0 on PC#1: 10.0.0.1
bond0 on PC#2: 10.0.0.2

The command I use to scp a bitmap file (between 10 and 50 MB) from PC#1 to PC#2 is:
time scp <filename> root@10.0.0.2:/home/images

I've tried balance-rr, with tcp_reordering at 3 and at 127.
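
(For completeness, the knob I am referring to is net.ipv4.tcp_reordering; it can be changed on the fly with either of the following. The default value is 3.)

sysctl -w net.ipv4.tcp_reordering=127
echo 127 > /proc/sys/net/ipv4/tcp_reordering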

The problem is that the time it takes to scp the file from one PC to the other is about the same when using bonding as when not using bonding (if not slightly slower).

Any idea what I am doing wrong? Please let me know if you need more information.

lazlow 10-30-2007 11:40 AM

If your link is running faster than the drives can write, you will see this behavior. GigE by itself is faster than a lot of drives.

bcg121 10-30-2007 01:28 PM

Thanks. Does that make sense given the following data?

Without bonding:

Image Bytes sec MB/s
-------------------------------
Image #1 47023158 1.46 30.65
Image #2 13304886 0.65 19.57
Image #3 13304886 0.62 20.33
Image #4 19039286 0.77 23.62
Image #5 20431926 0.81 24.11
Image #6 36709430 1.22 28.67

With bonding (balance-rr with tcp_reordering at 127):

Image Bytes sec MB/s
-------------------------------
Image #1 47023158 1.54 29.16
Image #2 13304886 0.66 19.23
Image #3 13304886 0.70 18.12
Image #4 19039286 0.84 21.51
Image #5 20431926 0.89 21.87
Image #6 36709430 1.29 27.21

The PCs are identical.
CPU: Pentium 4 3.00GHz
Memory: 2GB RAM
Disk: TOSHIBA MK4032GAX 40.0GB
NIC: Intel PRO/1000 PT dual port
OS: Fedora Core 5

How can I prove that the disks are the bottleneck? What other methods are there to verify that throughput has actually increased with bonding?

Also, I have encountered some links that say performance won't improve when you bond Gigabit Ethernet cards in Linux. Is that true?

lazlow 10-30-2007 04:11 PM

Try:

[root@localhost ~]# hdparm -tT /dev/sda

/dev/sda:
Timing cached reads: 1186 MB in 2.00 seconds = 592.91 MB/sec
Timing buffered disk reads: 174 MB in 3.03 seconds = 57.42 MB/sec
[root@localhost ~]#


Where /dev/sda is whatever is appropriate for your drives.
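
hdparm -tT only times reads. Since scp also has to write on the receiving box, you can get a rough write number with something like this (assuming you have ~500MB free under /tmp; the sync is there so the page cache does not hide the real disk speed):

time sh -c 'dd if=/dev/zero of=/tmp/ddtest bs=1M count=512; sync'
rm -f /tmp/ddtest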

If I remember correctly, 100 Mbit is about 12 MB/s and GigE is 125 MB/s, so in my case GigE vastly outruns my hard drives. My older system uses RAID 0 on WD 160 GB drives (they were the "hot" drives when I bought them) and it manages about 110 MB/s. I would really be surprised if bonding GigE did not improve link speed, IF one has the hard drive speed to back it up. As a guess, I would say one would need at least 200 MB/s of drive throughput to see the advantage.

Good Luck
Lazlow

bcg121 10-30-2007 04:43 PM

Thanks again, Lazlow.

hdparm -tT /dev/hda yielded 31.37 MB/s. This makes sense. I could get very close to this number but never meet or exceed it.

netperf/netserver yielded 117 MB/s without bonding (theoretical max is 1 Gbps, or 125 MB/s), and it yielded 205 MB/s with bonding (theoretical max is 2 Gbps, or 250 MB/s). So I am indeed getting greater network performance with bonding. Note that it was very important for me to change tcp_reordering from 3 to 127 for balance-rr.
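
For anyone who wants to repeat this: the test is just netserver running on one box and a TCP_STREAM test run from the other, something like the following (the 30-second duration is arbitrary, and netperf itself reports throughput in 10^6 bits/sec by default):

# on PC#2
netserver

# on PC#1
netperf -H 10.0.0.2 -t TCP_STREAM -l 30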

I wish I could get scp to work without disk access. I tried:

time scp <filename> root@10.0.0.2:/dev/null/

but hit an "scp: /dev/null/: Is a directory" error. Then, I tried:

time scp <filename> root@10.0.0.2:/dev/null

but hit an "scp: /dev/null: truncate: Invalid argument" error. scp does give me a MB/s measurement here, but it is similar to what I was seeing before.

Is there a way to scp to /dev/null? Or is it that the syntax of my second attempt is valid and that it is actually the read from the source computer's hard drive that is the bottleneck? Am I correct to assume that the file will be read into the source computer's memory cache after the first copy attempt, eliminating the disk latency for subsequent copies?
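
The only other idea I have, and I have not tried it yet, is to take scp's file handling out of the picture and pipe the data over ssh so that nothing on the far end touches a disk:

time cat <filename> | ssh root@10.0.0.2 'cat > /dev/null'

(Even then, ssh's encryption costs CPU on both ends, so this still might not show the full benefit of the bond.)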

lazlow 10-30-2007 05:30 PM

No, I do not know of a way. But what is the point? You know that when moving any significant amount of data you do not have the HD speed to handle it, and you know (from netperf) that for smaller data sets the link has the speed. The only case I can think of where greater-than-GigE speed would be of any use, without the disk speed to back it up, would be VNC.

lxsure 10-27-2008 03:56 AM

What's the point???
 
Quote:

Originally Posted by bcg121 (Post 2942680)
Thanks again, Lazlow.

hdparm -tT /dev/hda yielded 31.37 MB/s. This makes sense. I could get very close to this number but never meet or exceed it.

netperf/netserver yielded 117 MB/s without bonding (theoretical max is 1 Gbps, or 125 MB/s), and it yielded 205 MB/s with bonding (theoretical max is 2 Gbps, or 250 MB/s). So I am indeed getting greater network performance with bonding. Note that it was very important for me to change tcp_reordering from 3 to 127 for balance-rr.

I wish I could get scp to work without disk access. I tried:

time scp <filename> root@10.0.0.2:/dev/null/

but hit an "scp: /dev/null/: Is a directory" error. Then, I tried:

time scp <filename> root@10.0.0.2:/dev/null

but hit an "scp: /dev/null: truncate: Invalid argument" error. scp does give me a MB/s measurement here, but it is similar to what I was seeing before.

Is there a way to scp to /dev/null? Or is it that the syntax of my second attempt is valid and that it is actually the read from the source computer's hard drive that is the bottleneck? Am I correct to assume that the file will be read into the source computer's memory cache after the first copy attempt, eliminating the disk latency for subsequent copies?


Hi guys, I am running into the same problem now.
Two PCs are connected directly, and each bonds two GigE ports together. When I test with iperf/netperf, the speed is even lower than a single card, only about 600 Mb/s.
I tried setting tcp_reordering to 127, but I still see the same problem.
bcg121, it seems you successfully increased the speed with bonding. Can you tell me what the trick is and how you made it work?

thanks in advance.

bcg121 10-28-2008 06:56 AM

I ended up writing two socket applications in C: one to send a file (after reading the entire file contents into memory) and one to receive it. I did all the timing in the receiving application. This eliminated the overhead I was encountering with standard file transfer mechanisms like NFS and scp.

Without bonding, I averaged 101MB/s. With bonding, I averaged 205MB/s. The results were very consistent.
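
For anyone who does not want to write C, roughly the same raw-socket test can be approximated with netcat; this is not exactly what I did, and the flags vary between netcat versions (the sender may need -q 0 or -N to exit at end of file, and some versions want "nc -l 5001" instead of "-l -p 5001"):

# on the receiving PC
nc -l -p 5001 > /dev/null

# on the sending PC (run it twice; the second run reads the file from the page cache)
time nc 10.0.0.2 5001 < <filename>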

Corsari 08-14-2010 07:24 AM

Bond setup with redundancy as primary target
 
Dear LQ friends

could you kindly suggest the best choice for bonding two gigabit NICs, mainly for redundancy? And if any of the modes can provide both redundancy and performance, that would be welcome too.

Additionally, a little explanation of how to use netperf, and the recommended tcp_reordering setting for such a bond configuration, would be welcome as well.

Please note that my OS is XenServer by Citrix, which under the hood appears to be CentOS (at least at the repository level).
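
From what I have read so far in the kernel bonding documentation, active-backup (mode=1) is the pure fault-tolerance mode (only one NIC carries traffic and the other takes over on failure), while modes like balance-rr and 802.3ad also aim at throughput (802.3ad needs switch support). The module options would be something like the following, though I still have to check where XenServer keeps its modprobe configuration, so the file path is a guess:

# /etc/modprobe.conf (location may differ on XenServer)
alias bond0 bonding
options bond0 mode=active-backup miimon=100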

Thank you for any tip.

Robert


cjcox 08-14-2010 09:44 AM

A gigabit NIC is a gigabit NIC. Tests of multiple gigabit NICs REALLY need multiple clients hitting them. If you want to test your bonded config effectively, you need to have multiple clients; then you can get a better feel for the performance differences. You simply won't see it with just one client going to one server. There's no magic that's going to happen. No miracle.

