Gigabit Ethernet performance
I have recently upgraded my home network to gigE, expecting transfer speeds of around 50 MB/s over NFS. Unfortunately the first results fall far short: only around 17 MB/s. I have done a bit more testing, and here is what I came up with:
- During NFS transfers, CPU usage on the server is around 60% wait and 40% system, so it seems the network is the problem. CPU usage on the client is around 5%. The server is an IBM dual-P3/500 with 512 MB RAM; the client is a Core 2 Duo (3 GHz) with 4 GB RAM.
- Disk access on the server runs at around 40-60 MB/s. I expect the PCI bus should handle 50 MB/s disk read + 50 MB/s gigE (the total is below the bus's 133 MB/s limit).
- Raw netcat throughput, tested in both directions:
Code:
server: netcat -l -p 1234 > /dev/null
client: netcat <server> 1234 < /dev/zero
Code:
client: netcat -l -p 1234 > /dev/null
server: netcat <client> 1234 < /dev/zero
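To put a number on the netcat test, one variant (a sketch; dd prints a throughput summary when it finishes) is to push a fixed amount of data through it:
Code:
client: dd if=/dev/zero bs=1M count=1000 | netcat <server> 1234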
On the client:
Code:
ethtool eth0
|
watch vmstat 1 while accessing the share... especially the IO part (could be of interest)
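For reference, these are the columns to look at (header from a typical procps vmstat; your version may differ slightly):
Code:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
bi/bo are blocks read/written per second, and wa is the percentage of CPU time spent waiting on I/O.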
|
Try reading a big file locally on the server while watching vmstat:
Code:
dd if=bigfile bs=1M count=1000 of=/dev/null
Code:
vmstat 1
|
I put gigabit in about a year ago and expected to get about 900 Mbps. In practice I can get about 300-350 Mbps sustained transfer rates. I did some tweaking but didn't really improve things much. I suspect the PCI bus is in fact a bottleneck - unless you only have one device on it and the data is going one way!
In the end, because I was achieving 3 times the speed of 100BaseT and close to raw disk performance, there wasn't much point in pursuing it further. |
What kind of tweaking have you done to try to make yours faster? |
What happens when you use FTP, e.g. with an FTP server like pure-ftpd?
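A quick way to measure that (a sketch; assumes wget is installed, an FTP server is running on the box, and the file path here is made up) would be:
Code:
wget -O /dev/null ftp://<server>/path/to/bigfile
wget reports the average transfer rate when the download finishes.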
|
My main server is still running with SuSE 10.0, the other machines were running openSuSE 10.2 I think.
The first thing I did was to get the latest gigabit drivers for my cards (they were all RTL8169 chipset). The other was to look at the sysctl settings for net.ipv4.tcp_wmem, net.ipv4.tcp_rmem and net.ipv4.tcp_mem, e.g. as root:
Code:
server:# sysctl net.ipv4.tcp_rmem
net.ipv4.tcp_rmem = 4096 87380 174760
server:# sysctl net.ipv4.tcp_wmem
net.ipv4.tcp_wmem = 4096 16384 3117056
Code:
server:# sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
server:# sysctl -w net.ipv4.tcp_wmem="4096 16384 1048576"
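Then measure raw TCP throughput with iperf (classic iperf2 syntax; start the listener on the server first):
Code:
server:# iperf -s
And on the client:
Code: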
iperf -c <server> -i 1 -t 10
Hope this helps.
|
Thanks for the advice. I increased all the buffers to 4 MB and the asymmetric performance issue seems gone. However, while I now get 500 Mbps with iperf, which is good, I still only get 320 Mbps with netcat and 136 Mbps with NFS, which is exactly what I had before. (Yes, I remounted before testing.) I tried both NFS v3 and v4 over TCP. NFS rsize and wsize are set to 64k.
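For reference, the mounts look something like this (the export path and mount point here are made up):
Code:
mount -t nfs -o tcp,rsize=65536,wsize=65536 <server>:/export /mnt/share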
|
Looks like the usual benchmarking vs real-world issue. As I said before, once I got it stable, given that raw disk performance (again using hdparm, not real world!) limits data transfer to about 50 MB/sec, I didn't see the point in looking further. Maybe wireshark could tell you more about the packet sizes and whether any TCP errors were slowing things down.
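If you go that route, the easiest approach is probably to capture on the server with tcpdump and open the file in wireshark afterwards (interface name and client address are placeholders):
Code:
tcpdump -i eth0 -s 0 -w nfs.pcap host <client>
Then look for retransmissions and duplicate ACKs in wireshark's expert info.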
|
I seem to recall that NFS is happier with UDP than with TCP. Something about collisions and resends.
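If you want to test that, it's just a mount option (paths made up; NFS over UDP is commonly limited to 32k rsize/wsize):
Code:
mount -t nfs -o udp,rsize=32768,wsize=32768 <server>:/export /mnt/share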
|
Code:
time dd if=/dev/zero of=some_file bs=10M count=100
Do this on the server, not over NFS. What are your MTU settings on both client and server (as can be seen via ifconfig)? What sort of switch are you using? Do you see a difference in speed when fetching a file from the NFS server as opposed to writing one?
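To check or change the MTU (interface name assumed to be eth0; use 9000 only if every NIC and switch in the path supports jumbo frames):
Code:
ifconfig eth0            # look for the MTU: field in the output
ifconfig eth0 mtu 9000   # temporary, until the next reboot
|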
The server has a 10 GB SCSI disk for the system, a 320 GB IDE disk for /home and a 1 TB SATA disk for file storage (mostly pictures and video).
The write performance (locally on the server) is 24 MB/s for the IDE disk, 53 MB/s for SATA and an abysmal 16 MB/s for SCSI (that disk is almost 10 years old). I have a spare IDE controller somewhere; maybe that can improve IDE performance. NFS write is 12 MB/s for the IDE disk and 17 MB/s for the SATA disk; read is 18 and 21. I tried an MTU of both 1500 and 9000 (jumbo frames) on both the client and the server (yes, the switches support it). The switches are unmanaged Netgear and D-Link gigabit switches. I just found something that may be interesting: if I get a file over NFS, remount the export (to flush the client cache, but keep the server's) and get it again, the speed is around 60 MB/s. So it does seem to be something about the disks on the server.
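Spelled out, that test was roughly this (mount point and file name are made up, and the share is assumed to be in fstab):
Code:
dd if=/mnt/share/bigfile of=/dev/null bs=1M   # first read: comes off the server's disk
umount /mnt/share && mount /mnt/share         # drop the client's cache, keep the server's
dd if=/mnt/share/bigfile of=/dev/null bs=1M   # second read: ~60 MB/s from the server's page cache
|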
I think that confirms it: if iperf is symmetrical at ~500 Mbps, then the networking is going as fast as it can and is probably limited by the PCI bus. When I had asymmetric performance and some very low figures on one machine, it was due to TCP errors and retries; changing the net.ipv4 sysctl variables fixed that.
By the way, when rerunning iperf yesterday and getting ~330 Mbps, one machine was only getting ~200 Mbps. However, there was a mysql process consuming about 60% CPU at the time, so it also depends on CPU usage as well as the PCI bus. I also had a passing look at jumbo frames while experimenting with the sysctl variables, and looked into the theoretical performance gains that could be made. They were marginal at best, and I decided they were irrelevant given the PCI bus and disk access limitations. I think my newest server can manage ~600 Mbps but the clients can't keep up! The new server has a SATA drive but can only get ~75 MB/sec, which isn't much better than IDE, so maybe I'll have a look at that myself. |
Not the bus...
Code:
dd if=/dev/sda of=/dev/null bs=1M &
Code:
dd if=/dev/hda of=/dev/null bs=1M &
Running both reads at the same time doesn't slow them down, so the bottleneck is not on the bus. What I found interesting is that disk performance goes down by half under heavy network usage. CPU during the combined iperf/dd runs is 70% sys / 0% idle / 30% wait, and dd alone uses 45% CPU. So that's the bottleneck. Maybe I should first look at lowering CPU usage for disk access (is that possible? DMA and ACPI are already on). But during an NFS operation the CPU is 25% sys / 10% idle / 65% wait and I only get 18 MB/s. Why is wait so high if neither device is at full speed and the CPU is not maxed? :confused:
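For reference, the combined load was generated along these lines (a sketch; the iperf target is whichever box the earlier tests used):
Code:
dd if=/dev/sda of=/dev/null bs=1M &
dd if=/dev/hda of=/dev/null bs=1M &
iperf -c <client> -t 30 &
vmstat 1
|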