gigabit ethernet slow
Hi everyone,
I have two identical RHEL nodes (kernel 2.6.9-78.ELsmp) connected with gigabit ethernet. ethtool shows that they are set to 1000 Mbps and their LEDs show the same thing. I have installed the latest driver for the network cards and used a CAT 6 cable. But when I run my program under LAM (with both rsh and ssh), or just copy small files over the mounted folder, my speed is something like 5 Mbps. However, when I copy larger files (I tried 25 MB) the speed is OK (25 MBps at least). I changed my mount from UDP to TCP because of a post suggesting it might be UDP fragmentation, and also changed my packet size to 8192, but that didn't help. Could it be a buffer size problem in the network card? I changed the network card and the speed doubled, but it is still low. My LAM code is just sending one integer and receiving it back, by the way.

netstat -s shows:

Ip:
    12759324 total packets received
    1 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    12759320 incoming packets delivered
    12796734 requests sent out
    4 reassemblies required
    1 packets reassembled ok
    10 fragments received ok
Icmp:
    18 ICMP messages received
    6 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 18
    20 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 20
Tcp:
    564 active connections openings
    40 passive connection openings
    6 failed connection attempts
    56 connection resets received
    9 connections established
    12757902 segments received
    12795974 segments send out
    141 segments retransmited
    0 bad segments received.
    44 resets sent
Udp:
    738 packets received
    2 packets to unknown port received.
    0 packet receive errors
    740 packets sent
TcpExt:
    122 TCP sockets finished time wait in fast timer
    1 packets rejects in established connections because of timestamp
    919 delayed acks sent
    12704023 packets directly queued to recvmsg prequeue.
    49940 packets directly received from prequeue
    12715744 packets header predicted
    72 packets header predicted and directly queued to user
    14486 acknowledgments not containing data received
    12724061 predicted acknowledgments
    10 congestion windows recovered after partial ack
    0 TCP data loss events
    35 other TCP timeouts
    10 DSACKs received
    14 connections reset due to unexpected data
    28 connections reset due to early user close
    2 connections aborted due to timeout

and tcpdump shows:

18:36:21.361668 IP node2.32782 > mainnode.33339: P 2348809:2348837(28) ack 4026576 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.361690 IP mainnode.33339 > node2.32782: P 4026576:4026624(48) ack 2348837 win 1448 <nop,nop,timestamp 11850548 310851>
18:36:21.361820 IP node2.32782 > mainnode.33339: P 2348837:2348865(28) ack 4026624 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.361844 IP mainnode.33339 > node2.32782: P 4026624:4026672(48) ack 2348865 win 1448 <nop,nop,timestamp 11850548 310851>
18:36:21.361969 IP node2.32782 > mainnode.33339: P 2348865:2348893(28) ack 4026672 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.361994 IP mainnode.33339 > node2.32782: P 4026672:4026720(48) ack 2348893 win 1448 <nop,nop,timestamp 11850548 310851>
18:36:21.362119 IP node2.32782 > mainnode.33339: P 2348893:2348921(28) ack 4026720 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.362142 IP mainnode.33339 > node2.32782: P 4026720:4026768(48) ack 2348921 win 1448 <nop,nop,timestamp 11850548 310851>
18:36:21.362268 IP node2.32782 > mainnode.33339: P 2348921:2348949(28) ack 4026768 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.362292 IP mainnode.3333
6980 packets captured
176061 packets received by filter
168954 packets dropped by kernel

and ethtool:

ethtool eth1
Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: umbg
    Wake-on: d
    Current message level: 0x00000007 (7)
    Link detected: yes

This problem hasn't let me sleep for the last few days. Any help or ideas are appreciated. Thanks in advance.
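For what it's worth, the LAM test is essentially the classic ping-pong pattern, roughly like this (a simplified sketch rather than my exact code; the trip count and message tag are arbitrary):

/* pingpong.c - bounces one int between rank 0 and rank 1 and reports
 * the average round-trip time. Simplified sketch; TRIPS and the
 * message tag are arbitrary choices. */
#include <mpi.h>
#include <stdio.h>

#define TRIPS 10000

int main(int argc, char **argv)
{
    int rank, i, value = 0;
    double start, elapsed;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    start = MPI_Wtime();
    for (i = 0; i < TRIPS; i++) {
        if (rank == 0) {
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }
    elapsed = MPI_Wtime() - start;

    if (rank == 0)
        printf("average round trip: %.1f usec\n", 1e6 * elapsed / TRIPS);

    MPI_Finalize();
    return 0;
}

I build it with mpicc and run it with mpirun -np 2 across the two nodes. |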
How are you measuring transfer speed, and what size are the files that you're transferring? Transfer speed will essentially be in bits/second. Gigabit ethernet is on the order of 10^9 bits/second. If you're sending a 1 KB file down the line, that's roughly 8000 bits, so the transfer is going to be done in something on the order of 10^-5 seconds, and you have to start wondering what the resolution of your clock is.
Let me put it to you this way. Is this an actual performance issue (transferring lots of small files is actually causing noticeable delay), or are you just wondering why your numbers aren't what you expected? |
Thanks for the reply, bartonski. Yes, there is an actual performance issue: my MPI software is running pretty slow.
I decided to write my own code on LAM and noticed the same bandwidth issue (the CPUs are running at about 10% and there is plenty of free RAM). Of course, when I change the code to do more calculation on each node before sending the data back, the CPUs go up to 100% and less bandwidth is used. I am measuring the copying speed with scp, and the speed of my code and the software with iftop. |
I'm going to step aside gracefully at this point; you're doing stuff that's out of my league; any advice that I have to give now would be just as likely to be bad advice as good.
Having said that, I'd be breaking out Wireshark about now, just to be sure that I know as much as possible about what's going up and down the wire; more info can't be bad in a situation like this. |
I installed Wireshark after dealing with all its dependencies, but everything looks normal at the packet level.
There is not much retransmission, and the buffers are not full either. The only thing I can see is that there is a lot of delay between the packets being sent, which is probably why it is slow. But my CPUs are only running at 10% and there is free RAM too. This is what Wireshark shows:

Frame 2 (114 bytes on wire, 114 bytes captured)
    Arrival Time: Dec 23, 2009 23:48:06.919341000
    Time delta from previous captured frame: 0.000024000 seconds
    Time delta from previous displayed frame: 0.000024000 seconds
    Time since reference or first frame: 0.000024000 seconds
    Frame Number: 2
    Frame Length: 114 bytes
    Capture Length: 114 bytes
    Frame is marked: False
    Protocols in frame: eth:ip:tcp:data
    Coloring Rule Name: TCP
    Coloring Rule String: tcp
Ethernet II, Src: IntelCor_2a:80:5d (00:1b:21:2a:80:5d), Dst: IntelCor_2a:80:d7 (00:1b:21:2a:80:d7)
Internet Protocol, Src: 192.168.0.1 (192.168.0.1), Dst: 192.168.0.5 (192.168.0.5)
    .1.. = Don't fragment: Set
    Time to live: 64
Transmission Control Protocol, Src Port: 33096 (33096), Dst Port: 32790 (32790), Seq: 1, Ack: 29, Len: 48
    Flags: 0x18 (PSH, ACK)
    Window size: 1448
    This is an ACK to the segment in frame: 1
Data (48 bytes)
    Data: 18000000C90000000000000000000000000000000E167C00...

and the ACK of it:

Frame 3 (94 bytes on wire, 94 bytes captured)
    Arrival Time: Dec 23, 2009 23:48:06.919419000
    Time delta from previous captured frame: 0.000078000 seconds
Ethernet II, Src: IntelCor_2a:80:d7 (00:1b:21:2a:80:d7), Dst: IntelCor_2a:80:5d (00:1b:21:2a:80:5d)
Internet Protocol, Src: 192.168.0.5 (192.168.0.5), Dst: 192.168.0.1 (192.168.0.1)
Transmission Control Protocol, Src Port: 32790 (32790), Dst Port: 33096 (33096), Seq: 29, Ack: 49, Len: 28
    Flags: 0x18 (PSH, ACK)
    Window size: 1460
    This is an ACK to the segment in frame: 2
Data (28 bytes)
    Data: 04000000C90000000000000001000000000000000E167C00...

Anybody any ideas? |
Any chance the bottleneck is filesystem I/O rather than the network link? |
I am thinking it's the NIC itself, because when I change it to another gigabit card the speed goes up from 5 Mbps to 7 Mbps, and then when I put in a different brand of gigabit card it goes down to 3 Mbps.
Also, manually changing the speed to 100 Mbps reduces the speed I get to 3 Mbps, and setting it to 10 Mbps completely stops the connection, as if there is no link at all. If it is anything other than the card itself, why should changing the speed have these effects? Is there any other setting in the network card, other than ethtool, that controls or affects the speed? |
Read this: http://en.wikipedia.org/wiki/Ethernet_flow_control and this: http://www.smallnetbuilder.com/index...ge=0&Itemid=54. |
Thanks. I changed the settings and the throughput increased, but only a bit.
I used iperf and found that the gigabit ethernet is working fine (980 Mbps) for larger packets (MTU = 8900). However, changing the packet size (MTU) to 400 bytes reduces the throughput to 100 Mbps, and further reducing the size to 100 bytes results in 30 Mbps. Are these values expected? Aren't they too low? Does anybody get better results for small packets? I can't change my software to send larger packets. Thanks
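In case anyone wants to reproduce this without iperf, a plain MPI size sweep along these lines should show the same effect (a sketch only; the message sizes and repetition count are just ones I picked):

/* bwsweep.c - ping-pong bandwidth between rank 0 and rank 1 for a
 * range of message sizes. Sketch only; sizes and reps are illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int sizes[] = { 100, 400, 1500, 8900, 65536 };
    int nsizes = sizeof(sizes) / sizeof(sizes[0]);
    int rank, reps = 1000, i, r;
    char *buf;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(sizes[nsizes - 1]);
    memset(buf, 0, sizes[nsizes - 1]);

    for (i = 0; i < nsizes; i++) {
        int s = sizes[i];
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (r = 0; r < reps; r++) {
            if (rank == 0) {
                MPI_Send(buf, s, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, s, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, s, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, s, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)
            printf("%6d bytes: %8.1f Mbit/s\n",
                   s, 2.0 * reps * s * 8.0 / (t1 - t0) / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Built with mpicc and run with mpirun -np 2, the large messages should get close to wire speed while the small ones fall off sharply. |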
Your post prompted me to do some research on MTU. As I understand it, the lower the MTU, the more overhead there is in switching and routing packets (constant overhead per packet, so more packets means more overhead). The default MTU of 1500 bytes is considered too small for gigabit ethernet... as a matter of fact, 9000-byte 'jumbo frames' are considered to be on the small side.
I understand that you are doing message passing, and I assume that you would like to be able to send 100-byte messages... is there any way that you can buffer the messages so that you send more than one at a time? I realize that there are probably instances where both sides of the connection are sending and replying to 100-byte messages, but I would guess that with some amount of cleverness you could cut some of this out.
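Roughly what I have in mind, with made-up names (BATCH, the helper functions, and the tag are purely illustrative, and a partly filled batch would still need an explicit flush):

/* Instead of one MPI_Send per integer, collect values into a batch
 * and ship the whole batch with a single MPI_Send. Sketch only. */
#include <mpi.h>

#define BATCH 64

static int batch[BATCH];
static int count = 0;

/* sender side: queue a value, send the batch once it is full */
void queue_value(int v, int dest, MPI_Comm comm)
{
    batch[count++] = v;
    if (count == BATCH) {
        MPI_Send(batch, count, MPI_INT, dest, 0, comm);
        count = 0;
    }
}

/* receiver side: one MPI_Recv yields up to BATCH values;
 * returns how many actually arrived */
int receive_batch(int *out, int src, MPI_Comm comm)
{
    MPI_Status status;
    int n;

    MPI_Recv(out, BATCH, MPI_INT, src, 0, comm, &status);
    MPI_Get_count(&status, MPI_INT, &n);
    return n;
}

Sixty-four ints per send is still tiny on the wire, but it cuts the number of messages by a factor of 64. |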