Linux - Networking
This forum is for any issue related to networks or networking. Routing, network cards, OSI, etc. Anything is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I have two identical RHEL nodes (kernel 2.6.9-78.ELsmp) connected with gigabit Ethernet. ethtool shows that both are set to 1000 Mbps, and the link LEDs agree. I have installed the latest driver for the network cards and I'm using a CAT 6 cable. But when I run my program under LAM (with both rsh and ssh), or just copy small files over the mounted folder, my speed is around 5 Mbps. However, when I copy larger files (I tried 25 MB), the speed is fine (at least 25 MBps).
I changed the mount from UDP to TCP because of a post suggesting it might be UDP fragmentation, and I also changed the packet size to 8192, but it didn't help.
Could it be a buffer size problem in the network card? I swapped the network card and the speed doubled, but it's still low.
By the way, my LAM code just sends one integer and receives it back.
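For reference, the ping-pong pattern described above can be sketched with plain Python sockets over loopback (hypothetical names; this is a stand-in for the two-node LAM test, not the actual MPI code):

```python
import socket
import struct
import threading
import time

def recv_exact(conn, n):
    # Read exactly n bytes, since recv() may return fewer.
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed")
        data += chunk
    return data

def echo_server(srv):
    # Accept one connection and echo every 4-byte message back.
    conn, _ = srv.accept()
    with conn:
        try:
            while True:
                conn.sendall(recv_exact(conn, 4))
        except ConnectionError:
            pass

def ping_pong(rounds=1000):
    # Send one integer and receive it back, like the LAM test above.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

    cli = socket.create_connection(srv.getsockname())
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # avoid Nagle delays
    start = time.perf_counter()
    for i in range(rounds):
        cli.sendall(struct.pack("!i", i))
        assert struct.unpack("!i", recv_exact(cli, 4))[0] == i
    elapsed = time.perf_counter() - start
    cli.close()
    srv.close()
    return elapsed / rounds  # mean round-trip time in seconds

if __name__ == "__main__":
    print(f"mean RTT: {ping_pong() * 1e6:.1f} us")
```

With 4-byte payloads, throughput is dominated entirely by the per-message round-trip time, not by link bandwidth, which is why such a test looks "slow" even on a healthy gigabit link.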
netstat -s shows:
12759324 total packets received
1 with invalid addresses
0 incoming packets discarded
12759320 incoming packets delivered
12796734 requests sent out
4 reassemblies required
1 packets reassembled ok
10 fragments received ok
18 ICMP messages received
6 input ICMP message failed.
ICMP input histogram:
destination unreachable: 18
20 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 20
564 active connections openings
40 passive connection openings
6 failed connection attempts
56 connection resets received
9 connections established
12757902 segments received
12795974 segments send out
141 segments retransmited
0 bad segments received.
44 resets sent
738 packets received
2 packets to unknown port received.
0 packet receive errors
740 packets sent
122 TCP sockets finished time wait in fast timer
1 packets rejects in established connections because of timestamp
919 delayed acks sent
12704023 packets directly queued to recvmsg prequeue.
49940 packets directly received from prequeue
12715744 packets header predicted
72 packets header predicted and directly queued to user
14486 acknowledgments not containing data received
12724061 predicted acknowledgments
10 congestion windows recovered after partial ack
0 TCP data loss events
35 other TCP timeouts
10 DSACKs received
14 connections reset due to unexpected data
28 connections reset due to early user close
2 connections aborted due to timeout
and tcpdump shows:
18:36:21.361668 IP node2.32782 > mainnode.33339: P 2348809:2348837(28) ack 4026576 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.361690 IP mainnode.33339 > node2.32782: P 4026576:4026624(48) ack 2348837 win 1448 <nop,nop,timestamp 11850548 310851>
18:36:21.361820 IP node2.32782 > mainnode.33339: P 2348837:2348865(28) ack 4026624 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.361844 IP mainnode.33339 > node2.32782: P 4026624:4026672(48) ack 2348865 win 1448 <nop,nop,timestamp 11850548 310851>
18:36:21.361969 IP node2.32782 > mainnode.33339: P 2348865:2348893(28) ack 4026672 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.361994 IP mainnode.33339 > node2.32782: P 4026672:4026720(48) ack 2348893 win 1448 <nop,nop,timestamp 11850548 310851>
18:36:21.362119 IP node2.32782 > mainnode.33339: P 2348893:2348921(28) ack 4026720 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.362142 IP mainnode.33339 > node2.32782: P 4026720:4026768(48) ack 2348921 win 1448 <nop,nop,timestamp 11850548 310851>
18:36:21.362268 IP node2.32782 > mainnode.33339: P 2348921:2348949(28) ack 4026768 win 1460 <nop,nop,timestamp 310851 11850548>
18:36:21.362292 IP mainnode.3333
6980 packets captured
176061 packets received by filter
168954 packets dropped by kernel
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
Advertised auto-negotiation: Yes
Port: Twisted Pair
Supports Wake-on: umbg
Current message level: 0x00000007 (7)
Link detected: yes
This problem hasn't let me sleep for the last few days. Any help or ideas appreciated.
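For what it's worth, the tcpdump excerpt above already quantifies the problem: each 28-byte request / 48-byte reply exchange takes roughly 150 microseconds, which caps throughput far below line rate. A back-of-the-envelope check (figures read approximately off the trace):

```python
# Approximate figures from the tcpdump excerpt: one 28-byte request and
# one 48-byte reply roughly every 150 microseconds.
payload_bytes = 28 + 48
round_trip_s = 150e-6

# Payload bits delivered per round trip, divided by the round-trip time.
throughput_bps = payload_bytes * 8 / round_trip_s
print(f"{throughput_bps / 1e6:.1f} Mbit/s")  # ~4 Mbit/s, near the ~5 Mbps observed
```

So the observed ~5 Mbps is consistent with tiny messages exchanged in lock-step, with the link mostly idle between them.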
How are you measuring transfer speed, and what size are the files that you're transferring? Transfer speed is essentially measured in bits/second. Gigabit Ethernet is on the order of 10^9 bits/second. If you're sending a 1 KB file down the line, your transfer is going to be done in something on the order of 10^-5 seconds, and you have to start wondering what the resolution of your clock is.
Let me put it to you this way: is this an actual performance issue (transferring lots of small files is actually causing noticeable delay), or are you just wondering why your numbers aren't what you expected?
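The arithmetic behind that point, as a sketch (assuming an ideal 1 Gbit/s wire rate and ignoring all protocol overhead and latency):

```python
LINK_BPS = 1e9  # assumed gigabit line rate

def transfer_time(size_bytes, link_bps=LINK_BPS):
    # Ideal on-the-wire time: payload bits divided by link rate,
    # ignoring headers, ACKs and round-trip latency.
    return size_bytes * 8 / link_bps

print(transfer_time(1024))  # 1 KB: ~8 microseconds
print(transfer_time(25e6))  # 25 MB: ~0.2 seconds
```

A 1 KB transfer finishes in microseconds, so any measurement of it is mostly measuring connection setup and clock resolution rather than the link.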
Thanks for the reply, bartonski. Yes, there is an actual performance issue: my MPI software runs quite slowly.
I wrote my own test code on LAM and noticed the same bandwidth issue (CPUs at about 10% and plenty of free RAM). Of course, when I change the code to do more calculation on each node before sending the data back, the CPUs go up to 100% and less bandwidth is used. I'm measuring copy speed with scp, and the speed of my code and the software with iftop.
I installed Wireshark after dealing with all its dependencies, but everything looks normal at the packet level.
There isn't much retransmission and the buffers aren't full either. The only thing I can see is a lot of delay between packets being sent, which is probably why it's slow. But my CPUs are only running at 10% and RAM is free too.
This is what Wireshark shows:
Frame 2 (114 bytes on wire, 114 bytes captured)
Arrival Time: Dec 23, 2009 23:48:06.919341000
Time delta from previous captured frame: 0.000024000 seconds
Time delta from previous displayed frame: 0.000024000 seconds
Time since reference or first frame: 0.000024000 seconds
Frame Number: 2
Frame Length: 114 bytes
Capture Length: 114 bytes
Frame is marked: False
Protocols in frame: eth:ip:tcp:data
Coloring Rule Name: TCP
Coloring Rule String: tcp
Ethernet II, Src: IntelCor_2a:80:5d (00:1b:21:2a:80:5d), Dst: IntelCor_2a:80:d7 (00:1b:21:2a:80:d7)
Internet Protocol, Src: 192.168.0.1 (192.168.0.1), Dst: 192.168.0.5 (192.168.0.5)
.1.. = Don't fragment: Set
Time to live: 64
Transmission Control Protocol, Src Port: 33096 (33096), Dst Port: 32790 (32790), Seq: 1, Ack: 29, Len: 48
Flags: 0x18 (PSH, ACK)
Window size: 1448
This is an ACK to the segment in frame: 1
Data (48 bytes)
And its ACK:
Frame 3 (94 bytes on wire, 94 bytes captured)
Arrival Time: Dec 23, 2009 23:48:06.919419000
Time delta from previous captured frame: 0.000078000 seconds
Ethernet II, Src: IntelCor_2a:80:d7 (00:1b:21:2a:80:d7), Dst: IntelCor_2a:80:5d (00:1b:21:2a:80:5d)
Internet Protocol, Src: 192.168.0.5 (192.168.0.5), Dst: 192.168.0.1 (192.168.0.1)
Transmission Control Protocol, Src Port: 32790 (32790), Dst Port: 33096 (33096), Seq: 29, Ack: 49, Len: 28
Flags: 0x18 (PSH, ACK)
Window size: 1460
This is an ACK to the segment in frame: 2
Data (28 bytes)
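One common cause of exactly this "packets look normal, but there are gaps between them" pattern with small writes is the interaction of Nagle's algorithm with delayed ACKs. If the sending socket is under your control, a sketch of disabling Nagle looks like this (whether LAM/MPI exposes this knob is a separate question):

```python
import socket

# Sketch: TCP_NODELAY disables Nagle's algorithm, so small writes are
# sent immediately instead of being held back while ACKs are outstanding.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # non-zero once set
s.close()
```

In C the equivalent is `setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, ...)` on the connected socket.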
Thanks. I changed the settings and the throughput increased, but only a bit.
I used iperf and figured out that the gigabit Ethernet is working fine (980 Mbps) for large packets (MTU = 8900); however, reducing the packet size (MTU) to 400 bytes drops the throughput to 100 Mbps, and reducing it further to 100 bytes results in 30 Mbps.
Are these values expected? Aren't they too low? Does anybody get better results for small packets? I can't change my software to send larger packets.
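Those numbers are consistent with a roughly fixed per-packet processing cost (interrupt handling, protocol stack traversal) dominating at small sizes. A rough consistency check under that assumption, using the figures above:

```python
# Solve the 100-byte iperf data point for an assumed fixed per-packet
# cost, then see what it predicts for 400-byte packets.
per_pkt_s = 100 * 8 / 30e6           # ~27 microseconds per packet
predicted_400 = 400 * 8 / per_pkt_s  # packet time dominated by fixed cost

print(per_pkt_s)            # ~2.7e-05 s
print(predicted_400 / 1e6)  # ~120 Mbit/s, close to the observed 100 Mbps
```

The model breaks down for jumbo frames (where serialization time dominates and offload features help), but it explains why throughput collapses as packet size shrinks.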
Your post prompted me to do some research on MTU. As I understand it, the lower the MTU, the more overhead there is in switching and routing packets (constant overhead per packet, so more packets means more overhead). The default MTU of 1500 bytes is often considered too small for gigabit Ethernet; in fact, even 9000-byte "jumbo frames" are sometimes considered on the small side.
I understand that you are doing message passing, and I assume that you would like to be able to send 100-byte messages. Is there any way you can buffer the messages so that you send more than one at a time? I realize there are probably cases where both sides of the connection are sending and replying to 100-byte messages, but I would guess that with some amount of cleverness, you could cut some of this out.
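A hypothetical sketch of that batching idea (all names invented; the point is paying the per-packet cost once per batch instead of once per message):

```python
import struct

class MessageBatcher:
    """Coalesce many small messages into one buffer, flushed as a single send."""

    def __init__(self, send, flush_at=1400):
        self.send = send          # callable taking one bytes object
        self.flush_at = flush_at  # flush before exceeding ~one MTU of payload
        self.buf = bytearray()

    def post(self, msg: bytes):
        # Length-prefix each message so the receiver can split the batch.
        framed = struct.pack("!H", len(msg)) + msg
        if len(self.buf) + len(framed) > self.flush_at:
            self.flush()
        self.buf += framed

    def flush(self):
        if self.buf:
            self.send(bytes(self.buf))
            self.buf.clear()

# Usage: collect outgoing sends in a list and count how many went out.
sent = []
b = MessageBatcher(sent.append)
for i in range(100):
    b.post(struct.pack("!i", i))  # 100 four-byte messages
b.flush()
print(len(sent))  # far fewer sends than messages
```

The trade-off is latency: a message can sit in the buffer until the batch fills or is explicitly flushed, so a real implementation would also flush on a short timer or whenever the application must wait for a reply.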