LinuxQuestions.org
Old 11-15-2008, 09:09 AM   #1
Sam1984
LQ Newbie
 
Registered: Nov 2008
Posts: 4

Rep: Reputation: 0
Differing network performance for identically connected hosts


Afternoon all,

I'm having a bit of a nightmare with differing network performance on two servers in the same rack, connected in exactly the same way. Here's the situation:

Host 1 - CentOS 4.6 (2.6.9-67.0.4.ELsmp), HP DL360 G4
Host 2 - CentOS 5.1 (2.6.18-92.1.6.el5), HP DL160 G5

Both are connected to the same Cisco switch at 100/FULL (hard coded), with the same config on each port. Both are in the same VLAN. No errors are present on the switchports, or on the hosts themselves.

Transferring files between the servers runs at pretty much 100Mbps flat out, and transfers to hosts in other similarly connected racks perform very well too.

The difference comes when we have a slower client. For example, a remote client is connected at about 16Mbps. Downloading from Host 1 clocks in at ~1.5MB/s. Downloading from Host 2 never exceeds 1.2MB/s. I'm measuring the downloads (fairly crudely) using curl to request a 15MB file from Apache, running on both boxes (same file, same Apache config).
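To put the figures above on a common scale, here is a rough sketch of the arithmetic plus a crude timed download similar in spirit to the curl test. The function names and the URL in the usage comment are hypothetical, and 1 MB is taken as 10**6 bytes, which is an assumption about how the rates were reported.

```python
import time
import urllib.request


def mbps_to_mbytes_per_s(mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8.0


def mbytes_per_s_to_mbps(mbytes_per_s):
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_s * 8.0


def measure_download(url, chunk_size=64 * 1024):
    """Download url, discard the body, and return throughput in MB/s."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total / elapsed / 1e6


# Link ceiling and observed rates from the post, all expressed in Mbps.
print("16 Mbps link ceiling in MB/s:", mbps_to_mbytes_per_s(16))
print("Host 1 rate in Mbps:", mbytes_per_s_to_mbps(1.5))
print("Host 2 rate in Mbps:", mbytes_per_s_to_mbps(1.2))

# Hypothetical usage against one of the servers:
# print(measure_download("http://host2.example/15MB.bin"))
```

On these numbers Host 2 tops out at about 80% of what Host 1 manages over the same path.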

Here's what I've tried so far, all without success:

1. Using a different webserver on both (thttpd)
2. Disabling iptables on both
3. Disabling ip_conntrack on Host 2 (as it's installed by default on CentOS 5)
4. Decreasing the MTU on both
5. Optimising the TCP stack on host 2 (increasing default TCP window, using cubic congestion avoidance, etc)
6. Changing the network interface on Host 2 (to use a gigabit uplink, different cable and different network and switch port)

Finally, I have run tcpdump to monitor transfers from both servers (the client initiates a download of the 15MB file from the server). Both connections rise quickly to the same maximum window size, and the only discernible difference I can see is that Host 2 has a lot more "TCP Window Full" entries than Host 1 (flagged on segments the server sends to the client).
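A window-limited TCP transfer cannot move much more than one receive window per round trip, which gives a quick sanity check on the rates being discussed. The sketch below is illustrative only: the 30660-byte window and the round-trip time are read off the tcpdump excerpt posted later in this thread (SYN-ACK sent at 17:25:34.866205, client ACK seen at 17:25:34.884499), and the window may have grown further after the excerpt ends.

```python
def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound in bytes/s when at most one window is in flight per round trip."""
    return window_bytes / rtt_seconds


# Values taken from the tcpdump excerpt in this thread (assumed representative).
rtt = 34.884499 - 34.866205   # seconds between the SYN-ACK and the client's ACK
window = 30660                # largest client-advertised window in the excerpt

ceiling = window_limited_throughput(window, rtt)
print(f"RTT: {rtt * 1000:.1f} ms, ceiling: {ceiling / 1e6:.2f} MB/s")
```

With these inputs the bound lands in the same range as the download rates reported above, so anything that changes how promptly the server refills the client's window could plausibly show up as a throughput difference.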

Does anyone have any suggestions? I've been looking at this for days now!

Thanks in advance,

Sam
 
Old 11-15-2008, 09:55 AM   #2
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,095
Blog Entries: 2

Rep: Reputation: 111Reputation: 111
Try testing with iperf; it might help you narrow it down, since it keeps track of lost packets and resends. It could be a flaky card causing lots of drops, which on the non-local connection will cause TCP to back off.
 
Old 11-15-2008, 11:27 AM   #3
Sam1984
LQ Newbie
 
Registered: Nov 2008
Posts: 4

Original Poster
Rep: Reputation: 0
Thanks, but unfortunately iperf isn't really much help in this instance. I can't push data to the client from the server (it's firewalled), I can only download from the server whilst connected to the client (and iperf doesn't support this).

After running a few more packet captures on Host 2, it seems that there are quite a lot of duplicate ACKs (roughly one every 5-10 packets), and the rate at which the client acknowledges data sent to it by the server is also vastly increased.

There are no retransmissions though, so I don't think packet loss is the cause.

Thanks,

Sam
 
Old 11-15-2008, 01:44 PM   #4
jiobo
Member
 
Registered: Nov 2008
Posts: 180

Rep: Reputation: 36
ship it!

Hi Sam,

Sounds like a real problem...server 2 will have to be quarantined. Pack it up and ship it to me ASAP!
 
Old 11-15-2008, 02:18 PM   #5
Sam1984
LQ Newbie
 
Registered: Nov 2008
Posts: 4

Original Poster
Rep: Reputation: 0
jiobo - Haha, at this rate I might as well!

Seriously though, I've been narrowing it down this afternoon...

I believe the problem is caused by not only these duplicate ACKs, but also by the fact that the client appears to be ACK'ing too often. If the client connects to Host1, the ACKs are far less frequent.

Another host with CentOS 5 does not exhibit this problem, although it has a Realtek card. A couple of other CentOS 5 hosts (with either the Broadcom tg3 or bnx2 driver) do exhibit the issue. I'm starting to think it might be chipset/driver related.

I copied the TCP parameters of the "good" CentOS 5 host to Host2, and there was no change.
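When copying TCP parameters between hosts, it can help to confirm that the two hosts really do end up with the same values. A minimal sketch, assuming the output of `sysctl -a` has been saved from each host as text; the function names and the sample values are hypothetical.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines (as printed by `sysctl -a`) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            # Collapse tabs and repeated spaces so multi-value entries compare cleanly.
            settings[key.strip()] = " ".join(value.split())
    return settings


def diff_sysctl(good_text, bad_text, prefix="net.ipv4.tcp_"):
    """Return {key: (good_value, bad_value)} for keys under prefix that differ."""
    good = parse_sysctl(good_text)
    bad = parse_sysctl(bad_text)
    keys = sorted(k for k in set(good) | set(bad) if k.startswith(prefix))
    return {k: (good.get(k), bad.get(k)) for k in keys if good.get(k) != bad.get(k)}


# Hypothetical sample dumps, not taken from the hosts in this thread.
good_host = "net.ipv4.tcp_congestion_control = cubic\nnet.ipv4.tcp_sack = 1\n"
host2 = "net.ipv4.tcp_congestion_control = bic\nnet.ipv4.tcp_sack = 1\n"
for key, (good_value, bad_value) in diff_sysctl(good_host, host2).items():
    print(f"{key}: good={good_value} host2={bad_value}")
```

An empty result after the copy would support the conclusion that the TCP settings are not what separates the two hosts.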

I'm going to make a trip to the datacenter to fit a dual GigE Intel card to see if that helps matters.

Still welcoming any other suggestions...

Sam



tcpdump output (client is 192.0.0.1, host2 is 1.0.0.1)

17:25:34.866198 IP 192.0.0.1.2173 > 1.0.0.1.80: S 3261466931:3261466931(0) win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 0>
17:25:34.866205 IP 1.0.0.1.80 > 192.0.0.1.2173: S 3117770620:3117770620(0) ack 3261466932 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 2>
17:25:34.884499 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 1 win 5840
17:25:34.884766 IP 192.0.0.1.2173 > 1.0.0.1.80: P 1:158(157) ack 1 win 5840
17:25:34.884778 IP 1.0.0.1.80 > 192.0.0.1.2173: . ack 158 win 1728
17:25:34.885079 IP 1.0.0.1.80 > 192.0.0.1.2173: . 1:2921(2920) ack 158 win 1728
17:25:34.904089 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 1 win 5840 <nop,nop,sack 1 {1461:2921}>
17:25:34.904103 IP 1.0.0.1.80 > 192.0.0.1.2173: . 2921:4381(1460) ack 158 win 1728
17:25:34.904106 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 2921 win 8760
17:25:34.904111 IP 1.0.0.1.80 > 192.0.0.1.2173: . 4381:5841(1460) ack 158 win 1728
17:25:34.904115 IP 1.0.0.1.80 > 192.0.0.1.2173: . 5841:7301(1460) ack 158 win 1728
17:25:34.925830 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 7301 win 17520
17:25:34.925837 IP 1.0.0.1.80 > 192.0.0.1.2173: . 7301:8761(1460) ack 158 win 1728
17:25:34.925840 IP 1.0.0.1.80 > 192.0.0.1.2173: P 8761:11681(2920) ack 158 win 1728
17:25:34.925857 IP 1.0.0.1.80 > 192.0.0.1.2173: . 11681:13141(1460) ack 158 win 1728
17:25:34.964395 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 13141 win 29200
17:25:34.964406 IP 1.0.0.1.80 > 192.0.0.1.2173: . 13141:20441(7300) ack 158 win 1728
17:25:34.982275 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 20441 win 30660
17:25:34.982284 IP 1.0.0.1.80 > 192.0.0.1.2173: . 20441:29201(8760) ack 158 win 1728
17:25:34.986048 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 20441 win 30660
17:25:34.990006 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 20441 win 30660
17:25:35.012126 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 29201 win 30660
17:25:35.012131 IP 1.0.0.1.80 > 192.0.0.1.2173: P 29201:35041(5840) ack 158 win 1728
17:25:35.012147 IP 1.0.0.1.80 > 192.0.0.1.2173: . 35041:36501(1460) ack 158 win 1728
17:25:35.032742 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 35041 win 30660
17:25:35.032752 IP 1.0.0.1.80 > 192.0.0.1.2173: . 36501:46721(10220) ack 158 win 1728
17:25:35.036200 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 35041 win 30660
17:25:35.049875 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 43801 win 30660
17:25:35.049881 IP 1.0.0.1.80 > 192.0.0.1.2173: P 46721:51101(4380) ack 158 win 1728
17:25:35.049886 IP 1.0.0.1.80 > 192.0.0.1.2173: . 51101:56941(5840) ack 158 win 1728
17:25:35.053919 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 46721 win 30660
17:25:35.053937 IP 1.0.0.1.80 > 192.0.0.1.2173: . 56941:61321(4380) ack 158 win 1728
17:25:35.058089 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 46721 win 30660
17:25:35.067922 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 56941 win 30660
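Counting the duplicate ACKs by eye gets tedious on a longer capture. The sketch below assumes the older tcpdump text format shown above and treats a pure ACK from the client as a duplicate when it repeats the previous ACK number and window; the function name and the prefix argument are hypothetical, and newer tcpdump versions print a different format that this regex will not match.

```python
import re

# Matches pure ACKs only: flag "." followed directly by "ack N win W" (no data range).
PURE_ACK = re.compile(r"^\S+ IP (\S+) > \S+: \. ack (\d+) win (\d+)")


def count_duplicate_acks(lines, client_prefix):
    """Count client pure ACKs that repeat the previous ACK number and window."""
    duplicates = 0
    last = None
    for line in lines:
        match = PURE_ACK.match(line.strip())
        if not match or not match.group(1).startswith(client_prefix):
            continue
        current = (int(match.group(2)), int(match.group(3)))
        if current == last:
            duplicates += 1
        last = current
    return duplicates


# Five lines copied from the capture above.
sample = [
    "17:25:34.982275 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 20441 win 30660",
    "17:25:34.982284 IP 1.0.0.1.80 > 192.0.0.1.2173: . 20441:29201(8760) ack 158 win 1728",
    "17:25:34.986048 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 20441 win 30660",
    "17:25:34.990006 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 20441 win 30660",
    "17:25:35.012126 IP 192.0.0.1.2173 > 1.0.0.1.80: . ack 29201 win 30660",
]
print(count_duplicate_acks(sample, "192.0.0.1."))  # prints 2
```

Running the same count on a capture from each host would give a like-for-like comparison of how often the client repeats an ACK.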
 
Old 11-15-2008, 08:49 PM   #6
Sam1984
LQ Newbie
 
Registered: Nov 2008
Posts: 4

Original Poster
Rep: Reputation: 0
Solved!

Solved :-)

Disabling TCP segmentation offload (TSO) resolved it. Googling around, it seems that this is a common(ish) issue with Broadcom cards and the CentOS 5 kernel.

The lone command "ethtool -K eth0 tso off" fixed all my problems.
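For anyone wanting to check the setting before changing it, `ethtool -k` (lowercase) lists the current offload state. A minimal sketch that parses that output and only turns TSO off when it is on; the function names are hypothetical, and the parser accepts both the hyphenated and the space-separated spelling of the feature name because the exact wording varies between ethtool versions.

```python
import subprocess


def tso_enabled(ethtool_k_output):
    """Return True or False for the TSO line in `ethtool -k` output, None if absent."""
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        if name.strip().replace(" ", "-") == "tcp-segmentation-offload":
            fields = value.split()
            return fields[0] == "on" if fields else None
    return None


def disable_tso(interface="eth0"):
    """Turn TSO off on interface if it is currently on (needs root and ethtool)."""
    result = subprocess.run(["ethtool", "-k", interface],
                            capture_output=True, text=True, check=True)
    if tso_enabled(result.stdout):
        subprocess.run(["ethtool", "-K", interface, "tso", "off"], check=True)


# Hypothetical usage on the affected host:
# disable_tso("eth0")
```

Settings changed with ethtool are not kept across a reboot, so the change has to be reapplied at boot to stay in effect.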

Thanks,

Sam
 
  

