What could be causing SYN_ACK's to be delayed from certain clients?

nimnull22 · 02-22-2010, 07:52 PM

Where is the server you are trying to connect to located?
How are the clients connected in your LAN? Are they on the same LAN, the same switch?
Have you compared the interface configuration on the clients?
Have you tried changing the IP addresses?

You also said once that you can run tcpdump on the server. If you can, please check the time and synchronize it, then run "tcpdump -nn -vv" (we do not need too much information) on both sides and telnet to the server IP on any port you like. Compare the connections in the SERVER tcpdump output, and compare the time it takes to reach the server for both clients.
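To put a number on that comparison, here is a rough sketch of a helper that reads tcpdump text and prints the time from a client's first SYN to the server's SYN-ACK. The function name synack_delay is made up for illustration, and it assumes the one-line-per-packet format that recent tcpdump versions print with "-nn -tt" and no -v (with "Flags [S]" and "Flags [S.]"); older builds and -vv output are laid out differently and would need adjusting.

```shell
# synack_delay (hypothetical helper): read `tcpdump -nn -tt` text on stdin and
# print "client -> server seconds" for each SYN that received a SYN-ACK.
# Assumes lines like:
#   1266888720.123456 IP 192.0.2.5.40000 > 192.0.2.1.80: Flags [S], seq ...
synack_delay() {
    awk '
    $2 == "IP" && /Flags \[S\],/ {
        # Client SYN: remember when this "src dst" pair was FIRST seen,
        # so retransmitted SYNs count toward the delay.
        key = $3 " " $5
        sub(/:$/, "", key)
        if (!(key in syn)) syn[key] = $1
    }
    $2 == "IP" && /Flags \[S\.\],/ {
        # Server SYN-ACK: look up the flow in the reverse direction.
        dst = $5
        sub(/:$/, "", dst)
        key = dst " " $3
        if (key in syn) printf "%s -> %s %.3f\n", dst, $3, $1 - syn[key]
    }'
}
```

Something like "tcpdump -nn -tt -r capture.pcap tcp | synack_delay" on the server-side capture would then list one line per connection, which makes a slow client stand out next to a fast one.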

Compare everything you can: Ethernet settings, interface settings, and check for errors on the port.

devwatchdog · 02-23-2010, 08:02 AM

Quote:

Originally Posted by fizzdandantilus

00:00:0C:07:AC:02 is the ISP gateway interface and 00:19:e8:e9:03:3f must be another of the ISP routers (multi-homed). It's the only way I can explain it. I can dig further if you think it is worth looking into further.

I imagine your ISP simply set up their environment where inbound traffic comes from one interface, outbound goes to another.

I can confirm that the destination host has MAC 00:16:3E:79:4A:C9.

I do not understand TCP/IP networking enough to know what wscale is for. I appreciate all your efforts to help understand this issue.

I imagine your ISP where the server is hosted simply routes inbound traffic via one interface on a switch or router, then outbound is directed to another.

Just to make sure that isn't a problem, I'd check a few things when you notice a connection attempt is failing.

arp -an

This will show whether 00:00:0c:07:ac:02 is recognized.
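Since the MAC has been written in both upper and lower case in this thread, a case-insensitive match is safer than eyeballing the table. A minimal sketch (mac_in_arp is a made-up name; it just wraps grep and reads "arp -an" style text on stdin):

```shell
# mac_in_arp MAC (hypothetical helper): succeed if MAC appears anywhere in
# the `arp -an` style text on stdin, ignoring upper/lower case.
mac_in_arp() {
    grep -qiF -- "$1"
}
```

Usage would be along the lines of: arp -an | mac_in_arp 00:00:0c:07:ac:02 && echo "gateway MAC known" || echo "gateway MAC missing"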

Also, I'd check this at that time as well:

tcpdump -nni <interface> icmp or arp

icmp/arp could provide a clue. I don't recall if you've captured traffic for icmp during the failure, but I would do so just to make sure. Those messages can be helpful. Excessive arp requests for a specific IP would indicate the server is trying to send traffic somewhere it cannot locate.
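To spot "excessive" requests without counting by hand, the capture can be summarized per target IP. This is only a sketch: arp_request_counts is a made-up name, and it assumes the request lines contain the word "who-has" followed by the target IP, which is how tcpdump normally prints ARP requests.

```shell
# arp_request_counts (hypothetical helper): read tcpdump ARP output on stdin
# and print "count ip" for every IP being asked for, busiest first.
arp_request_counts() {
    awk '{
        # The target IP is the field right after "who-has".
        for (i = 1; i < NF; i++)
            if ($i == "who-has") { count[$(i + 1)]++; break }
    }
    END { for (ip in count) print count[ip], ip }' | sort -rn
}
```

For example: tcpdump -nni <interface> -c 200 arp | arp_request_counts. One IP with a count far above the rest during a failure would be the one the server cannot resolve.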

As for wscale possibly being the issue, I somewhat doubt that is the case. Here's an example of what one might see:

http://kerneltrap.org/node/6723

The behavior you see is different than what was experienced in that particular situation.

In any event, to check whether window scaling (wscale) is enabled, look at this setting on a Linux system. A value of 1 means it is on:

jcwx@haley:~$ cat /proc/sys/net/ipv4/tcp_window_scaling
1

You could turn off window scaling, just for testing purposes.

root@haley:~# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

That isn't a permanent change. To change it back, simply 'echo 1' (or whatever the original value was) to the same file. You could make it permanent by following suggestions made here. Well, if it made a difference, that is.

Test it on a client machine, and see if that has any influence -- although only do it after the connection has failed. It won't fix a connection attempt already in progress.
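If you'd rather not forget to put the setting back, a small wrapper can change the value, run one test command, and restore whatever was there before. This is a sketch, not anything standard: with_setting is a made-up name, it needs root when pointed at /proc, and it uses plain global shell variables for portability.

```shell
# with_setting FILE VALUE CMD... (hypothetical helper):
# write VALUE to FILE, run CMD, then restore FILE's previous content.
with_setting() {
    ws_file=$1
    ws_value=$2
    shift 2
    ws_old=$(cat "$ws_file") || return 1      # remember the original value
    echo "$ws_value" > "$ws_file" || return 1 # apply the test value
    "$@"                                      # run the test command
    ws_status=$?
    echo "$ws_old" > "$ws_file"               # always put the old value back
    return $ws_status
}
```

As root, that could be used as: with_setting /proc/sys/net/ipv4/tcp_window_scaling 0 telnet <server> 80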

I'm going to go over the post where you showed working traffic versus the failed and attempt to determine if there are any other differences.

Something else I saw that looked suspiciously similar to your traffic is this post:

https://dev.openwrt.org/ticket/4489

Apparently some routers don't handle ECN (Explicit Congestion Notification) properly. You can see the traffic captures in the comments toward the bottom of that post. You might try testing with ECN disabled to determine if there is a difference.
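On Linux the ECN setting lives next to the window scaling one, in /proc/sys/net/ipv4/tcp_ecn. The sketch below only reads and explains the current value; ecn_mode is a made-up helper, and the descriptions are my summary of the kernel's documented values 0, 1 and 2, so check your kernel's ip-sysctl documentation if in doubt.

```shell
# ecn_mode VALUE (hypothetical helper): describe a tcp_ecn sysctl value.
ecn_mode() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "requested on outgoing connections, accepted on incoming" ;;
        2) echo "accepted on incoming connections only" ;;
        *) echo "unknown" ;;
    esac
}

ecn_file=/proc/sys/net/ipv4/tcp_ecn
if [ -r "$ecn_file" ]; then
    ecn_mode "$(cat "$ecn_file")"
fi
```

To test with it off (as root, not permanent): echo 0 > /proc/sys/net/ipv4/tcp_ecn, then retry the connection from the client that was failing.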