Quote:
Originally Posted by fizzdandantilus
00:00:0C:07:AC:02 is the ISP gateway interface, and 00:19:e8:e9:03:3f must be another of the ISP's routers (multi-homed). It's the only way I can explain it. I can dig deeper if you think it's worth looking into.
I imagine your ISP simply set up their environment so that inbound traffic comes in on one interface and outbound traffic goes out another.
I can confirm that the destination host has MAC 00:16:3E:79:4A:C9.
I don't understand TCP/IP networking well enough to know what wscale is for. I appreciate all your efforts to help me understand this issue.
I imagine the ISP hosting your server simply routes inbound traffic through one interface on a switch or router and directs outbound traffic to another.
Just to make sure that isn't the problem, I'd check a few things while a connection attempt is failing.
arp -an
This shows whether the server has a resolved ARP entry for 00:00:0c:07:ac:02.
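If the ARP table is long, a few lines of Python can pick out the gateway's entry and any unresolved ones. This is a minimal sketch that assumes the Linux net-tools output format shown in the comments (other systems format `arp -an` differently); `ips_for_mac` and `incomplete_entries` are hypothetical helper names, not part of any tool.

```python
import re

# Matches Linux `arp -an` lines such as:
#   ? (203.0.113.1) at 00:00:0c:07:ac:02 [ether] on eth0
#   ? (203.0.113.5) at <incomplete> on eth0
ARP_LINE = re.compile(r"\((?P<ip>[^)]+)\) at (?P<mac>\S+)")


def ips_for_mac(arp_output: str, mac: str) -> list[str]:
    """Return every IP whose ARP entry resolves to `mac` (case-insensitive)."""
    wanted = mac.lower()
    hits = []
    for line in arp_output.splitlines():
        m = ARP_LINE.search(line)
        if m and m.group("mac").lower() == wanted:
            hits.append(m.group("ip"))
    return hits


def incomplete_entries(arp_output: str) -> list[str]:
    """Return IPs the kernel tried to resolve but got no ARP reply for."""
    return [
        m.group("ip")
        for line in arp_output.splitlines()
        if (m := ARP_LINE.search(line)) and m.group("mac") == "<incomplete>"
    ]
```

On the server you could feed it live output with `subprocess.run(["arp", "-an"], capture_output=True, text=True).stdout`. An empty result from `ips_for_mac(..., "00:00:0c:07:ac:02")` during a failure, or the gateway's IP showing up under `incomplete_entries`, would point at a layer-2 problem rather than at TCP options.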
At the same time, I'd also run this:
tcpdump -nni <interface> icmp or arp
ICMP and ARP traffic could provide a clue. I don't recall whether you've captured ICMP traffic during a failure, but I would do so just to make sure; those messages can be helpful. Excessive ARP requests for a specific IP would indicate that the server is trying to send traffic to a host it cannot locate.
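To spot "excessive ARP requests" in a saved capture, you can count `who-has` requests per target and flag targets that never got a reply. This is a sketch assuming the plain-text line format `tcpdump -n` prints for ARP (shown in the comments); the helper names and the default threshold of 3 are arbitrary choices for illustration, not established values.

```python
import re
from collections import Counter

# tcpdump -n prints ARP traffic roughly like:
#   10:00:00.000000 ARP, Request who-has 203.0.113.1 tell 203.0.113.10, length 28
#   10:00:00.000100 ARP, Reply 203.0.113.1 is-at 00:00:0c:07:ac:02, length 28
WHO_HAS = re.compile(r"Request who-has (\S+)")
IS_AT = re.compile(r"Reply (\S+) is-at")


def arp_request_counts(tcpdump_text: str) -> Counter:
    """Count ARP 'who-has' requests per target IP."""
    return Counter(
        m.group(1) for line in tcpdump_text.splitlines() if (m := WHO_HAS.search(line))
    )


def unanswered_arp_targets(tcpdump_text: str, threshold: int = 3) -> list[str]:
    """Targets requested at least `threshold` times with no reply seen in the capture."""
    answered = {
        m.group(1) for line in tcpdump_text.splitlines() if (m := IS_AT.search(line))
    }
    return [
        ip
        for ip, n in arp_request_counts(tcpdump_text).items()
        if n >= threshold and ip not in answered
    ]
```

Saving the capture as text (for example by redirecting the tcpdump command above to a file) and passing its contents to `unanswered_arp_targets` gives a quick list of addresses the server keeps asking about without success.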
As for wscale possibly being the issue, I somewhat doubt that is the case. Here's an example of what one might see:
http://kerneltrap.org/node/6723
The behavior you see is different from what was experienced in that particular situation.
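Since the question of what wscale is for came up: the TCP window field is only 16 bits (at most 65535 bytes), so the window scale option, sent in each side's SYN, announces a shift count that the peer applies to every later window field from that side (RFC 7323). The sketch below just does that arithmetic; the function names are made up for illustration.

```python
MAX_SHIFT = 14  # RFC 7323: a received shift count above 14 is treated as 14


def effective_window(window_field: int, shift: int, is_syn: bool = False) -> int:
    """Bytes a host is advertising, given the 16-bit window field it sent and
    the shift count that same host announced in its own SYN."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the TCP window field is a 16-bit value")
    if shift < 0:
        raise ValueError("shift count cannot be negative")
    if is_syn:
        # The window field carried in a SYN or SYN/ACK is never scaled.
        return window_field
    return window_field << min(shift, MAX_SHIFT)


def negotiated_shift(client_sent_option: bool, server_sent_option: bool, announced: int) -> int:
    """Scaling is used only if BOTH SYNs carried the option; otherwise the shift is 0."""
    return announced if (client_sent_option and server_sent_option) else 0
```

For example, a window field of 5840 with a shift of 7 means 5840 << 7 = 747520 bytes. That is why a device that mishandles the option matters: if one end applies a shift of 7 and the other assumes 0, their ideas of the window differ by a factor of 128.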
In any event, to check whether window scaling is enabled on a Linux system, look at this setting; a value of 1 means it is:
jcwx@haley:~$ cat /proc/sys/net/ipv4/tcp_window_scaling
1
You could turn off window scaling, just for testing purposes.
root@haley:~# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
That isn't a permanent change. To revert it, simply 'echo 1' (or whatever the original value was) back into the same file. If it does make a difference, you could make it permanent by following the suggestions made here.
Test it on a client machine and see whether it has any effect, but only change it after a connection has failed. It won't fix a connection attempt that is already in progress.
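If you script the test on a client, a small wrapper can guarantee the original value is written back even if the test itself errors out. This is a minimal sketch, not a standard tool: `temporary_sysctl` is a hypothetical helper that does the same thing as the `echo 0 > ...` command above followed by echoing the old value back, and it needs root for real /proc/sys paths.

```python
import contextlib
from pathlib import Path


@contextlib.contextmanager
def temporary_sysctl(path, value):
    """Write `value` to a /proc/sys file for the duration of a `with` block,
    then restore whatever was there before."""
    knob = Path(path)
    original = knob.read_text().strip()
    knob.write_text(f"{value}\n")  # same effect as: echo <value> > path
    try:
        yield original
    finally:
        knob.write_text(f"{original}\n")  # put the old setting back
```

Usage would look like `with temporary_sysctl("/proc/sys/net/ipv4/tcp_window_scaling", 0): ...`, with the failing connection reproduced inside the block. If the initial write fails (for example, without root), nothing has been changed, so there is nothing to restore.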
I'm going to go over the post where you showed working traffic versus failed traffic and try to determine whether there are any other differences.
Something else I saw that looked suspiciously similar to your traffic is this post:
https://dev.openwrt.org/ticket/4489
Apparently some routers don't handle ECN properly. You can see the traffic captures in the comments toward the bottom of that ticket. You might try testing with ECN disabled to determine whether it makes a difference.
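An ECN-capable client sets the ECE and CWR flags on its SYN (RFC 3168), so a router that mishandles ECN can interfere with connection setup. On Linux, ECN is controlled by /proc/sys/net/ipv4/tcp_ecn, and `echo 0 > /proc/sys/net/ipv4/tcp_ecn` disables it, just as non-permanently as the window scaling change above. The sketch below only decodes the current value; the descriptions paraphrase the kernel's ip-sysctl documentation, and the helper names are hypothetical.

```python
from pathlib import Path

# Meanings of net.ipv4.tcp_ecn, paraphrased from the Linux ip-sysctl docs.
TCP_ECN_MODES = {
    0: "disabled: ECN is neither requested on outgoing connections nor accepted on incoming ones",
    1: "enabled: ECN is requested on outgoing connections and accepted on incoming ones",
    2: "server-only: ECN is accepted when an incoming connection requests it, but never requested",
}


def describe_tcp_ecn(raw: str) -> str:
    """Translate the contents of /proc/sys/net/ipv4/tcp_ecn into a description."""
    value = int(raw.strip())
    return TCP_ECN_MODES.get(value, f"unrecognized value {value}")


def current_tcp_ecn(path: str = "/proc/sys/net/ipv4/tcp_ecn") -> str:
    """Read and describe the live setting (Linux only)."""
    return describe_tcp_ecn(Path(path).read_text())
```

A client reporting mode 1 is the case worth testing, since that is the setting that puts ECN flags on outgoing SYNs; recent kernels default to 2, which never requests ECN.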