LinuxQuestions.org

ponga 11-26-2009 08:51 PM

Bizarre TCP connectivity issues from certain clients, totally mystified!
 
Greetings all. I've hit a wall here and simply cannot put the dots together on this problem. Any help is appreciated!

I host a couple of websites, recently deployed. I have gotten reports, and confirmed myself, that connections time out when navigating my sites! This is only happening for a *certain subset* of users. Everyone else has no complaints. So I am trying to figure out what the common ground is in order to fix it, but have hit a wall in doing so. I'm out of ideas.

The setup (this is kinda long, my apologies):
*Qwest DSL (2Mbps/768kbps) PPPoE using Actiontec GT701R, static IP. The ISP neither blocks ports nor shapes traffic (so they say).
*Debian Lenny firewall, iptables, 2.6.26 kernel, rp-pppoe driver, running NAT/MASQ.
*Web server resides on VMware Server 2.0.2, bridged network.
*Web server is Debian Lenny, Apache2 (latest .deb).

I should note that I also use this connection for my recreational Internet - it's perfectly fine. I'm getting the speed as advertised and nitro.ucsc.edu reports no errors, nor have I had any issues with connectivity whatsoever.

*The working clients: Windows IE & Firefox, some Linux and some PPC Macs (OS X) have reported no issues, and I can confirm this.

*The NOT working clients: some PPC Macs (OS X; Camino, Firefox and Safari fail on a couple of machines), and my mobile phone (Samsung with AT&T), which fails 100% of the time. About 50% of the time, my Linux box at work (latest CentOS) fails using either Opera or Firefox.

The funny thing is, I've tested this from a cable connection: a Windows 7 + Firefox machine works fine, while on the same connection a PPC Mac fails.

*The behavior on the failing clients is that you can load the main page, but subsequent links fail. HTTP POSTs *usually fail*... some pages work, but most don't. The result on the client side is a network timeout. On the server side (Apache logs), I never even see the request that is timing out on the client side!

So, this is a TCP issue. After looking at packet captures and seeing it for myself, it has to be a TCP issue. (But I'm not 100% sure.)

I took some packet captures.. (text below) but I can't make heads or tails of it. It APPEARS there is packet loss. Even then, I would suppose TCP could handle this. I'm not seeing any TCP resets. All I can tell is... the server and client... just get OFF the same page. I don't know how else to describe it.

*Anyway, there has to be a solution to this. But I'm just not seeing the forest for the trees. Any help is MUCH appreciated!

I have legit pcaps and you can have the URLs if you like, just ask. And seriously, THANKS. This is driving me crazy...

* This is a capture using my Samsung mobile phone as the test client. I have the other captures for the other failing clients (Mac, etc.). They look essentially the same.

Again, TIA! --ponga

------------------------------------------------------------------------------------
Code:

No.    Time        Source                Destination          Protocol Info
      5 46.470893  client_ip_addr        server_ip_addr        TCP      24656 > http [SYN] Seq=0 Win=33580 Len=0 MSS=1460 WS=0
      6 46.471066  server_ip_addr        client_ip_addr        TCP      http > 24656 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 WS=6
      7 46.559154  client_ip_addr        server_ip_addr        TCP      24656 > http [ACK] Seq=1 Ack=1 Win=33580 Len=0
      8 46.562949  client_ip_addr        server_ip_addr        HTTP    GET / HTTP/1.1
      9 46.563244  server_ip_addr        client_ip_addr        TCP      http > 24656 [ACK] Seq=1 Ack=1254 Win=8384 Len=0
    10 46.564246  server_ip_addr        client_ip_addr        HTTP    HTTP/1.1 301 Moved Permanently  (text/html)
    11 46.660908  client_ip_addr        server_ip_addr        TCP      24656 > http [ACK] Seq=1254 Ack=641 Win=33580 Len=0
    12 47.748387  client_ip_addr        server_ip_addr        TCP      24656 > http [FIN, ACK] Seq=1254 Ack=641 Win=33580 Len=0
    13 47.748810  server_ip_addr        client_ip_addr        TCP      http > 24656 [FIN, ACK] Seq=641 Ack=1255 Win=8384 Len=0
    14 47.748856  client_ip_addr        server_ip_addr        TCP      26027 > http [SYN] Seq=0 Win=33580 Len=0 MSS=1460 WS=0
    15 47.749085  server_ip_addr        client_ip_addr        TCP      http > 26027 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 WS=6
    16 47.836894  client_ip_addr        server_ip_addr        TCP      24656 > http [ACK] Seq=1255 Ack=642 Win=33580 Len=0
    17 47.838601  client_ip_addr        server_ip_addr        TCP      26027 > http [ACK] Seq=1 Ack=1 Win=33580 Len=0
    18 47.842919  client_ip_addr        server_ip_addr        HTTP    GET /tiki-mobile.php HTTP/1.1
    19 47.843194  server_ip_addr        client_ip_addr        TCP      http > 26027 [ACK] Seq=1 Ack=1325 Win=8768 Len=0
    20 48.143480  server_ip_addr        client_ip_addr        HTTP    HTTP/1.1 200 OK  (application/vnd.wap.xhtml+xml)
    21 48.251028  client_ip_addr        server_ip_addr        TCP      26027 > http [ACK] Seq=1325 Ack=1356 Win=33580 Len=0
    22 54.504363  client_ip_addr        server_ip_addr        HTTP    GET /tiki-list_articles.php?mode=mobile HTTP/1.1
    23 54.504552  server_ip_addr        client_ip_addr        TCP      http > 26027 [ACK] Seq=1356 Ack=2764 Win=11712 Len=0
    24 55.264068  server_ip_addr        client_ip_addr        TCP      [TCP segment of a reassembled PDU]
    25 55.372478  client_ip_addr        server_ip_addr        TCP      26027 > http [ACK] Seq=2764 Ack=2808 Win=32128 Len=0
    26 55.372692  server_ip_addr        client_ip_addr        TCP      [TCP Previous segment lost] [TCP segment of a reassembled PDU]
    27 55.372772  server_ip_addr        client_ip_addr        TCP      [TCP segment of a reassembled PDU]
    28 55.482368  client_ip_addr        server_ip_addr        TCP      [TCP Dup ACK 25#1] 26027 > http [ACK] Seq=2764 Ack=2808 Win=32128 Len=0 SLE=2816 SRE=4268
    29 55.482592  server_ip_addr        client_ip_addr        TCP      [TCP Retransmission] [TCP segment of a reassembled PDU]
    30 55.482664  server_ip_addr        client_ip_addr        TCP      [TCP Retransmission] [TCP segment of a reassembled PDU]
    31 55.482723  server_ip_addr        client_ip_addr        TCP      [TCP segment of a reassembled PDU]
    32 55.483827  client_ip_addr        server_ip_addr        TCP      [TCP Dup ACK 25#2] 26027 > http [ACK] Seq=2764 Ack=2808 Win=32128 Len=0 SLE=2816 SRE=4276
    33 55.483979  server_ip_addr        client_ip_addr        TCP      [TCP segment of a reassembled PDU]
    34 55.570331  client_ip_addr        server_ip_addr        TCP      26027 > http [ACK] Seq=2764 Ack=4276 Win=32128 Len=0
    35 55.571527  client_ip_addr        server_ip_addr        TCP      [TCP Dup ACK 34#1] 26027 > http [ACK] Seq=2764 Ack=4276 Win=32128 Len=0
    36 55.571575  server_ip_addr        client_ip_addr        TCP      [TCP segment of a reassembled PDU]
    37 55.592991  client_ip_addr        server_ip_addr        TCP      26027 > http [ACK] Seq=2764 Ack=5728 Win=30676 Len=0
    38 55.593186  server_ip_addr        client_ip_addr        TCP      [TCP segment of a reassembled PDU]
    39 55.680007  client_ip_addr        server_ip_addr        TCP      26027 > http [ACK] Seq=2764 Ack=7188 Win=32128 Len=0
    40 55.688136  server_ip_addr        client_ip_addr        HTTP    HTTP/1.1 200 OK  (application/vnd.wap.xhtml+xml)
    41 55.772368  client_ip_addr        server_ip_addr        TCP      26027 > http [ACK] Seq=2764 Ack=7196 Win=33580 Len=0
    42 55.876368  client_ip_addr        server_ip_addr        TCP      26027 > http [ACK] Seq=2764 Ack=7208 Win=33580 Len=0
    43 67.750772  client_ip_addr        server_ip_addr        HTTP    [TCP Previous segment lost] Continuation or non-HTTP traffic
    44 67.750963  server_ip_addr        client_ip_addr        TCP      [TCP Dup ACK 40#1] http > 26027 [ACK] Seq=7208 Ack=2764 Win=11712 Len=0 SLE=4224 SRE=4235
    45 69.891559  server_ip_addr        client_ip_addr        TCP      http > 26027 [FIN, ACK] Seq=7208 Ack=2764 Win=11712 Len=0 SLE=4224 SRE=4235
    46 69.979327  client_ip_addr        server_ip_addr        TCP      26027 > http [ACK] Seq=4235 Ack=7209 Win=33580 Len=0
    47 69.979766  client_ip_addr        server_ip_addr        TCP      26027 > http [FIN, ACK] Seq=4235 Ack=7209 Win=33580 Len=0
    48 69.979883  server_ip_addr        client_ip_addr        TCP      [TCP Dup ACK 45#1] http > 26027 [ACK] Seq=7209 Ack=2764 Win=11712 Len=0 SLE=4224 SRE=4236
    49 69.980030  client_ip_addr        server_ip_addr        TCP      47079 > http [SYN] Seq=0 Win=33580 Len=0 MSS=1460 WS=0
    50 69.980161  server_ip_addr        client_ip_addr        TCP      http > 47079 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 WS=6
    51 70.069512  client_ip_addr        server_ip_addr        TCP      47079 > http [ACK] Seq=1 Ack=1 Win=33580 Len=0
    52 70.070192  client_ip_addr        server_ip_addr        HTTP    [TCP Previous segment lost] Continuation or non-HTTP traffic
    53 70.070347  server_ip_addr        client_ip_addr        TCP      [TCP Window Update] http > 47079 [ACK] Seq=1 Ack=1 Win=5888 Len=0 SLE=1461 SRE=1499
    58 132.556233  client_ip_addr        server_ip_addr        TCP      47079 > http [FIN, ACK] Seq=1499 Ack=1 Win=33580 Len=0
    59 132.556518  server_ip_addr        client_ip_addr        TCP      [TCP Dup ACK 53#1] http > 47079 [ACK] Seq=1 Ack=1 Win=5888 Len=0 SLE=1461 SRE=1500
------------------------------------------------------------------------------------

end.

nimnull22 11-26-2009 09:34 PM

server_ip_addr client_ip_addr TCP http > 26027 [ACK] Seq=1356 Ack=2764 Win=11712 Len=0
server_ip_addr client_ip_addr TCP [TCP segment of a reassembled PDU]
client_ip_addr server_ip_addr TCP 26027 > http [ACK] Seq=2764 Ack=2808 Win=32128 Len=0
server_ip_addr client_ip_addr TCP [TCP Previous segment lost] [TCP segment of a reassembled PDU]
server_ip_addr client_ip_addr TCP [TCP segment of a reassembled PDU]

client_ip_addr server_ip_addr TCP [TCP Dup ACK 25#1] 26027 > http [ACK] Seq=2764 Ack=2808 Win=32128 Len=0 SLE=2816 SRE=4268
server_ip_addr client_ip_addr TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
server_ip_addr client_ip_addr TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
server_ip_addr client_ip_addr TCP [TCP segment of a reassembled PDU]
client_ip_addr server_ip_addr TCP [TCP Dup ACK 25#2] 26027 > http [ACK] Seq=2764 Ack=2808 Win=32128 Len=0 SLE=2816 SRE=4276


Looks like some switch or router can't manage fragmented packets.
I would suggest checking fragmentation.
Maybe the MTU is too large?

I also suggest using tcpdump for diagnostic purposes.

gratuitous_arp 11-26-2009 09:59 PM

Well, have you tried using some other application to access the server while the problem is occurring (like ping)?

Can you get back to the main page after a subsequent page fails to load?

What are the sizes of the web pages you are trying to load?

Most forward-thinking operating systems set the "Don't Fragment" bit in the IP header. Check your captures to see if the problem clients do or do not (this can be done with Wireshark or verbose mode in tcpdump).
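
For anyone checking this by hand rather than in Wireshark, here is a minimal Python sketch of where the "Don't Fragment" bit sits in an IPv4 header. The sample header is built in the code purely for illustration (the addresses and ID are made up, and the checksum is left at zero); it is not taken from the captures in this thread. In tcpdump's verbose output the same bit should show up as `[DF]` in the flags field.

```python
import struct

DF_FLAG = 0x4000      # "Don't Fragment" bit in the flags/fragment-offset word
MF_FLAG = 0x2000      # "More Fragments" bit
OFFSET_MASK = 0x1FFF  # fragment offset, counted in 8-byte units


def ipv4_fragment_info(header: bytes) -> dict:
    """Read total length and fragmentation flags from a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("need at least 20 bytes of IPv4 header")
    if header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Bytes 2-3: total length, 4-5: identification, 6-7: flags + offset
    total_length, _ident, flags_frag = struct.unpack_from("!HHH", header, 2)
    return {
        "total_length": total_length,
        "df": bool(flags_frag & DF_FLAG),
        "mf": bool(flags_frag & MF_FLAG),
        "fragment_offset": (flags_frag & OFFSET_MASK) * 8,
    }


# Hypothetical 1500-byte TCP packet with DF set (illustrative values only).
sample_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 1500, 0x1234, DF_FLAG, 64, 6, 0,
    bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]),
)

print(ipv4_fragment_info(sample_header))
```

A packet with `df` set and a `total_length` above the smallest MTU on the path cannot be fragmented in transit, so it depends entirely on path MTU discovery working.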

ponga 11-26-2009 10:20 PM

Thanks guys for the quick replies!! Pulling my hair out over here.

nimnull22:
On the fragmentation side, ya, I suppose that is possible, and it would make sense. My MTU is 1492 bytes on my DSL interface; the in-house Ethernet is 1500. I have iptables doing the work there with a "clamp-to-PMTU" line... but maybe I have that wrong and iptables is not doing it after all, or maybe 1492 is too large! I'll investigate that... And ya, I'm taking the caps with tcpdump; the output you see was just an export from Wireshark so I could easily obfuscate IPs and post it.

gratuitous_arp:
>Well, have you tried using some other application to access the server
>while the problem is occurring (like ping)?
Yes, ICMP is fine and, what is even weirder, SSH is unaffected.

>Can you get back to the main page after a subsequent page fails to load?
Nope, not until the connection times out.

>What are the sizes of the web pages you are trying to load?
I was thinking that too, from the fragmentation perspective. The pages are all similar in size, but similar does not mean same. Although, I CAN get to a page, if I go there directly. So, it's not the size of the page that is the problem. But if I hit a page, maybe two, after that I try to get another page... I'm dead in the water.

I also tried turning off Apache2 KeepAlive, as a shot in the dark... no dice.

Given that, what do you think??

THANK YOU guys for the SUPER FAST replies!!
--ponga
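
To put numbers on the MTU figures in this post, here is a rough Python sketch of the MTU-to-MSS arithmetic. It assumes plain 20-byte IPv4 and 20-byte TCP headers with no options, which is an assumption rather than something read from the captures. The 1460 value is the MSS shown in the SYN and SYN, ACK lines of the capture in the first post.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ip_packet_size(mss: int) -> int:
    """Size of the IP packet carrying one full-size TCP segment."""
    return mss + IPV4_HEADER + TCP_HEADER


links = {"in-house Ethernet": 1500, "PPPoE DSL": 1492}
advertised_mss = 1460  # MSS seen in both directions of the handshake

for name, mtu in links.items():
    size = ip_packet_size(advertised_mss)
    verdict = "fits" if size <= mtu else "does NOT fit"
    print(f"{name}: MTU {mtu}, max MSS {mss_for_mtu(mtu)}, "
          f"a {size}-byte full segment {verdict}")
```

If the clamp were rewriting the handshake, one would expect to see an MSS no larger than the DSL link allows somewhere in the exchange, though that depends on which side of the firewall the capture was taken.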

ponga 11-26-2009 11:24 PM

Hah! Fragmentation it was!!! Crazy. I set my MTU on the DSL interface to 1400.. shazzam! Site works perfectly.

Strange though. I'm not blocking ICMP unreachable or fragmentation-needed packets... but maybe the networks those other clients are on were. Also, why was SSH not affected? Strange.

Anyway, it was you guys' ideas that helped me. At least now I know what the issue was and can take appropriate action to resolve it.

THANKS!!!!!
--ponga
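
One possible reading of the capture in the first post that fits this outcome: in frames 44 and 53 the server's SACK blocks show a hole at the start of the client's request, and the hole is the same size both times. The Python sketch below only redoes that arithmetic from the Ack/SLE/SRE values printed in those frames. The 40 bytes of header overhead (no IP or TCP options) and the 1492-byte path limit are assumptions, so treat this as a hint rather than proof.

```python
from typing import NamedTuple

IP_TCP_HEADERS = 40  # assumed: 20-byte IPv4 + 20-byte TCP header, no options


class SackAck(NamedTuple):
    frame: int  # frame number in the posted capture
    ack: int    # next byte the server is still waiting for
    sle: int    # left edge of the data it received out of order
    sre: int    # right edge of that data


# Values copied from frames 44 and 53 of the capture.
observed = [SackAck(44, 2764, 4224, 4235), SackAck(53, 1, 1461, 1499)]


def describe(s: SackAck, path_mtu: int) -> dict:
    """Compare the missing and the delivered segment against a path MTU."""
    missing = s.sle - s.ack   # bytes that never reached the server
    arrived = s.sre - s.sle   # bytes that did arrive, after the hole
    return {
        "frame": s.frame,
        "missing_packet": missing + IP_TCP_HEADERS,
        "missing_fits": missing + IP_TCP_HEADERS <= path_mtu,
        "arrived_packet": arrived + IP_TCP_HEADERS,
        "arrived_fits": arrived + IP_TCP_HEADERS <= path_mtu,
    }


for s in observed:
    print(describe(s, path_mtu=1492))
```

Under those assumptions the lost piece is a full-size segment from the client each time, while the small tail of the same request gets through. That would also be consistent with SSH and ping looking healthy, since interactive SSH and default pings are mostly small packets, but that part is a guess.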

gratuitous_arp 11-27-2009 09:47 AM

Glad you got things worked out. It seems there was a problem with path MTU discovery, which should handle remote sites with lower MTUs automatically. If you are curious, here is a good link explaining it:

http://www.netheaven.com/pmtu.html
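
As a rough illustration of the mechanism, here is a toy Python model of path MTU discovery. The sender marks packets "Don't Fragment"; a router that cannot forward one is supposed to send back an ICMP "fragmentation needed" message carrying the next-hop MTU, and the sender retries at that size. The link MTUs and the `icmp_delivered` switch are hypothetical values for the sketch, not measurements from this thread.

```python
from typing import List, Optional


def discover_path_mtu(link_mtus: List[int], start_mtu: int,
                      icmp_delivered: bool = True,
                      max_probes: int = 10) -> Optional[int]:
    """Return the packet size that finally crosses every link, or None.

    None models a "black hole": the oversized packet is dropped, the ICMP
    error never reaches the sender, and it keeps retransmitting in vain.
    """
    size = start_mtu
    for _ in range(max_probes):
        # First link along the path that is too small for this packet.
        bottleneck = next((m for m in link_mtus if m < size), None)
        if bottleneck is None:
            return size          # packet crossed every hop
        if not icmp_delivered:
            return None          # sender never learns why it was dropped
        size = bottleneck        # retry at the MTU reported in the ICMP error
    return None


path = [1500, 1492, 1500]  # hypothetical: Ethernet, PPPoE link, Ethernet

print(discover_path_mtu(path, start_mtu=1500))
print(discover_path_mtu(path, start_mtu=1500, icmp_delivered=False))
```

The second call is the failure mode: nothing is broken on either end host, but something in between filters the ICMP error, so large packets silently vanish while small ones keep working.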

Smartpatrol 12-01-2009 12:02 AM

...

