LinuxQuestions.org

Kioob 01-15-2008 05:28 AM

3000ms delay on TCP connections ?
 
Hello,

Sorry for my poor English; I'll try to make myself understood...

For some weeks now I have had a problem establishing MySQL connections from a web server:
we log every connection time greater than 500 ms, and each time that threshold is exceeded, the delay is between 3000 and 3006 ms or between 9000 and 9006 ms.
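A minimal sketch of the kind of connect-time logging described above, using Python's standard `socket`, `time` and `logging` modules. It times only the TCP handshake, not the MySQL protocol handshake that follows, and it should be given an IP address (as in this thread) so that no DNS lookup is included in the measurement. The function name `timed_connect` and the 15 s timeout are illustrative choices, not the poster's actual code.

```python
import logging
import socket
import sys
import time

SLOW_CONNECT_MS = 500.0  # log threshold mentioned in the first post

log = logging.getLogger("connect-timing")


def timed_connect(host: str, port: int, timeout: float = 15.0) -> float:
    """Open a TCP connection and return how long connect() took, in ms."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        # Measure before the socket is closed so close() is not included.
        elapsed_ms = (time.monotonic() - start) * 1000.0
    if elapsed_ms > SLOW_CONNECT_MS:
        log.warning("slow connect to %s:%d took %.0f ms", host, port, elapsed_ms)
    return elapsed_ms


# Usage: python timed_connect.py <server-ip> <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    logging.basicConfig(level=logging.INFO)
    print(f"{timed_connect(sys.argv[1], int(sys.argv[2])):.1f} ms")
```

Running it in a loop against the MySQL port (3306 by default) would show whether the 3000 ms and 9000 ms outliers already appear at the TCP level.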

I don't think it's a MySQL problem: while it happens, no connection initialisation is visible in the logs or in "mytop", and the connection limit is nowhere near reached (1-5 out of 200).
Furthermore, both servers (the web server and the MySQL server, both dual Xeon dual-core machines) have a load average below 0.5.

We use about 1 Mbps on each server, and each one is connected at 10 Mbps full duplex.
In the ISP's Cacti graphs, I see no "errors" on the switch.

I tried tracing "/proc/net/netstat" and saw a lot of "TCPLossUndo" events (about 350) on the web server during this period. According to "netstat -s", this counter means "congestion windows recovered after partial ack".
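For anyone wanting to trace a counter like TCPLossUndo the same way, here is a small sketch that parses /proc/net/netstat. That file is organised as pairs of lines sharing a prefix such as "TcpExt:", the first listing counter names and the second their values. The helper names and the 60 s sampling interval are hypothetical.

```python
import time

NETSTAT_PATH = "/proc/net/netstat"


def parse_netstat(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/netstat into {section: {counter: value}}.

    Each section is a header line of counter names followed by a line of
    values carrying the same "Prefix:" label (e.g. "TcpExt:", "IpExt:").
    """
    lines = [line for line in text.splitlines() if line.strip()]
    sections: dict[str, dict[str, int]] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        name_prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if name_prefix != value_prefix:
            raise ValueError(f"mismatched lines: {name_prefix!r} vs {value_prefix!r}")
        sections[name_prefix] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return sections


def read_counter(name: str, section: str = "TcpExt", path: str = NETSTAT_PATH) -> int:
    """Read one counter, e.g. read_counter("TCPLossUndo")."""
    with open(path) as f:
        return parse_netstat(f.read())[section][name]


def counter_delta(name: str, interval_s: float = 60.0,
                  section: str = "TcpExt", path: str = NETSTAT_PATH) -> int:
    """How much a counter grew over interval_s seconds."""
    before = read_counter(name, section, path)
    time.sleep(interval_s)
    return read_counter(name, section, path) - before
```

Calling `counter_delta("TCPLossUndo")` while the slow connections are being logged would show whether the two events really coincide.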

Is this really the source of my problem? And how can I fix it?

Some additional details:
- both servers run Debian Etch 64-bit, dual Xeon dual-core with 2 GB RAM, with a 2.6.23-12 kernel (I also tried the Debian 2.6.18 kernel on the SQL server, with no change).
- the web server and the MySQL server are on the same switch
- TCP SYN cookies are enabled on the web server but disabled on the MySQL server. I reduced tcp_syn_retries and tcp_synack_retries to 2, but it didn't change anything.
- the congestion control algorithm is "cubic" on both, but I also tried "htcp" and saw no change
- I tried setting tcp_rmem and tcp_wmem back to their pre-2.6.17 values, also without change
- I tried reducing the TCP keepalive time, intvl and probes settings, without any change
- netfilter on the MySQL server allows only connections from the web server (except for SSH)
- ip_conntrack_max is set to 65536, and the current ip_conntrack_count is about 3000 on the web server and 500 on the MySQL server.

Thanks for any help...

Olivier

damonhart 01-18-2008 08:50 PM

look at higher level issues?
 
I'm probably out of my league here, but I just wanted to suggest that 3-second delays sound like something at a much higher level than the networking hardware and the kernel TCP stack. When I've seen delays like these, they have generally come down to things like flaky or unreachable DNS servers (long timeouts before rotating to the next server) or pam_unix throttling (a default delay after any failed authentication, to slow down brute-force attacks). I can't see how either of these is likely to apply to your situation, but I suspect that delays this long could only be imposed by higher-level software.

Kioob 01-20-2008 04:02 PM

Thanks for your answer, but I don't agree: there are a lot of 3000 ms "timeout" values in the kernel. Furthermore, when the switch was badly configured (which produced a lot of "packet errors"), all our connections took 3000 ms.

I should add that our connections don't involve any DNS query: mysql_connect() is called with the server's IP address, and DNS resolution in MySQL is turned off.

Kioob 02-14-2008 12:07 PM

So, after a lot of testing, it seems that the problem comes from disabling the ipv6 module:

- We have the problem with a "vanilla" kernel compiled without ipv6.
- We have the problem with a Debian (stable and testing) kernel when we disable the ipv6 module (by adding "alias net-pf-10 off" to /etc/modprobe.d/aliases).
- We don't have any problem with a Debian kernel without the "alias net-pf-10 off" line.
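To check which of these three situations a given machine is in, a quick sketch: look for the "alias net-pf-10 off" line in the modprobe configuration (net-pf-10 is the IPv6 protocol family, AF_INET6 = 10 on Linux), and test whether /proc/net/if_inet6 exists. Treating the presence of that /proc file as meaning "the IPv6 stack is loaded" is an assumption on my part, not something tested in this thread, and both function names are hypothetical.

```python
import os
import re

# An uncommented "alias net-pf-10 off" line blocks autoloading of the ipv6 module.
_ALIAS_OFF = re.compile(r"^\s*alias\s+net-pf-10\s+off\b", re.MULTILINE)


def ipv6_alias_disabled(modprobe_text: str) -> bool:
    """True if the given modprobe config text contains 'alias net-pf-10 off'."""
    return bool(_ALIAS_OFF.search(modprobe_text))


def ipv6_stack_present(proc_path: str = "/proc/net/if_inet6") -> bool:
    """Assumption: the kernel only creates this file when IPv6 is built in or loaded."""
    return os.path.exists(proc_path)
```

For example, `ipv6_alias_disabled(open("/etc/modprobe.d/aliases").read())` would flag the second configuration in the list above.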

But... why would disabling IPv6 trigger this problem?

Fuzzy_ 03-13-2008 12:50 PM

partial explanation
 
Hi,

I have been experiencing the exact same problem for 2 weeks now, and found that the only difference between a server that has this occasional 3-second delay and one that doesn't is CONFIG_IPV6 in the kernel.

I dug a little further and confirmed what causes this delay:
- "Sometimes" the client tries to establish a TCP connection to the server (I had the problem with MySQL as well). To do so, it sends a TCP SYN packet.
- The server replies with a "SYN,ACK" packet.
- At this point, the client should receive the "SYN,ACK" packet, reply with an "ACK" packet and set the connection to "ESTABLISHED". Unfortunately, it does not.

Instead, for a reason that is still unknown (for now, I hope), the client ignores the "SYN,ACK" packet the server sent. This triggers a timeout whose duration is exactly... 3 seconds (hardcoded in the kernel).

After this timeout, the client tries again by sending another "SYN", but this time it does not ignore the "SYN,ACK" reply and establishes the connection as expected.
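The 3000 ms and 9000 ms figures from the first post fit this explanation if the SYN retransmission timer starts at 3 s and doubles after each retry (exponential backoff). The sketch below assumes those 2.6-era values (later kernels use a 1 s initial timeout) and maps an observed connect delay to the SYN attempt that succeeded. It also suggests why lowering tcp_syn_retries changed nothing: the setting only affects when connect() finally gives up, not the 3 s step. All names are hypothetical.

```python
# Assumption: 2.6-era kernels used a 3 s initial retransmission timeout for
# SYNs (later kernels switched to 1 s), doubling after each retransmission.
INITIAL_RTO_S = 3.0


def syn_send_times(syn_retries: int, initial_rto_s: float = INITIAL_RTO_S) -> list[float]:
    """Seconds after connect() at which the original SYN and each retry are sent."""
    times = [0.0]
    rto = initial_rto_s
    for _ in range(syn_retries):
        times.append(times[-1] + rto)
        rto *= 2  # exponential backoff
    return times


def give_up_time(syn_retries: int, initial_rto_s: float = INITIAL_RTO_S) -> float:
    """Roughly when connect() fails if no SYN,ACK is ever accepted."""
    return initial_rto_s * (2 ** (syn_retries + 1) - 1)


def matching_attempt(delay_ms: float, syn_retries: int = 5, slack_ms: float = 50.0):
    """Index of the SYN (0 = original) that fits the observed delay, else None.

    The observed delay is the send time of the successful SYN plus one
    round trip, so we allow up to slack_ms beyond each send time.
    """
    for i, t in enumerate(syn_send_times(syn_retries)):
        if 0.0 <= delay_ms - t * 1000.0 <= slack_ms:
            return i
    return None
```

With these assumptions, delays of 3000-3006 ms match attempt 1 (the first retransmitted SYN) and 9000-9006 ms match attempt 2, i.e. one or two "SYN,ACK" replies were ignored.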

For now I have absolutely no idea why the IPv6 stack solves this problem, but it seems a good idea to enable it in the meantime.

Note: all of this has been confirmed by network traces taken simultaneously on the client and the server.

I haven't been able to reproduce this bug easily: the only way was to set up an experimental client/server testbed and generate artificial traffic on it.
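One possible client side for such a testbed, sketched under the assumption that the server simply runs a TCP listener: it opens many connections in parallel from a thread pool and collects the ones slower than 500 ms, which could then be matched against packet traces. The function names, connection count and worker count are hypothetical; this does not reproduce whatever background traffic was used in the original testbed.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def connect_ms(host: str, port: int, timeout: float = 30.0):
    """Return the TCP connect time in ms, or None if the connection failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:  # includes socket.timeout and connection refused
        return None


def stress(host: str, port: int, total: int = 1000, workers: int = 20,
           slow_ms: float = 500.0) -> dict:
    """Open `total` connections using `workers` threads; report slow/failed ones."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: connect_ms(host, port), range(total)))
    return {
        "total": total,
        "failed": sum(1 for r in results if r is None),
        "slow_ms": sorted(r for r in results if r is not None and r > slow_ms),
    }
```

Any entries in `slow_ms` close to 3000 or 9000 would correspond to the SYN retransmissions described in the previous post.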

I will dig into the kernel network code and see what I can find.

Gabriel

astletron 06-25-2009 07:27 AM

any resolution?
 
I seem to be having a similar problem.

Did you ever find a solution for this?

Kioob 06-26-2009 12:34 AM

Hi,

Yes, this was fixed in the 2.6.24.4 or 2.6.24.5 kernel. We haven't had the problem since.

filippo.crea 09-22-2011 07:41 AM

Hi,

Does anybody know the related bug number?

I need to know whether the same fix is present in the Realtime kernel (MRG) version 1.3 on RH 5.6.

Thanks,

Filippo


All times are GMT -5.