LinuxQuestions.org
Linux - Networking This forum is for any issue related to networks or networking.
Routing, network cards, OSI, etc. Anything is fair game.

01-15-2008, 05:28 AM   #1
Kioob
LQ Newbie
 
Registered: Jan 2008
Posts: 4

Rep: Reputation: 1
3000 ms delay on TCP connections?


Hello,

Sorry for my poor English; I'll try to make myself understood...

For some weeks now I have had a problem establishing MySQL connections from a web server:
we log every connection that takes longer than 500 ms, and whenever that happens the delay is either 3000-3006 ms or 9000-9006 ms.
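The post does not show how connection times are logged. As a minimal sketch, assuming Python's standard `socket` module on the client and a hypothetical helper named `timed_connect`, one can time only the TCP handshake (no MySQL authentication at all) and log anything above the 500 ms threshold:

```python
import logging
import socket
import time

SLOW_MS = 500.0  # threshold from the post: log connects slower than 500 ms

def timed_connect(host: str, port: int, timeout: float = 30.0) -> tuple[socket.socket, float]:
    """Open a TCP connection and return (socket, elapsed time in ms).

    Only the TCP handshake is timed; no MySQL protocol traffic is exchanged.
    """
    start = time.monotonic()
    sock = socket.create_connection((host, port), timeout=timeout)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    if elapsed_ms > SLOW_MS:
        logging.warning("slow TCP connect to %s:%d: %.0f ms", host, port, elapsed_ms)
    return sock, elapsed_ms
```

For example, `timed_connect("192.0.2.10", 3306)` would time a connection to a MySQL server on its default port (the address is a placeholder). Timing the bare `connect()` separates the TCP handshake from anything MySQL does afterwards.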

I don't think it's a MySQL problem: while it happens, no connection attempt is visible in the logs or in "mytop", and the connection limit is nowhere near reached (1-5 out of 200).
Furthermore, both servers (the web server and the MySQL server, each with two dual-core Xeons) have a load average below 0.5.

We use about 1 Mbps on each server, and each one is connected at 10 Mbps full duplex.
In our ISP's Cacti graphs, I see no errors on the switch.

I tried tracing "/proc/net/netstat", and on the web server I saw a lot (about 350) of "TCPLossUndo" events during this period. According to "netstat -s", this counter means "congestion windows recovered after partial ack".

Is this really the source of my problem? And how can I fix it?
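The counters in /proc/net/netstat come in pairs of lines: a header line listing counter names and a value line with the same prefix (`TcpExt:`, `IpExt:`). A minimal parser, sketched here under that assumption with hypothetical function names, makes it easy to sample `TCPLossUndo` before and after a slow period:

```python
from pathlib import Path

def parse_netstat(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/netstat-style text into {prefix: {counter: value}}."""
    lines = text.strip().splitlines()
    stats: dict[str, dict[str, int]] = {}
    # Lines come in (names, values) pairs that share the same "Prefix:" label.
    for names_line, values_line in zip(lines[0::2], lines[1::2]):
        prefix, names = names_line.split(":", 1)
        value_prefix, values = values_line.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched lines: {prefix!r} vs {value_prefix!r}")
        stats[prefix] = {n: int(v) for n, v in zip(names.split(), values.split())}
    return stats

def tcpext_counter(name: str, path: str = "/proc/net/netstat") -> int:
    """Read one TcpExt counter (e.g. "TCPLossUndo") from the live file."""
    return parse_netstat(Path(path).read_text())["TcpExt"][name]
```

The counters are cumulative since boot, so calling `tcpext_counter("TCPLossUndo")` twice, some time apart, and subtracting gives the number of events in that window.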

Some more details:
- both servers run Debian Etch 64-bit on two dual-core Xeons with 2 GB of RAM, with a 2.6.23-12 kernel (I also tried the Debian 2.6.18 kernel on the SQL server, no change).
- the web server and the MySQL server are on the same switch
- TCP SYN cookies are enabled on the web server but disabled on the MySQL server. I reduced tcp_syn_retries and tcp_synack_retries to 2, but it doesn't change anything.
- the congestion control algorithm is "cubic" on both, but I tried htcp too: I didn't see any change
- I tried setting tcp_rmem and tcp_wmem back to their pre-2.6.17 values, also without any change
- I tried reducing the TCP keepalive time, intvl and probes, without any change
- netfilter on the MySQL server only allows connections from the web server (except for SSH)
- ip_conntrack_max is set to 65536, and the current ip_conntrack_count is about 3000 on the web server and 500 on the MySQL server.
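When comparing the two servers, it helps to dump the TCP settings listed above in one go. Each sysctl `net.ipv4.X` is exposed as the file `/proc/sys/net/ipv4/X`; the sketch below (hypothetical helper names, plain Python, no `sysctl` binary needed) reads them and records `None` for any the running kernel does not provide. The conntrack limit is left out because its file name changed between kernel versions.

```python
from pathlib import Path
from typing import Optional

# TCP settings mentioned in the post, named as sysctl(8) names them.
TCP_SYSCTLS = [
    "net.ipv4.tcp_syncookies",
    "net.ipv4.tcp_syn_retries",
    "net.ipv4.tcp_synack_retries",
    "net.ipv4.tcp_congestion_control",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_keepalive_time",
    "net.ipv4.tcp_keepalive_intvl",
    "net.ipv4.tcp_keepalive_probes",
]

def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its /proc/sys file (dots become slashes)."""
    return Path(root, *name.split("."))

def read_sysctls(names: list[str], root: str = "/proc/sys") -> dict[str, Optional[str]]:
    """Read each sysctl; None means the file is missing or unreadable."""
    values: dict[str, Optional[str]] = {}
    for name in names:
        try:
            values[name] = sysctl_path(name, root).read_text().strip()
        except OSError:
            values[name] = None
    return values
```

Running `read_sysctls(TCP_SYSCTLS)` on both machines and diffing the results shows which settings actually differ (here, for instance, `tcp_syncookies`).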

Thanks for any help...

Olivier

Last edited by Kioob; 01-20-2008 at 04:02 PM.
 
01-18-2008, 08:50 PM   #2
damonhart
LQ Newbie
 
Registered: Nov 2007
Posts: 22

Rep: Reputation: 15
Look at higher-level issues?

I'm probably out of my league here, but I just wanted to suggest that 3-second delays sound like something at a much higher level than the networking hardware and the kernel TCP stack. When I've seen delays like these, they have generally come down to things like flaky or unreachable DNS servers (long timeouts before rotating to the next server) or pam_unix throttling (a default delay after any failed authentication, to slow down brute-force attacks). I can't see how either of these would apply to your situation, but I suspect that such long delays can only come from higher-level software imposing them.
 
01-20-2008, 04:02 PM   #3
Kioob
LQ Newbie
 
Registered: Jan 2008
Posts: 4

Original Poster
Rep: Reputation: 1
Thanks for your answer, but I don't agree: the kernel contains a lot of timeouts of exactly 3000 ms. Furthermore, after a bad switch configuration (which produced a lot of packet errors), all our connections took 3000 ms.

I should add: we don't make any DNS queries for our connections. mysql_connect() is called with the server's IP address, and DNS resolution in MySQL is off.
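The post above says the client passes an IP address to mysql_connect(). On the client side you can enforce that no name lookup happens at all: `getaddrinfo()` with the `AI_NUMERICHOST` flag accepts only literal addresses and fails immediately instead of querying DNS. A minimal sketch in Python (the helper name is hypothetical):

```python
import socket

def numeric_sockaddr(host: str, port: int) -> tuple[int, tuple]:
    """Resolve host:port without any DNS query.

    AI_NUMERICHOST makes getaddrinfo() accept only literal IP addresses;
    a hostname raises socket.gaierror instead of triggering a lookup.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM,
                               flags=socket.AI_NUMERICHOST)
    family, _type, _proto, _canonname, sockaddr = infos[0]
    return family, sockaddr
```

This only rules out client-side lookups; reverse lookups on the MySQL server are controlled separately by the server's own configuration.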

Last edited by Kioob; 01-20-2008 at 04:03 PM.
 
02-14-2008, 12:07 PM   #4
Kioob
LQ Newbie
 
Registered: Jan 2008
Posts: 4

Original Poster
Rep: Reputation: 1
So, after a lot of testing, it seems that the problem comes from disabling the ipv6 module:

- We have the problem with a "vanilla" kernel compiled without IPv6.
- We have the problem with a Debian (stable and testing) kernel when we disable the ipv6 module (by adding "alias net-pf-10 off" to /etc/modprobe.d/aliases).
- We don't have any problem with a Debian kernel without the "alias net-pf-10 off" line.

But... why would disabling IPv6 trigger this problem?
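To confirm which of these states a given host is in, a quick check is whether the kernel lets you create an IPv6 socket at all. The sketch below is a hypothetical helper, assuming that with the module blocked via "alias net-pf-10 off" (or a kernel built without IPv6) socket creation fails; it does not diagnose the bug itself, and it may not detect IPv6 being switched off by other means.

```python
import socket

def kernel_ipv6_available() -> bool:
    """Return True if an AF_INET6 TCP socket can be created on this host.

    With the ipv6 module blocked or a kernel built without IPv6, socket
    creation is expected to fail; any OSError here is treated as
    "not available".
    """
    if not socket.has_ipv6:  # this Python build has no IPv6 support at all
        return False
    try:
        s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return False
    s.close()
    return True
```

Running it on an affected and an unaffected server should, if the diagnosis above holds, give `False` and `True` respectively.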
 
03-13-2008, 12:50 PM   #5
Fuzzy_
LQ Newbie
 
Registered: Mar 2008
Posts: 1

Rep: Reputation: 0
Partial explanation

Hi,

I have been experiencing exactly the same problem for 2 weeks now, and found that the only difference between a server that has this occasional 3-second delay and one that doesn't is CONFIG_IPV6 in the kernel.

I dug a little further and confirmed the explanation for this delay:
- Occasionally, the client tries to establish a TCP connection to the server (I had the problem with MySQL as well). To do so, it sends a TCP SYN packet.
- The server replies with a "SYN,ACK" packet.
- At this point, the client should have received the "SYN,ACK" packet, replied with an "ACK" packet, and set the connection to "ESTABLISHED". Unfortunately, it does not.

Instead, for a reason that is still unknown (only for now, I hope), the client ignores the "SYN,ACK" packet the server sent. This triggers a timeout whose duration is exactly... 3 seconds (hardcoded in the kernel).

After this timeout, the client tries again by sending a new "SYN", but this time it does not ignore the "SYN,ACK" reply and establishes the connection as expected.
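The 3000 ms and 9000 ms buckets from the first post fit this explanation if the retransmission timeout doubles after each expiry. A minimal sketch, assuming the 3-second initial timeout described above (kernels of this era; later kernels lowered the initial value to 1 s, following RFC 6298), computes when each retransmitted SYN is sent:

```python
def syn_retry_schedule(retries: int, initial_rto: float = 3.0) -> list[float]:
    """Seconds after the first SYN at which each retransmitted SYN is sent,
    assuming the retransmission timeout doubles after every expiry."""
    times: list[float] = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto       # wait for the current timeout to expire
        times.append(elapsed)
        rto *= 2             # exponential backoff for the next attempt
    return times
```

`syn_retry_schedule(3)` returns `[3.0, 9.0, 21.0]`: a connect that succeeds on the first retry completes just after 3 s (plus one round trip, which would account for the 3000-3006 ms range), and one that needs a second retry completes just after 9 s.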

For now I have absolutely no idea why enabling the IPv6 stack solves this problem, but it seems to be a good idea to enable it in the meantime.

Note: all of this has been confirmed by network traces taken simultaneously on the client and the server.

I haven't been able to reproduce this bug easily: the only way was to set up an experimental client/server testbed and generate artificial traffic on it.

I will dig into the kernel networking code and see what I can find.
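The testbed and traffic generator are not described in the post. As a hypothetical stand-in in the same spirit (not Gabriel's setup), the sketch below opens many short TCP connections in parallel and collects the ones that exceed a threshold; per the diagnosis above, slow connects on an affected kernel would be expected to cluster near 3000 ms and 9000 ms.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor

def connect_ms(addr: tuple[str, int], timeout: float = 30.0) -> float:
    """Time one TCP connect to addr in milliseconds, then close the socket."""
    start = time.monotonic()
    with socket.create_connection(addr, timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000.0

def hammer(addr: tuple[str, int], total: int = 1000, workers: int = 20,
           slow_ms: float = 500.0) -> tuple[list[float], list[float]]:
    """Open `total` connections using `workers` threads; return (all, slow) timings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        times = list(pool.map(lambda _: connect_ms(addr), range(total)))
    slow = [t for t in times if t > slow_ms]
    return times, slow
```

The target should be a service that actually accepts connections (a MySQL server, for example); otherwise the listen backlog fills up and the resulting delays say nothing about this bug.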

Gabriel
 
06-25-2009, 07:27 AM   #6
astletron
LQ Newbie
 
Registered: Jun 2009
Posts: 1

Rep: Reputation: 0
Any resolution?

I seem to be having a similar problem.

Did you ever find a solution for this?
 
06-26-2009, 12:34 AM   #7
Kioob
LQ Newbie
 
Registered: Jan 2008
Posts: 4

Original Poster
Rep: Reputation: 1
Hi,

yes, this was fixed in the 2.6.24.4 or 2.6.24.5 kernel. We haven't had the problem since.
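Since the post does not identify the upstream patch, a version comparison is only a rough first check. The sketch below (hypothetical helper names) conservatively treats 2.6.24.5 as the first fixed release; distribution kernels, such as the Red Hat one asked about below, often backport fixes without changing the version number, so a `False` there proves nothing.

```python
import platform
import re
from typing import Optional

FIX_VERSION = (2, 6, 24, 5)  # conservative: the post says 2.6.24.4 or 2.6.24.5

def kernel_version_tuple(release: str) -> tuple[int, ...]:
    """Leading numeric part of a kernel release string,
    e.g. '2.6.23-12' -> (2, 6, 23), '2.6.24.5' -> (2, 6, 24, 5)."""
    m = re.match(r"\d+(?:\.\d+)*", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(p) for p in m.group(0).split("."))

def likely_has_fix(release: Optional[str] = None) -> bool:
    """Compare the given (or running) kernel release with FIX_VERSION."""
    rel = release or platform.release()
    return kernel_version_tuple(rel) >= FIX_VERSION
```

Tuple comparison handles releases of different lengths: `(2, 6, 24)` sorts before `(2, 6, 24, 5)`, so a plain 2.6.24 kernel is treated as not fixed.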
 
09-22-2011, 07:41 AM   #8
filippo.crea
LQ Newbie
 
Registered: Nov 2004
Posts: 9

Rep: Reputation: 0
Hi,

does anybody know the related bug number?

I need to know whether the same fix is present in the Real Time kernel (MRG) version 1.3 on RH 5.6.

Thanks,

Filippo
 