Sorry for my poor English, I'll try to be clear.
For some weeks now I have had a problem when establishing MySQL connections from a web server:
we log every connection that takes longer than 500 ms, and each time one does, the delay is either 3000-3006 ms or 9000-9006 ms.
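For anyone who wants to reproduce this kind of logging, here is a minimal sketch of timing a connection attempt against the 500 ms threshold. It is not the poster's actual code (the thread suggests PHP's mysql_connect()); the helper names `timed` and `tcp_connect` are made up for illustration, and only the raw TCP connect is measured, not the MySQL handshake.

```python
import socket
import time


def timed(connect, threshold_ms=500.0):
    """Run connect() and report how long it took.

    Returns (result, elapsed_ms, is_slow). The 500 ms default mirrors
    the logging threshold mentioned in the post.
    """
    start = time.monotonic()
    result = connect()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms > threshold_ms


def tcp_connect(host, port, timeout=30.0):
    """Open a plain TCP connection (hypothetical helper, TCP layer only)."""
    return socket.create_connection((host, port), timeout=timeout)
```

Usage would look something like `sock, ms, slow = timed(lambda: tcp_connect("192.0.2.10", 3306))`, where the address is a placeholder. Timing only the TCP connect helps separate network-level delays from anything MySQL itself does.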
To me this is not a MySQL problem: while it happens, no connection initialisation is visible in the logs or in "mytop", and the connection limit is nowhere near reached (1-5 out of 200).
Furthermore, both servers (the web server and the MySQL server, both dual Xeon dual-core) have a load average below 0.5.
We use about 1 Mbps on each server, and each one is connected at 10 Mbps full duplex.
On the ISP's Cacti graphs, I see no errors on the switch.
I tried tracing "/proc/net/netstat", and on the web server I saw a lot (about 350) of "TCPLossUndo" events during this period. According to "netstat -s", this is "congestion windows recovered after partial ack".
Is this really the source of my problem? And how can I fix it?
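A rough sketch of how a counter such as TCPLossUndo can be tracked over time. It assumes the usual layout of /proc/net/netstat, where each group is a header line of counter names followed by a line of values with the same prefix (for example "TcpExt:"). The function names are illustrative only.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {"Prefix.Name": int}.

    Assumes lines come in pairs: a line of names, then a line of values,
    both starting with the same "Prefix:" label.
    """
    counters = {}
    lines = text.strip().splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix.strip()}.{name}"] = int(number)
    return counters


def counter_delta(before, after, key):
    """Difference of one counter between two snapshots."""
    return after.get(key, 0) - before.get(key, 0)


def snapshot(path="/proc/net/netstat"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_netstat(f.read())
```

Taking one `snapshot()` before and one after a slow connection, then calling `counter_delta(before, after, "TcpExt.TCPLossUndo")`, shows whether the counter actually moves together with the delays.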
Some additional details:
- both servers are 64-bit Debian Etch, dual Xeon dual-core with 2GB, running a 2.6.23-12 kernel (I also tried the Debian 2.6.18 kernel on the SQL server, no change).
- the web server and the MySQL server are on the same switch
- TCP SYN cookies are enabled on the web server but disabled on the MySQL server. I reduced tcp_syn_retries and tcp_synack_retries to 2, but it didn't change anything.
- the congestion control algorithm is "cubic" on both, but I tried htcp too: I didn't see any change
- I tried setting tcp_rmem and tcp_wmem to the values used before the 2.6.17+ kernels, also without change
- I tried reducing the TCP keepalive time, intvl and probes, without any change
- netfilter on the MySQL server allows connections only from the web server (except for SSH)
- ip_conntrack_max is 65536, and the current ip_conntrack_count is about 3000 on the web server and 500 on the MySQL server.
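Since the list above mentions settings that differ between the two hosts (SYN cookies, for example), a small sketch for capturing and comparing them may help. It assumes the standard /proc/sys/net/ipv4 files for the TCP settings named in the post; the conntrack values are left out because their path depends on the kernel version. The function names are made up for illustration.

```python
from pathlib import Path

# TCP settings mentioned in the post, as files under /proc/sys/net/ipv4.
TCP_SYSCTLS = (
    "tcp_syncookies", "tcp_syn_retries", "tcp_synack_retries",
    "tcp_congestion_control", "tcp_rmem", "tcp_wmem",
    "tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes",
)


def read_sysctls(names=TCP_SYSCTLS, base="/proc/sys/net/ipv4"):
    """Return {name: value string}, or None for files that cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            values[name] = None
    return values


def diff_settings(a, b):
    """Return {name: (value_a, value_b)} for settings that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Running `read_sysctls()` on each server and passing both results to `diff_settings` gives a quick list of where the web server and the MySQL server are configured differently.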
I'm probably out of my league here, but just wanted to suggest that 3 second delays sound like something at a much higher level than networking hardware and the kernel TCP stack. When I've seen delays like these, it generally boils down to things like flaky or inaccessible DNS servers (long timeouts before rotating to the next server) or pam_unix throttling (after a failed authentication for any reason, a default delay to slow down brute force attacks). I can't see how either of these is likely to apply to your situation, but I suspect that such long delays could only be imposed by higher level software.
Thanks for your answer, but I don't agree with you: I see a lot of timeout delays in the kernel with values like "3000ms". Furthermore, after a bad configuration of the switch (which produced a lot of "packet errors"), all our connections took 3000 ms.
I should add: we don't make any DNS queries for our connections. mysql_connect() is called with the server's IP, and DNS resolution in MySQL is turned off.
So, after a lot of tests, it seems that the problem comes from disabling the ipv6 module:
- We have the problem with a "vanilla" kernel compiled without ipv6
- We have the problem with a Debian (stable and testing) kernel when we disable the ipv6 module (by adding "alias net-pf-10 off" to /etc/modprobe.d/aliases).
- We don't have any problem with a Debian kernel without the "alias net-pf-10 off" line.
But... why does disabling ipv6 trigger this problem?
I've been experiencing the exact same problem for 2 weeks now, and found out that the only difference between a server that has this occasional 3 second delay and one that doesn't is CONFIG_IPV6 in the kernel.
I went a little further and confirmed the explanation of this delay:
- "Sometimes" the client tries to establish a TCP connection to the server (I had the problem with MySQL as well). To do so, it sends a TCP SYN packet.
- The server replies with a "SYN,ACK" packet
- At this point, the client should receive the "SYN,ACK" packet, reply with an "ACK" packet and set the connection to "ESTABLISHED". Unfortunately, it does not.
What the client does instead, for a reason that is still unknown (for now, I hope), is ignore the "SYN,ACK" packet the server sent. This triggers a timeout whose duration is exactly... 3 seconds (hardcoded in the kernel).
After this timeout, the client tries again by sending a "SYN", but this time it does not ignore the "SYN,ACK" reply and establishes the connection as expected.
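The 3000 ms and 9000 ms figures reported earlier in the thread fit this explanation if the SYN retransmission timer doubles after each lost attempt: one retry costs 3 s, two retries cost 3 s + 6 s = 9 s. A small sketch of that arithmetic follows, assuming a 3 s initial timeout and plain doubling; the function names and the 50 ms tolerance are illustrative choices, not anything taken from the kernel.

```python
def syn_retry_delays(initial_rto_ms=3000, retries=3):
    """Cumulative connect delay after 1, 2, ... SYN retransmissions.

    Assumes the timeout starts at initial_rto_ms and doubles each time.
    """
    delays = []
    total = 0
    rto = initial_rto_ms
    for _ in range(retries):
        total += rto
        delays.append(total)
        rto *= 2
    return delays


def retransmissions_for(measured_ms, tolerance_ms=50, **kwargs):
    """Return how many SYN retransmissions would explain measured_ms.

    Returns 0 when the delay does not sit just above any expected value.
    """
    for count, delay in enumerate(syn_retry_delays(**kwargs), start=1):
        if delay <= measured_ms <= delay + tolerance_ms:
            return count
    return 0
```

With the defaults, `syn_retry_delays()` gives 3000, 9000 and 21000 ms, so a logged delay of 3004 ms maps to one retransmitted SYN and 9006 ms to two.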
For now I have absolutely no idea why the ipv6 stack solves this problem, but it seems to be a good idea to keep it enabled.
Note: all of this has been confirmed by network traces taken simultaneously on the client and the server.
I haven't been able to reproduce this bug easily: the only way was to set up an experimental client/server testbed and generate artificial traffic on it.
I will dig into the kernel network code and see what I can find.