LinuxQuestions.org
02-23-2011, 06:10 AM   #1
bzlaskar
Member
 
Registered: May 2006
Location: Bangalore, INDIA
Distribution: Fedora Core
Posts: 69
Blog Entries: 2

Rep: Reputation: 16
Clarification needed on tcp_keepalive messages?


Hi All,

I need some clarification on TCP keepalive. We are facing some issues
on our production servers related to TCP behavior.

The issue is as follows.

We have some machines at one end sending data in real time to another
group of machines at the other end. Due to hardware issues at the
other end, some of those machines become unresponsive or crash.
The client system that pumps the data never learns that the server
has become unresponsive. The connection remains in the ESTABLISHED
state, and the client keeps trying to send data, thinking that the
connection is alive, so we are seeing a backlog on the client side.

Our understanding of how TCP will handle the situation is as follows.


Q 1) Since the server went down, the client will try to retransmit
the data until it times out. What is the behavior of TCP after the
timeout? We need clarification on the following points.
a) Will the kernel close the established connection after the
timeout? It looks like it does not in our case, as we still see
the connection in the ESTABLISHED state after more than 2 hours.
b) Are there any kernel parameters that decide when the client
times out after retransmission fails? What is the behavior of
TCP after the client's retransmissions time out?


Q 2) We also tried tcp_keepalive. The default keepalive time comes
to around 2 hrs 2 minutes, I think. My understanding is that the
client will send some TCP probes after the keepalive time interval
and, if it cannot reach the server, the kernel will close the
established connection on the client side. But I can see that the
connection still remains in the ESTABLISHED state after the
tcp_keepalive time. We waited for around 3 hrs, but the connection
remained in the ESTABLISHED state. We tried reducing the keepalive
time to around 10 minutes, but the connection still remained in the
ESTABLISHED state on the client side after the 10-minute interval.

Where did I go wrong in my understanding? Please clarify the doubts
raised above. What should we do to resolve the problem described
above? Any help will be highly appreciated, as we are having a hard
time resolving the issue.

Thanks in Advance
 
02-23-2011, 11:03 AM   #2
datamove
LQ Newbie
 
Registered: Feb 2011
Posts: 17

Rep: Reputation: 3
The app can override the kernel keepalive settings with socket options. And if the app doesn't enable keepalive on its sockets at all (SO_KEEPALIVE), the kernel settings don't matter. Check how your app handles its TCP connections.
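
As a rough illustration of those socket options, here is a minimal Python sketch for Linux. The function name `enable_keepalive` and the timing values are made up for the example; `SO_KEEPALIVE`, `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are the real option names described in tcp(7). It is a sketch of the idea, not how any particular application does it.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=75, probes=9):
    """Enable keepalive on one socket and override the sysctl defaults (Linux only).

    The parameter defaults here are illustrative assumptions, not recommendations.
    """
    # Keepalive probes are only sent on sockets that have SO_KEEPALIVE set.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket equivalents of tcp_keepalive_time, _intvl and _probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle_s=600, interval_s=30, probes=4)
    print("SO_KEEPALIVE on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The options can be set before or after connect(); if only SO_KEEPALIVE is set, the socket falls back to the system-wide sysctl values.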
 
02-23-2011, 11:19 AM   #3
tommylovell
Member
 
Registered: Nov 2005
Distribution: Fedora, Redhat
Posts: 369

Rep: Reputation: 98
I'm no expert here, so this is probably more opinion than fact, and I hope others will correct me.

So, in my opinion... in answer to Q1.

Quote:
a) Will the kernel close the established connection after the timeout?
No. IMO, the TCP stack will simply report an ETIMEDOUT error back to the application. (The details vary a little depending on whether a blocking or non-blocking call is made.)

Quote:
b) Are there any kernel parameters that decide when the client times out after retransmission fails? What is the behavior of TCP after the client's retransmissions time out?
I'm not exactly certain what you are asking, but after the timeout, the kernel does not (as you've seen) close the connection. Once again, IMO, it is the application's responsibility to either close the connection or take some other action.
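
On the kernel-parameter part of (b): the Linux kernel documentation describes net.ipv4.tcp_retries2 (default 15) as the setting that influences how long unacknowledged data keeps being retransmitted. The sketch below only reproduces the "hypothetical timeout" arithmetic from that documentation. The 200 ms minimum and 120 s maximum RTO are the usual Linux constants (TCP_RTO_MIN, TCP_RTO_MAX) and are assumptions here, and the function name is made up for illustration.

```python
def retransmit_timeout_s(retries2=15, rto_min_s=0.2, rto_max_s=120.0):
    """Sum the backed-off RTOs for `retries2` retransmissions plus the final wait.

    Assumes the RTO starts at rto_min_s, doubles each time, and is capped
    at rto_max_s. Real connections start from an RTT-based RTO, so the
    real figure is normally larger.
    """
    total = 0.0
    rto = rto_min_s
    for _ in range(retries2 + 1):
        total += min(rto, rto_max_s)
        rto *= 2
    return total


if __name__ == "__main__":
    print(round(retransmit_timeout_s(), 1))  # 924.6 seconds, roughly 15 minutes
```

The result should be read as a lower bound on the effective timeout, not an exact value.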

Quote:
Q 2) We also tried tcp_keepalive. The default keepalive time comes to around 2 hrs 2 minutes, I think.
I don't know the exact values but the system-wide defaults are set with sysctl.
Code:
[root@athlonz ~]# sysctl -a | grep keep
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
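
To put those three numbers together: assuming keepalive is enabled on the socket and the connection is idle, the first probe goes out after tcp_keepalive_time, and the peer is declared dead after tcp_keepalive_probes unanswered probes spaced tcp_keepalive_intvl apart. A small Python sketch of that arithmetic (the function name is hypothetical):

```python
def keepalive_detection_s(time_s=7200, probes=9, intvl_s=75):
    """Worst-case seconds from the last traffic until a dead peer is reported."""
    return time_s + probes * intvl_s


if __name__ == "__main__":
    total = keepalive_detection_s()
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    print(f"{total} s = {hours} h {minutes} min {seconds} s")  # 7875 s = 2 h 11 min 15 s
```

With tcp_keepalive_time reduced to 600 and the other two left at their defaults, the same sum gives 1275 seconds, a little over 21 minutes rather than 10.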
Quote:
My understanding regarding this is that the client will send some TCP probes after the keepalive time interval...
If by "client" you meant the tcp/ip stack on the client side, then yes, the client does send keepalive packets. (The tcp/ip stack on the server side does this as well.)

Quote:
...and if it cannot reach the server, the kernel will close the established connection on the client side.
I don't think that's true. I'm not certain, but I think that even though a keepalive is sent, a timeout of it is only acted upon if the SO_KEEPALIVE option is set for that socket. And if SO_KEEPALIVE is set, then I think this is the same case as a timed-out write: an error is returned to the application code. I think it is still the application's responsibility to close the connection, not the kernel's or the TCP/IP stack's. The only purpose of keepalive is to inject traffic on an otherwise idle connection to check its viability. If the keepalive fails, it gives the application the opportunity to close the connection and report the error.
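
A minimal Python sketch of what "the application closes the connection" could look like on the sending side. The helper name `send_or_close` and the choice of which errno values count as fatal are assumptions for illustration, not taken from any particular application.

```python
import errno

# Errors treated here as "the peer is gone"; this set is an assumption.
FATAL_ERRNOS = {errno.ETIMEDOUT, errno.ECONNRESET, errno.EPIPE, errno.EHOSTUNREACH}


def send_or_close(sock, payload):
    """Send payload; on a dead-peer error, close the socket and return False."""
    try:
        sock.sendall(payload)
        return True
    except OSError as exc:
        if exc.errno in FATAL_ERRNOS:
            # The stack has reported the failure; releasing the socket is up to us.
            sock.close()
            return False
        raise  # anything else is left for the caller to handle
```

The caller can then reconnect or fail over instead of queueing more data behind a connection that will never drain.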

I recall reading once that keepalive was controversial when it was introduced, for a number of reasons. One problem with it is that it can't tell the difference between a facility being temporarily down (someone unplugged a cable for a moment while neatening up a wiring closet) and a permanent failure. Hence the high value for the system-wide keepalive, although 10 minutes might make more sense than two hours.

I would like to hear other comments and would welcome corrections.

(And, of course, we are talking about the Linux TCP/IP stack with its BSD-style sockets API, and not whatever Microsoft decided to implement, or Sun, or IBM for AIX...)

Last edited by tommylovell; 02-23-2011 at 11:22 AM.
 
  


All times are GMT -5. The time now is 05:13 AM.
