LinuxQuestions.org


Fabio Paolini 02-21-2011 08:00 AM

tcp\ip ack timeout and packet retransmission - Solved
 
Hi, I have a socket connection that suffers from packet loss. The client sends a packet to the server, and if the packet does not arrive at the destination the client does not send it a second time, as I imagined tcp\ip should ensure.

Reading man tcp, I realized that the parameter "tcp_retries2" controls the time the client takes to detect the lost connection (I set tcp_retries2=4 and the client took around 15 seconds to detect the lost connection). But I was not able to control the way the packets are resent.

I ran the following experiment:

I disconnected the ethernet cable and then sent the packet from the client. Then I waited some time, short enough that the terminal did not recognize the lost connection and long enough that the client was not able to send the packet. The client never sends the packet again. How can this be fixed? I want a configuration where the client either detects the lost connection or sends the packet a second time if the connection is restored fast enough.

One more question:

Is there a way to determine the ack timeout?
Is tcp_retries1 the parameter that controls the number of times a packet is sent?

Thanks.
Fabio

tommylovell 02-21-2011 11:21 AM

Ok. This is a tough one, and no one has responded in three hours, so I'll take a stab at it. I may be wrong on some of the technical details and would welcome comments and especially corrections. I'm working on the premise that a bad answer may be better than no answer at all. :)

Quote:

Hi, I have a socket connection that suffers from packet loss. The client sends a packet to the server, and if the packet does not arrive at the destination the client does not send it a second time, as I imagined tcp\ip should ensure.
I have looked at hundreds (possibly thousands) of Sniffer and tcpdump traces and have never seen a situation where tcp has failed to retransmit when an ack was late.

Quote:

Reading man tcp, I realized that the parameter "tcp_retries2" controls the time the client takes to detect the lost connection
That is not entirely accurate. "tcp_retries2" controls the number of retransmissions that will take place before tcp will inform the application that an error has occurred, not the amount of time. Typically the error reported is ETIMEDOUT. And at that point the connection is not lost.

tcp/ip is often referred to as a "self tuning" protocol. What is meant by that is that it keeps track of various metrics, like packet round-trip time, receive window size, network congestion, etc.

In terms of detecting packet loss, the TCP/IP stack used in Linux continuously calculates the smoothed round-trip time (srtt) for data it sends on an established connection and uses it as the basis for the Retransmission Timeout (RTO). When tcp determines that the RTO has been exceeded without an ack arriving, it retransmits. An "exponential backoff" is applied to the RTO, meaning the timeout doubles after each retransmission: the RTO is multiplied by 1, then 2, then 4, and so on up to 64. So, on a connection with a 10ms RTO, you would see timeouts of 10ms, 20ms, 40ms, 80ms, 160ms, 320ms, etc. When you get to "tcp_retries2" retries, it considers the error permanent and tells the app.
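The doubling described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not what the kernel actually runs: the function name is made up, the 10ms starting value is the example figure from this post, and the cap of 64 on the multiplier is taken from the description above rather than from the kernel source.

```python
def backoff_timeouts(initial_rto_ms, retries, max_multiplier=64):
    """Return the timeout (in ms) waited before each of `retries` retransmissions.

    Hypothetical helper: the multiplier starts at 1 and doubles after every
    retransmission until it reaches `max_multiplier` (assumed cap).
    """
    timeouts = []
    multiplier = 1
    for _ in range(retries):
        timeouts.append(initial_rto_ms * multiplier)
        multiplier = min(multiplier * 2, max_multiplier)
    return timeouts


def total_wait_ms(initial_rto_ms, retries, max_multiplier=64):
    """Total time spent waiting before the error would reach the application."""
    return sum(backoff_timeouts(initial_rto_ms, retries, max_multiplier))
```

Calling backoff_timeouts(10, 6) gives 10, 20, 40, 80, 160 and 320 ms, which is 630 ms of waiting in total, so the time before the application hears about the problem grows quickly with the retry count.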

Note that until you hit the maximum number of retries, your app knows nothing about the retransmissions. Also bear in mind that when the error is finally reported to the application, the connection is still established. The application can choose to close the connection or it can keep trying that I/O operation.

Quote:

I disconnected the ethernet cable and then sent the packet from the client. Then I waited some time, short enough that the terminal did not recognize the lost connection and long enough that the client was not able to send the packet. The client never sends the packet again.
Unless you have a really high RTO and/or you are transmitting over a verrrrry slow connection (300bps modem comes to mind), you can't unplug and plug a cable fast enough...

Have you verified this in a 'tcpdump' on the client end?

Quote:

How can it be fixed?
I suspect this is a problem with your client code failing to properly handle errors that have been reported to it. Otherwise, it's a serious error in the tcp stack and should be affecting a lot of people.

Quote:

I want a configuration where the client either detects the lost connection or sends the packet a second time if the connection is restored fast enough.
Again, the connection is not lost and the tcp/ip stack should be retransmitting automatically. The connection is not lost until the client decides it is and closes the connection.

Quote:

Is there a way to determine the ack timeout?
I don't think so. But I think it is roughly twice the round-trip time reported by ping.
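For what it's worth, on Linux the kernel's current RTO for a socket appears to be readable through the TCP_INFO socket option (it should be the same figure that 'ss -i' prints as "rto"). The sketch below is Python and rests on assumptions: the function name is made up, and it assumes the Linux layout of struct tcp_info, where eight one-byte fields are followed by tcpi_rto as a 32-bit value in microseconds. On other systems it simply returns None.

```python
import socket
import struct
import sys


def current_rto_seconds(sock):
    """Return the kernel's current RTO for a TCP socket in seconds, or None.

    Hypothetical helper. Assumes the Linux struct tcp_info layout:
    8 one-byte fields, then tcpi_rto as an unsigned 32-bit microsecond value.
    """
    if not sys.platform.startswith("linux"):
        return None  # the layout assumption only holds on Linux
    tcp_info_opt = getattr(socket, "TCP_INFO", None)
    if tcp_info_opt is None:
        return None
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, tcp_info_opt, 104)
    except OSError:
        return None
    if len(raw) < 12:
        return None
    (rto_us,) = struct.unpack_from("=I", raw, 8)  # tcpi_rto at offset 8
    return rto_us / 1_000_000
```

On an established connection the value should track the measured round-trip time, so it is worth comparing against what ping reports for the same path.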

Quote:

Is tcp_retries1 the parameter that controls the number of times a packet is sent?
No, 'tcp_retries2' controls that. 'tcp_retries1' controls how many retransmissions are done before it checks to see if it should retry using a different route (i.e. send the packet out a different interface or send it to a different router).
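To see what both settings currently are, the values can be read from /proc/sys/net/ipv4 on a Linux box. Here is a small Python sketch; the function name is made up, and the directory is passed as a parameter only so the code can be pointed somewhere else for testing.

```python
from pathlib import Path


def read_tcp_retries(base="/proc/sys/net/ipv4"):
    """Return the current tcp_retries1 / tcp_retries2 values as a dict.

    Hypothetical helper: a value is None when the file is missing or
    unreadable (for example on a non-Linux system).
    """
    values = {}
    for name in ("tcp_retries1", "tcp_retries2"):
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values
```

Changing the values is a separate step and needs root, for example with 'sysctl -w net.ipv4.tcp_retries2=4'.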

To go further with this I think a tcpdump (filtering on just your client) and a Wireshark summary print would be helpful.

Fabio Paolini 02-21-2011 08:21 PM

Thanks tommylovell, your reply was really helpful. In the meantime I have also talked to some other people and found an error in my code, as you suspected.

My server-side application sends a ping-like packet, and if there is no reply from the client it closes the connection. Then, when the cable is connected again, the server sends a FIN packet to the client, closing the connection and so ending the packet retransmission process (I am not sure it happens exactly this way; this is something I did not know before). At that point there was a failure in my client code, which did not detect that the connection was being closed. Now it runs nicely, and I can roughly set the period of time before the failure is reported through the tcp_retries2 parameter.

The main point of my error was that there are two threads, the connection thread and another that I will call the main one. I used just one variable to communicate the connection status between these two threads, so if the connection thread changed the status to offline and then got the connection back fast enough, the main thread never noticed that the system had been offline. Now I use a separate variable just for that, and it solved the problem.
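One way to picture this kind of fix, as a rough sketch only: the class and method names below are made up for illustration and are not the actual code from my client. Besides the online/offline flag, the connection thread bumps a counter on every drop, so the main thread can tell that a drop happened even if the connection is already back when it looks.

```python
import threading


class ConnectionStatus:
    """Shared between the connection thread and the main thread (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._online = False
        self._drops = 0  # separate variable: counts every loss of connection

    def set_online(self):
        with self._lock:
            self._online = True

    def set_offline(self):
        with self._lock:
            if self._online:
                self._drops += 1
            self._online = False

    def snapshot(self):
        with self._lock:
            return self._online, self._drops


class MainThreadView:
    """Remembers how many drops the main thread has already seen."""

    def __init__(self, status):
        self._status = status
        self._seen_drops = status.snapshot()[1]

    def dropped_since_last_check(self):
        _, drops = self._status.snapshot()
        missed = drops != self._seen_drops
        self._seen_drops = drops
        return missed
```

With only the boolean, an offline-then-online sequence between two checks looks like nothing happened. With the counter, dropped_since_last_check() still returns True once.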

Quote:

Note that until you hit the maximum number of retries, your app knows nothing about the retransmissions. Also bear in mind that when the error is finally reported to the application, the connection is still established. The application can choose to close the connection or it can keep trying that I/O operation.
I use Python on the client side and I have a piece of code like this:
Code:

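# Fragment from a method of the client class; needs "import select"
# at the top of the module.
if self.connected:
    # Wait up to 1000 seconds for the socket to become readable or
    # writable, or to report an exceptional condition.
    ready_to_read, ready_to_write, in_error = select.select(
        [self.socket], [self.socket], [self.socket], 1000)

    if ready_to_read:
        self.handle_read()
    if in_error:
        # Treat any exceptional condition as a lost connection.
        self.connected = False
        state = "NOT_CONNECTED"
        self.socket.close()
else:
    self.breakLoop = True
    self.connect()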

Now I suspect that select.select() returns something in the in_error list whenever tcp/ip reports a failure in resending the packet. Is that true?

Also thanks for your little howto about the way the timeout is calculated. It explains why it is not so easy to find a parameter that defines the timeout.

Now I am also able to properly use tcp_retries2 and also the tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes parameters.
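As far as I can tell from the select documentation, the third list reports "exceptional conditions" (on TCP that usually means out-of-band data) rather than retransmission failures. A pending error or a close by the peer normally shows up as the socket becoming readable, and then recv() either raises an error or returns an empty result. A hedged sketch of checking for that; the function name and the return values are made up for illustration:

```python
import select
import socket


def poll_connection(sock, timeout=1.0):
    """Check one socket once and report what was found (hypothetical helper).

    Returns ("data", bytes), ("closed", b""), ("error", errno) or ("idle", b"").
    """
    readable, _, _ = select.select([sock], [], [sock], timeout)
    # SO_ERROR holds any error the stack has queued for this socket.
    pending = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if pending:
        return ("error", pending)
    if readable:
        try:
            data = sock.recv(4096)
        except OSError as exc:
            return ("error", exc.errno)
        if not data:
            return ("closed", b"")  # peer closed the connection (FIN received)
        return ("data", data)
    return ("idle", b"")
```

The "closed" case is the one my client was missing: the server's FIN makes the socket readable, and recv() returns nothing.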
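The keepalive settings can also be applied to a single socket instead of system-wide. The sketch below is Python with made-up function names and example values. SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the Linux socket options that correspond to the three sysctl parameters, and the code skips any that the platform does not provide.

```python
import socket


def keepalive_detection_time(idle_s, interval_s, probes):
    """Rough worst-case seconds before a dead peer is noticed on an idle socket."""
    return idle_s + interval_s * probes


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive for one socket (hypothetical helper, example values)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    per_socket = (
        ("TCP_KEEPIDLE", idle_s),       # like tcp_keepalive_time
        ("TCP_KEEPINTVL", interval_s),  # like tcp_keepalive_intvl
        ("TCP_KEEPCNT", probes),        # like tcp_keepalive_probes
    )
    for opt_name, value in per_socket:
        opt = getattr(socket, opt_name, None)
        if opt is not None:  # not every platform defines these names
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

With the usual Linux defaults (7200, 75 and 9) the estimate is 7875 seconds, more than two hours, which is why the defaults are rarely useful for detecting a pulled cable.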

tommylovell 02-21-2011 11:26 PM

I'm afraid my coding skills are very weak in general, and non-existent in Python. It's quite possible that there is an error condition being returned.

It isn't necessarily the ETIMEDOUT error. There's three errors listed in 'man tcp', and about 20 in 'man -s7 ip'. But I suppose the action you take is probably the same for any error.
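For anyone who does want to tell the errors apart, here is a rough Python sketch of the idea. The function name, the action labels and the choice of which errors count as "try again" are assumptions made for illustration; everything else falls through to the same action, in line with the point above.

```python
import errno

# Assumption: these only mean "try the same call again later".
RETRY_ERRNOS = {errno.EAGAIN, errno.EWOULDBLOCK, errno.EINTR}


def describe_socket_error(exc):
    """Return (symbolic name, suggested action) for an OSError (hypothetical)."""
    name = errno.errorcode.get(exc.errno, "UNKNOWN")
    if exc.errno in RETRY_ERRNOS:
        return name, "retry"
    # ETIMEDOUT, EPIPE, ECONNRESET and the rest all get the same treatment.
    return name, "reconnect"
```

So an ETIMEDOUT raised by send() or recv() would come back as ("ETIMEDOUT", "reconnect").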

But in any event, I'm really out of my league when the discussion turns to coding. Glad you've made some progress.

Fabio Paolini 03-07-2011 07:28 AM

Yes, you are right. I took the same procedure whatever the socket error was. I have now studied man tcp a little and it is clearer.

Thanks again

virtualj 12-01-2011 11:08 AM

Sorry, I have the same problem, but I don't understand the solution.

Fabio, are you Italian? Here is my problem in detail:

Basically I have a mail server that sends mail to another mail server out on the internet. My mail server is on a very large intranet and goes through many hops before reaching the internet exit, where it meets a border firewall. Unfortunately, packets get lost in the intranet fairly frequently. My border firewall DROPS all the packets that follow the lost one, because they are not consecutive to the previous one (sequence number error). My mail server, however (running Red Hat 5), NEVER retransmits the lost packet, so once its buffer fills up it keeps waiting for acks that will never arrive, because the server on the other side has received neither the lost packet nor the following ones, since my border firewall drops them all. In the end the connection times out.

I have come up with 2 hypotheses:
1) the firewall sends a fake ack for the lost packet, so the mail server keeps sending data that then gets dropped
2) the mail server has some setting that stops it from resending packets. I don't know whether that depends on the operating system or on the application
Both hypotheses seem rather strange to me, though.

What can be the problem? My server runs on Red Hat 5 update 7.

Thank you all in advance.

Fabio Paolini 12-08-2011 07:52 AM

Hi virtualj, I'm Brazilian. I think it is easier to talk in English.
In my problem I had a tcp socket connection where my application controls the connection.
My application had a problem because the thread that controls the connection could fail to tell a second thread that a reconnection had happened. It had nothing to do with the firewall.

Recently I have had a problem that may be similar to yours.

Sometimes, mainly when my server-side application is running on Windows, if the connection is lost, my server application closes the connection (as it should), but when the connection is recovered my client side continues sending packets belonging to the first connection, which had already been closed by the server side, and the server never sends an RST packet back to the client. I do not know if it has something to do with your problem.

I am not sure how a mail server works.

virtualj 12-08-2011 08:04 AM

Thank you Fabio. You have an Italian name! ;-)
I think my problem is related to the remote server, but I don't know what its OS is.
There is some problem with the tcp stack and my firewall drops the packets. Removing the firewall makes the problem disappear!!! This is the solution for now... :-(
Bye


All times are GMT -5.