detecting closed socket while buffer is non-empty

ta0kira · 12-21-2008, 03:42 AM

I have a program that performs operations on network sockets with two separate threads: one reads data and one writes data. The read data is processed in a non-trivial manner and single written transmissions don't necessarily directly correspond to single read transmissions.

Both read and write operations are non-blocking. Both retry reading/writing a limited number of times, then give up. The problem I'm having is due to a program on the other end of a socket saturating the buffer with data, then timing out and exiting (unfriendly test program.) Meanwhile, the program in question is processing data from the buffer, which will often result in a return transmission. The write thread attempts to deliver the return transmissions but fails to detect the closed write half of the duplex. What should ideally happen is that the write system call returns -1 with an errno other than EAGAIN or EINTR (my criteria for not calling the connection closed), then the program shuts down the read half and ignores the rest of the buffered input data. What instead happens is that the write thread retries every destined-to-fail transmission for N seconds, then moves on to the next one. What's important is that the program successfully detects a closed connection almost instantaneously when the buffer contains no data (this detection happens in the read thread at a select call, however, so the write thread has no chance to detect it.)

The current system will increment an error counter upon a failed transmission and will disconnect the socket once a limit is reached. This also happens with reading, but unfortunately in this situation each read from the buffer decrements the counter as a positive indication of validity. For this reason, the socket is never disconnected due to write errors. Decrementing the counter is necessary for error recovery if the other end is temporarily overloaded.

How does one detect a closed write duplex of a network socket? Thanks.
ta0kira

PS The far end always calls shutdown for both read and write duplexes and I've verified that the far end has indeed exited.

jiml8 · 12-22-2008, 03:48 PM

If you use send/recv rather than write/read, you can set flags on each I/O call. You could pass MSG_DONTWAIT to send to make that call non-blocking, then check the return value to see whether anything was sent. If nothing was sent, set a flag. You might want to try a couple of times, but if you can't communicate, just close the recv socket and purge the buffer.

See include/bits/socket.h.

I think you need to watch for the ECONNRESET or ENOTCONN error flags. You might also investigate the EPIPE error return.
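
A minimal sketch of that check, assuming a Linux-like system where MSG_DONTWAIT and MSG_NOSIGNAL are both available. The names send_status, classify_send_errno, try_send and demo_peer_closed are made up for illustration. The demo uses an AF_UNIX socketpair so it can run on its own; a TCP socket may behave differently, since the first write after the peer exits can still succeed before the reset arrives.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum send_status { SEND_OK, SEND_RETRY, SEND_CLOSED };

/* Map an errno value from a failed send() to a retry/close decision. */
static enum send_status classify_send_errno(int err)
{
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return SEND_RETRY;   /* transient: try again later */
    return SEND_CLOSED;      /* EPIPE, ECONNRESET, ENOTCONN, ...: give up */
}

/* One non-blocking send attempt. MSG_NOSIGNAL (assumed available, as on
 * Linux) suppresses SIGPIPE so the failure shows up as errno == EPIPE. */
static enum send_status try_send(int fd, const void *buf, size_t len)
{
    assert(fd >= 0);
    ssize_t n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
    if (n >= 0)
        return SEND_OK;      /* may be partial; real code would check n */
    return classify_send_errno(errno);
}

/* Demo: the peer leaves unread data in our receive buffer, then closes.
 * Returns the status of our first send attempt afterwards. */
static enum send_status demo_peer_closed(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return SEND_RETRY;
    if (write(sv[1], "unread", 6) != 6) {   /* data we have not read yet */
        close(sv[0]);
        close(sv[1]);
        return SEND_RETRY;
    }
    close(sv[1]);                            /* peer goes away */
    enum send_status st = try_send(sv[0], "reply", 5);
    close(sv[0]);
    return st;
}
```

With the peer gone, demo_peer_closed() should report SEND_CLOSED on Linux even though six unread bytes are still sitting in the local receive buffer.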

ta0kira · 12-23-2008, 12:37 AM

Thanks for the info. Originally I checked for EPIPE to detect a closed pipe/socket, then EINTR and EAGAIN would cause a retry. Then I decided that anything besides EINTR and EAGAIN would indicate something not worth retrying, so I got rid of the EPIPE check.

I would consider using send, but the writing/reading is integrated into exportation and importation interfaces that are used for both sockets and pipes. Basically, the system transmits structured data by passing the output interface to each node of a binary tree, then parsing takes place in a complementary manner.

I took another look and I might have performed a flawed check, but it's difficult to tell if the problem exists still because I've improved the input and processing so that a buffer backup lasts less than a second. That, and it's a difficult situation to duplicate. I suppose I could set up two test programs, one that writes to a socket until it gets a SIGPIPE and exits, and another on the other end that waits 10s or so, then tries to write back the other way. I'll post more if I still have the problem. Thanks.
ta0kira

dwhitney67 · 12-23-2008, 05:15 AM

Quote:

Originally Posted by ta0kira

I have a program that performs operations on network sockets with two separate threads: one reads data and one writes data. The read data is processed in a non-trivial manner and single written transmissions don't necessarily directly correspond to single read transmissions.

Both read and write operations are non-blocking. Both retry reading/writing a limited number of times, then give up. The problem I'm having is due to a program on the other end of a socket saturating the buffer with data, then timing out and exiting (unfriendly test program.) ...

The description of your application seems very similar to an application I worked on nearly a decade ago. The solution that I employed was to create a separate thread to do the data processing, thus allowing the read and write threads to do nothing more than a very trivial task. These threads were able to pass messages around using thread-safe queues.

Code:

socket ---->  read-thread ----> Q ----> processing-thread ----> Q ----> write-thread ----> socket

Anyhow, this approach alleviated the load on the TCP socket's internal queue, and shifted the weight (i.e. burden) to the queues managed by the application.
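
As a rough illustration of the queue idea (not the code from that application), here is a small bounded queue guarded by a pthread mutex and two condition variables, plus one processing stage between two queues. QCAP, msg_queue, mq_init, mq_push, mq_pop, stage, processing_thread and demo_pipeline are all hypothetical names, and a NULL message is assumed to mean "no more data".

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QCAP 16  /* hypothetical fixed capacity for this sketch */

/* A small bounded queue of message pointers, guarded by a mutex. */
struct msg_queue {
    const char *items[QCAP];
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static void mq_init(struct msg_queue *q)
{
    assert(q != NULL);
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Block while the queue is full, then append one message. */
static void mq_push(struct msg_queue *q, const char *msg)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QCAP] = msg;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Block while the queue is empty, then remove the oldest message. */
static const char *mq_pop(struct msg_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    const char *msg = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return msg;
}

struct stage { struct msg_queue *in, *out; };

/* Processing stage: forward messages until a NULL sentinel arrives. */
static void *processing_thread(void *arg)
{
    struct stage *s = arg;
    const char *msg;
    while ((msg = mq_pop(s->in)) != NULL)
        mq_push(s->out, msg);   /* real code would transform msg here */
    mq_push(s->out, NULL);      /* pass the sentinel downstream */
    return NULL;
}

/* Demo: push two messages through the stage; returns how many came out. */
static int demo_pipeline(void)
{
    struct msg_queue in, out;
    struct stage s = { &in, &out };
    pthread_t tid;
    int n = 0;
    mq_init(&in);
    mq_init(&out);
    if (pthread_create(&tid, NULL, processing_thread, &s) != 0)
        return -1;
    mq_push(&in, "a");
    mq_push(&in, "b");
    mq_push(&in, NULL);
    while (mq_pop(&out) != NULL)
        n++;
    pthread_join(tid, NULL);
    return n;
}
```

In a real program the read thread would call mq_push on the first queue and the write thread would call mq_pop on the second, so neither one has to wait on processing.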

ta0kira · 12-31-2008, 05:56 PM

Actually, the processing involves parsing and forwarding across a pipe to another program. The program in question parses returned data with a third thread and adds it to a queue, then the output thread extracts those queued messages and sends them:

Code:

          /--> [input thread]
          |      - parse from socket
          |      - forward >------------------------\
socket ~==+                                         |
          |                                         |
          |                                         |
          |--< [output thread]                      |
                 - take from queue <--\       [other program]
                 - send to socket     |             |
                                      |             |
                  /-------------------/             |
                  |                                 |
               [queue] <---------\                  |
                                 |                  |
                                 |                  |
               [message thread]  |                  |
                 - receive <----)|(-----------------/
                 - queue >-------/

ta0kira

jiml8 · 12-31-2008, 06:07 PM

Curiously enough, since my last post on this thread I myself posted a thread on this board about a problem I was having where a server of mine was terminating without returning from send() when it tried to send data down a socket that had been disconnected from the other end.

The discussion was quick and brisk, and what I learned from it was that a SIGPIPE was thrown by the system when I tried to do that and, if unhandled, the SIGPIPE was causing my process to terminate.

I dealt with the issue by writing a signal handler and catching SIGPIPE - then doing nothing with it. The discussion occurred here:
http://www.linuxquestions.org/questi...closed-693703/

It happens that this solves your problem too. You do the send() or a write() and get the SIGPIPE. Also, because you are catching the signal, your send() (or write()) returns with an error, and does not terminate the process. Your event handler for the SIGPIPE sets a flag.

Your process that is sending data can then be terminated if you wish by processing the error response from the send(), or not...it doesn't matter.

Your process that is reading data checks the flag your signal handler set when it got the SIGPIPE before it tries to read data. Thus you know that the connection is broken and you can decide what to do.

wje_lq · 12-31-2008, 06:43 PM

Quote:

Your process that is reading data checks the flag your signal handler set when it got the SIGPIPE before it tries to read data. Thus you know that the connection is broken and you can decide what to do.

Almost. It doesn't cover the situation where the other end closes the connection between the time the current program did its most recent write and a new attempt to read.

Ordinarily one would detect it on the read()/recv() by checking the result; if it's 0, then the connection has closed.

But ta0kira is doing nonblocking I/O. That complicates it only a little. He should do a read()/recv() only when a preceding select() turns on the read bit; after that, if the result of the read()/recv() is 0, then you know the connection has been closed from the other side.
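
The select()/read() sequence described above can be sketched as follows. The names wait_status, wait_and_read and demo_data_then_eof are hypothetical, the one-second timeout is arbitrary, and the demo uses an AF_UNIX socketpair in place of a real network connection.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum wait_status { CONN_TIMEOUT, CONN_DATA, CONN_CLOSED, CONN_ERROR };

/* Wait up to timeout_ms for fd to become readable, then read once.
 * A read of 0 bytes after select() reported readiness means the peer
 * closed its end. */
static enum wait_status wait_and_read(int fd, char *buf, size_t len,
                                      int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready < 0)
        return CONN_ERROR;     /* includes EINTR; real code might retry */
    if (ready == 0)
        return CONN_TIMEOUT;

    ssize_t n = read(fd, buf, len);
    if (n > 0)
        return CONN_DATA;
    if (n == 0)
        return CONN_CLOSED;
    return CONN_ERROR;
}

/* Demo: the peer writes two bytes and closes. Returns 1 if we see the
 * data first and end-of-stream second, 0 otherwise. */
static int demo_data_then_eof(void)
{
    int sv[2];
    char buf[8];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    int ok = (write(sv[1], "hi", 2) == 2);
    close(sv[1]);
    ok = ok && wait_and_read(sv[0], buf, sizeof buf, 1000) == CONN_DATA;
    ok = ok && wait_and_read(sv[0], buf, sizeof buf, 1000) == CONN_CLOSED;
    close(sv[0]);
    return ok;
}
```

Note that the zero-length read only shows up once the buffered data has been drained, which is the situation the original question runs into.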

ta0kira · 12-31-2008, 09:09 PM

Quote:

Originally Posted by jiml8

It happens that this solves your problem too. You do the send() or a write() and get the SIGPIPE. Also, because you are catching the signal, your send() (or write()) returns with an error, and does not terminate the process. Your event handler for the SIGPIPE sets a flag.

I actually have a SIGPIPE handler that logs the signal and returns, but I'm not sure if EINTR overrides EPIPE as a result of the broken write, so I might just be getting a continual EINTR, which perpetuates the loop.

Quote:

Originally Posted by wje_lq

But ta0kira is doing nonblocking I/O. That complicates it only a little. He should do a read()/recv() only when a preceding select() turns on the read bit; after that, if the result of the read()/recv() is 0, then you know the connection has been closed from the other side.

The problem with this is that reading still happens because the buffer isn't empty, and while that happens write (wasn't) indicating failure. As I said a few posts ago, though, I'm second-guessing that this was the case and haven't been able to check it because the circumstances behind it have changed.
ta0kira

jiml8 · 12-31-2008, 09:21 PM

I am not sure what the issue here is. If you do a perror("send") after the signal is thrown, you'll get a "Broken pipe" error. The fact that you got the SIGPIPE signal to begin with says that the connection is broken.

wje_lq · 12-31-2008, 11:32 PM

Quote:

I am not sure what the issue here is.

There are probably no remaining original issues. I think we're all on the same page with that.

To clarify my most recent remark, though, ta0kira said:

Quote:

Originally Posted by wje_lq

But ta0kira is doing nonblocking I/O. That complicates it only a little. He should do a read()/recv() only when a preceding select() turns on the read bit; after that, if the result of the read()/recv() is 0, then you know the connection has been closed from the other side.

The problem with this is that reading still happens because the buffer isn't empty, and while that happens write (wasn't) indicating failure.

And ta0kira is correct. My remark was not addressed to his original problem, but to this by jiml8:

Quote:

Your process that is reading data checks the flag your signal handler set when it got the SIGPIPE before it tries to read data. Thus you know that the connection is broken and you can decide what to do.

I had seen that as a general recommendation to determine when not to read(), and was observing that this won't catch all cases for a potential read(); you'll also have to go through the select()/read() dance to be sure, at least with nonblocking I/O.

Hope I've confused things enough. :)

jiml8 · 01-01-2009, 02:54 AM

Quote:

The problem with this is that reading still happens because the buffer isn't empty, and while that happens write (wasn't) indicating failure.

And ta0kira is correct. My remark was not addressed to his original problem, but to this by jiml8:

I missed this originally. Are you saying that the SIGPIPE won't be thrown on a write() or send() IF there is data waiting in the receive queue????

Quote:

I had seen that as a general recommendation to determine when not to read(), and was observing that this won't catch all cases for a potential read();

I had not meant it to be a completely general recommendation, but it would appear appropriate for the case that I understood to apply here. If an extra read occurred because the flag was set during that race time, it wouldn't really matter.

wje_lq · 01-01-2009, 04:43 AM

Quote:

Are you saying that the SIGPIPE won't be thrown on a write() or send() IF there is data waiting in the receive queue????

No. If the answer were "yes", that would be really bizarre. In other words, this question deserves every one of the question marks you gave it.

I'm saying that if you're about to read(), the only way to test whether the read() fails because of closed connection is to go ahead and do the read and test the result for 0. And, if you're doing nonblocking I/O, don't proceed to the read() until the select() turns on the read bit; otherwise, a 0 result could simply mean that no bytes are available.

I had (perhaps mistakenly) taken this to assert otherwise:

Quote:

Your process that is reading data checks the flag your signal handler set when it got the SIGPIPE before it tries to read data. Thus you know that the connection is broken and you can decide what to do.

I believe we're all on the same page now.

ta0kira · 01-01-2009, 04:33 PM

Quote:

Originally Posted by wje_lq

I'm saying that if you're about to read(), the only way to test whether the read() fails because of closed connection is to go ahead and do the read and test the result for 0. And, if you're doing nonblocking I/O, don't proceed to the read() until the select() turns on the read bit; otherwise, a 0 result could simply mean that no bytes are available.

That is incorrect. You will get (ssize_t) -1 with errno == EAGAIN. The important part is to not check for < 0 because ssize_t is unsigned.
ta0kira

wje_lq · 01-01-2009, 06:00 PM

Quote:

That is incorrect. You will get (ssize_t) -1 with errno == EAGAIN.

At first I was going to say, "Wait, what?" But then I realized you're talking about the result from the read() where you don't have a prior notification from select() that data (or end of data) is available, and you're absolutely right, and I'm wrong.
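
To make the distinction concrete, here is a small sketch of a non-blocking read with no preceding select(). The names set_nonblocking, read_status, read_once and demo_empty_then_eof are made up for illustration, and an AF_UNIX socketpair stands in for the network socket.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical result codes for this sketch. */
enum read_status { READ_DATA, READ_EMPTY, READ_EOF, READ_FAILED };

/* One read attempt on a non-blocking descriptor. */
static enum read_status read_once(int fd, char *buf, size_t len)
{
    assert(len > 0);            /* a zero-length request would also return 0 */
    ssize_t n = read(fd, buf, len);
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_EOF;        /* peer closed its end */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return READ_EMPTY;      /* nothing available right now */
    return READ_FAILED;
}

/* Demo: read from an open but idle socket, then again after the peer
 * closes. Returns 1 if the results are READ_EMPTY then READ_EOF. */
static int demo_empty_then_eof(void)
{
    int sv[2];
    char buf[8];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    if (set_nonblocking(sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }
    int ok = read_once(sv[0], buf, sizeof buf) == READ_EMPTY;
    close(sv[1]);
    ok = ok && read_once(sv[0], buf, sizeof buf) == READ_EOF;
    close(sv[0]);
    return ok;
}
```

So with O_NONBLOCK set, "no data yet" arrives as -1 with EAGAIN, and a return of 0 is left to mean end of stream.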

There are two neat things about preceding the read() with a select() which waits for the read bit to get set, whether or not you're expecting input.

1. You can have a thread, if you wish, which simply waits for the connection to get closed, and then does some work based on that. It won't need to wait until you actually want to write to the socket. Useful if your need for status can be more often than your writes.
2. If you're mixing reads and writes in the same event loop, following the read bit from the select() to find a zero-length input is a very easy way to detect end of connection.

It's a matter of what code you already have (if it works for you) and what you need and what your style is.

But I beg to differ on the "important part":

Quote:

The important part is to not check for < 0 because ssize_t is unsigned.

You'll note that the third parameter passed to read() is of type size_t, and the type of the function's returned value is ssize_t. These are not the same. ssize_t is a signed size_t, and is signed specifically so you can check for error return from the read().

But don't take my word for it. Run this bash script:

Code:

#!/bin/bash

cat > thursday.c <<EOD
#include <sys/types.h>
#include <stdio.h>

int main(void)
{
  size_t  size_1;
  size_t  size_2;

  ssize_t ssize_1;
  ssize_t ssize_2;

  size_1=0;
  size_2=-1;

  if(size_2<size_1)
  {
    printf("size_t is signed\n");
  }
  else
  {
    printf("size_t is unsigned\n");
  }

  ssize_1=0;
  ssize_2=-1;

  if(ssize_2<ssize_1)
  {
    printf("ssize_t is signed\n");
  }
  else
  {
    printf("ssize_t is unsigned\n");
  }

  return 0;

} /* main() */
EOD

gcc thursday.c -o thursday
./thursday

When I did, I got this output:

Code:

size_t is unsigned
ssize_t is signed