I have a program that serves as both a client and a server; a server instance runs as a daemon and places a local socket in a specific directory on the file system, and a client instance connects to the socket of a particular server instance for communication. A third mode allows the user to list which servers he/she has access to. This is done by the program scanning the socket directory and attempting to connect to each socket present. Sockets that aren't active are removed and sockets that can't be connected to are ignored (that means another user ignores the socket.) This is a
setuid program, but I've removed those parts for the example.
I've isolated the problem from a much larger program for this post. This part of the program has worked perfectly on my development machine, but since I installed the program on another computer it always crashes the kernel on the
connect call. This only happens when the socket is one that can be connected to. I actually haven't tried connecting as a user that doesn't have access to a socket, but I don't want to crash my kernel mid-post. I'll try it and post an edit. Here is the code:
Code:
#include <dirent.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <errno.h>
#include <stddef.h>
#include <signal.h>
static int resolve_existing_entry(const char *nName)
{
struct sockaddr_un new_address;
size_t new_length = 0;
int new_socket = socket(PF_LOCAL, SOCK_STREAM, 0);
if (new_socket < 0) return -1;
int current_state = fcntl(new_socket, F_GETFL);
fcntl(new_socket, F_SETFL, current_state | O_NONBLOCK);
new_address.sun_family = AF_LOCAL;
strncpy(new_address.sun_path, nName, sizeof new_address.sun_path);
new_length = (offsetof(struct sockaddr_un, sun_path) + SUN_LEN(&new_address) + 1);
int connected = 0;
//the line below causes a kernel hang only if connection is possible
if (connect(new_socket, (struct sockaddr*) &new_address, new_length) < 0)
{
if (errno != EINPROGRESS && errno != EALREADY) remove(nName);
}
else connected = 1;
shutdown(new_socket, SHUT_RDWR);
struct stat current_stats;
return (stat(nName, ¤t_stats) >= 0)? ((connected)? -1 : -2) : 0;
}
static int show_table_entry(const struct dirent *eEntry)
{
if (eEntry && eEntry->d_type == DT_SOCK)
{
int connected = resolve_existing_entry(eEntry->d_name);
if (connected == -1) fprintf(stdout, "%s\n", eEntry->d_name);
return 0;
}
else if (eEntry && eEntry->d_type == DT_DIR) return 0;
else return 1;
}
int main()
{
struct dirent **entries = NULL, **current = NULL;
int total_matches = scandir(".", &entries, &show_table_entry, NULL);
return 0;
}
It is very possible that the other end of the program is causing the hang, but I haven't had a chance to isolate that part of it. I'll try to isolate a part of that code to see. Until then, please tell me if you see anything unsafe or incorrect about my example code. Thank you.
ta0kira
edit:
It appears that the hang is only when the program has permission to connect; therefore, it's probably the server end. It doesn't appear to be a problem with the
accept code, so I'm looking at a particular part where
select is used.