
aasthakm 03-17-2013 01:20 PM

issues with network communication in kernel
I am trying to use a kernel socket to send/receive data with a server over TCP. I have used the following APIs for the purpose - sock_create, sock_sendmsg, sock_recvmsg and sock_release. As part of sock_sendmsg I pass MSG_NOSIGNAL and MSG_DONTWAIT flags and everything seems to be working fine with sending buffers. However, receiving has some problems. Below is the BUG trace I get when sock_recvmsg is called in my thread.


[  157.335139] BUG: scheduling while atomic:
[  157.335187] Call Trace:
[  157.335205]  [<ffffffff81586a08>] dump_stack+0x69/0x6f
[  157.335210]  [<ffffffff8159e02f>] thread_return+0x325/0x356
[  157.335215]  [<ffffffff8159e80d>] schedule_timeout+0x2bd/0x340
[  157.335221]  [<ffffffff81470711>] sk_wait_data+0xd1/0xe0
[  157.335236]  [<ffffffff814c4056>] tcp_recvmsg+0x526/0xb50
[  157.335246]  [<ffffffff814e692a>] inet_recvmsg+0x9a/0xd0
[  157.335252]  [<ffffffff8146b397>] sock_recvmsg+0x127/0x140
[  157.335258]  [<ffffffffa00e6405>] ts_recv_buffer+0x85/0xf0 [tsdd]
[  157.335281]  [<ffffffffa00e65a8>] tsdd_worker_thread+0x138/0x1b0 [tsdd]
[  157.335286]  [<ffffffff81075eee>] kthread+0x7e/0x90

First I thought it might have something to do with MSG_DONTWAIT flag too, but setting or not setting it doesn't make any difference. Despite the dump, I still see that the receive operation succeeds and the buffer returned contains valid data. Looking at the tcp_recvmsg() code in the kernel, it seems that sk_wait_data would be called fairly often.

So I don't understand what the exact problem is here, or how to fix it. More generally, how do I keep the thread from blocking indefinitely on network I/O without hitting the problem above?


smallpond 03-17-2013 06:20 PM

"scheduling while atomic" means that you are calling a function that can sleep while you are in atomic context: an interrupt routine or holding a lock, for example. Receive of a TCP buffer may take multiple ethernet packets, so the code needs to be able to wait for resources.
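To make the explanation above concrete, here is a toy user-space model of the check, not kernel code. It assumes, roughly as the kernel does, that taking a spinlock raises a preemption counter and that the scheduler complains if it is entered while that counter is non-zero. Every name here (toy_preempt_count, toy_spin_lock, toy_schedule, toy_blocking_recv) is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for the kernel's preemption counter (user space only). */
static int toy_preempt_count = 0;
/* Number of "scheduling while atomic" reports printed so far. */
static int toy_bug_reports = 0;

/* Entering an atomic section (e.g. taking a spinlock) raises the counter. */
static void toy_spin_lock(void)
{
    toy_preempt_count++;
}

/* Leaving the atomic section lowers it again. */
static void toy_spin_unlock(void)
{
    assert(toy_preempt_count > 0);
    toy_preempt_count--;
}

/*
 * Models the check made on entry to the scheduler.
 * Returns 1 if sleeping was legal, 0 if the caller was atomic.
 */
static int toy_schedule(const char *who)
{
    if (toy_preempt_count != 0) {
        toy_bug_reports++;
        printf("BUG: scheduling while atomic: %s (count=%d)\n",
               who, toy_preempt_count);
        return 0;
    }
    printf("%s: slept and woke up normally\n", who);
    return 1;
}

/* Any call that may wait for data ends up in the scheduler. */
static int toy_blocking_recv(const char *who)
{
    return toy_schedule(who);
}
```

Calling toy_blocking_recv() on its own returns 1. Calling it between toy_spin_lock() and toy_spin_unlock() returns 0 and prints the report. In the model, the receive path itself is fine; what matters is what the caller did before reaching it, so the thing to look for is whoever raised the counter and has not yet lowered it.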

aasthakm 03-18-2013 05:51 AM

Thank you for the response. I understand what you said. Now the question is, what is causing the behaviour I see? tcp_recvmsg is a common code path that everyone using kernel sockets for a TCP connection goes through, I believe. So, if there is no problem in the TCP path, I would like to understand what needs to be fixed in my code. Please have a look at my code sketch:


struct socket *set_up_client_socket(ip_addr, port)
{
  struct socket *cl_sock;
  struct sockaddr_in sin;
  int error;

  error = sock_create(PF_INET, SOCK_STREAM, IPPROTO_TCP, &cl_sock);
  if (error < 0) {
    return NULL;
  }
  /* initialize sin */
  error = cl_sock->ops->connect(cl_sock, (struct sockaddr*)&sin, sizeof(sin), O_NONBLOCK);
  /* error checking and retrying */
  return cl_sock;
}

long int ts_recv_buffer(sock, buf, len)
{
  struct msghdr msg;
  struct iovec iov;
  long int len2;
  mm_segment_t oldfs;

  iov.iov_base = (void *)&buf[0];
  iov.iov_len = len;

  /* initializing name, len, etc. fields of msg */
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_flags = MSG_DONTWAIT;  //msg_flags = 0 also has the same result

  /* let the socket layer accept a kernel-space buffer */
  oldfs = get_fs();
  set_fs(KERNEL_DS);
  len2 = sock_recvmsg(sock, &msg, len, 0);
  set_fs(oldfs);

  return len2;
}

Within this path, do I need to explicitly call preempt_enable or some such thing to avoid the bug I am seeing?
Thanks again, for your help.
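On the MSG_DONTWAIT part of the question, here is a small user-space analogue (ordinary sockets, not the in-kernel API). In user space the flag only has an effect when it is passed in the flags argument of recv()/recvmsg(); msg_flags in struct msghdr is an output field. As far as I know, sock_recvmsg() in kernels of this era likewise takes its flags from the last argument, which is 0 in the sketch above. That would match both the sk_wait_data frame in the trace and the observation that changing msg_flags makes no difference, but treat it as an assumption to check against your kernel source. The helper names below are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * One non-blocking receive attempt. MSG_DONTWAIT goes in the flags
 * argument of recv(). Returns the byte count, or -errno on failure.
 */
static long try_recv_nonblocking(int fd, char *buf, size_t len)
{
    ssize_t n;

    assert(fd >= 0);
    n = recv(fd, buf, len, MSG_DONTWAIT);
    return (n < 0) ? -(long)errno : (long)n;
}

/*
 * Build a connected socket pair, optionally write `pending` bytes
 * into one end, then try a non-blocking read on the other end.
 */
static long recv_from_pair(size_t pending)
{
    int sv[2];
    char out[8] = "abcdefg";
    char in[8];
    long ret;

    if (pending > sizeof(out))
        pending = sizeof(out);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -(long)errno;
    if (pending > 0 && write(sv[1], out, pending) < 0) {
        ret = -(long)errno;
    } else {
        ret = try_recv_nonblocking(sv[0], in, sizeof(in));
    }
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

With nothing pending, recv_from_pair(0) comes back immediately with -EAGAIN (or -EWOULDBLOCK) instead of sleeping; with data pending it returns the number of bytes read. A caller that must not block can retry later or wait for a readiness notification rather than sleeping inside the receive call.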

smallpond 03-20-2013 11:18 PM

Generally not good form to pass the address of stack variables; you should kmalloc msg.

ts_recv_buffer is called from your kernel thread. How is the thread being called? Somehow it is in atomic context.
