[SOLVED] Pthread: a lot of spurious wakeups on Linux

glukoz.ziewa · 11-28-2011, 07:07 AM

Hello all,

I am experiencing a really annoying issue with a pthread condition variable: a lot of spurious wakeups. First I will present the code scheme:

Code:

void *thread_func(void *ptr)
{
   /* Some initializations here... */

   while (1) {

      /* Some thread work here ... */

      pthread_mutex_lock(&mutex);

      if (!more_work_to_do) {

         print("There is no more work to do at this moment, going to sleep for 2 s.");
         pthread_cond_timedwait(&cond, &mutex, &ts);

      }

      pthread_mutex_unlock(&mutex);
   }

}

I do not use any predicate to protect the code against spurious wakeups, because the application logic does not require it: in the "Some thread work here" section the thread can check whether there is really something to do, and if there is not, it is expected to simply wait on the condition variable again. I consider this a good solution, but only on the condition that spurious wakeups are rare. My problem is that they are not.

If I redirect the program output to a file and let it run for 30 s, I get a file which is 500k lines long, with the majority of lines being "There is no more work to do at this moment, going to sleep for 2 s.". I added a time stamp to the print function (it displays the number of ms elapsed since program start) and this line appears many, many times every ms. So it seems that pthread_cond_timedwait() cannot block for some reason. Actually, it does block for 2 s from time to time, but as you may guess from the output file size, that is a rare event.

I am 100% sure that it is not caused by stray pthread_cond_signal() calls in the main thread, because it happens even when I force the main thread not to call this function. I run the program on a Linux machine, and I tried to block all signals in the thread function by adding this code in the "Some initializations here" section:

Code:

sigfillset(&sigset);
pthread_sigmask(SIG_SETMASK, &sigset, NULL);

but it did not help. I am just wondering what makes the pthread_cond_timedwait() function unable to block. I would like to ask you for suggestions on how to solve this issue.

sundialsvcs · 11-28-2011, 07:44 AM

You want to get rid of that polling. In other words, you don't want to "sleep for two seconds."

Define two variables:

An atomic counter of how many units of work remain, or a queue of work to do, protected by a mutex.
A condition-variable which indicates that there might be new work to do.

When new work is added to the queue (protected by the mutex), the condition-variable is asserted, thus waking up anyone who might be waiting for it. Then, the mutex is released.

The receiving thread waits on the condition-variable, thereby resetting it. Then, it loops to fetch elements from the queue (protected again by the mutex), until the queue is empty. Then, it returns to the outer loop where it waits upon the condition-variable again.

It is entirely likely that this thread will wait upon the condition, find that it is already asserted (thus returning immediately without waiting), and then find that the queue is empty. In this case it will simply go right back to sleep again, having done nothing during this particular cycle. This "error on the side of caution" is done, partly just for programmer convenience, and partly to make sure that the thread does not go into an indefinite wait with actual work pending on the queue. It is for the same reason that I suggest that the enqueueing routine should assert the condition-variable every time it adds something to the queue. As long as you correctly avoid "polling," you don't care if there are some extra wakeups. You do care very much if the thread doesn't wake up.
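As a rough illustration of the queue-plus-condition-variable scheme described above, here is a minimal sketch. The names `work_queue`, `queue_init`, `queue_push` and `queue_pop` are hypothetical and not taken from anyone's real code. One assumption worth stating: a pthread condition variable does not itself remember that it was signalled, so the sketch keeps the state in the queue count and has the consumer re-check that count in a loop. Extra wakeups are then harmless, and a signal sent before the consumer starts waiting is not lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Hypothetical fixed-size work queue; all names are illustrative only. */
struct work_queue {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             items[QUEUE_CAP];
    int             head;   /* index of the oldest item */
    int             count;  /* number of queued items   */
};

static void queue_init(struct work_queue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = 0;
    q->count = 0;
}

/* Producer: add an item under the mutex and signal every time.
 * Returns false if the queue is full. */
static bool queue_push(struct work_queue *q, int item)
{
    bool ok = false;

    pthread_mutex_lock(&q->mutex);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = item;
        q->count++;
        ok = true;
        pthread_cond_signal(&q->cond);
    }
    pthread_mutex_unlock(&q->mutex);
    return ok;
}

/* Consumer: block until an item is available, then remove it.
 * The while loop re-checks the predicate after every wakeup. */
static int queue_pop(struct work_queue *q)
{
    int item;

    pthread_mutex_lock(&q->mutex);
    while (q->count == 0)
        pthread_cond_wait(&q->cond, &q->mutex);
    item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->mutex);
    return item;
}
```

A worker would typically call queue_pop() in its outer loop, while the main thread calls queue_push() whenever new work arrives.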

Remember also that there are many existing examples of this kind of logic. Don't reinvent any wheels.

glukoz.ziewa · 11-28-2011, 09:08 AM

Thank You for your reply.

Quote:

You want to get rid of that polling. In other words, you don't want to "sleep for two seconds."

I see that my post was not fully informative. Actually, I need that polling: the thread function is expected to wake up when some work arrives from the main thread, but there are also some tasks which it must perform every 2 s. Anyway, I am not sure if my code scheme is clear: my code is not expected to poll for tasks in an infinite, non-blocking loop. pthread_cond_timedwait is expected to block the thread (thus releasing the CPU) and continue when there is a signal from the main thread OR 2 s elapse. My code is expected to work exactly as you described, with one difference: even if there is no work from the main thread, the worker thread has to wake up every 2 s to perform some periodic tasks. My problem is that pthread_cond_timedwait does not behave as expected: it does not block for 2 s but returns immediately, so the thread eats CPU. Using a non-timed wait on the condition variable would not solve my problem if pthread_cond_wait refused to block as well: the thread would then poll for tasks in a non-blocking, infinite loop, thus eating CPU.

dwhitney67 · 11-28-2011, 10:17 AM

Quote:

Originally Posted by glukoz.ziewa

Thank You for your reply.

I see that my post was not fully informative. Actually, I need that polling: the thread function is expected to wake up when some work arrives from the main thread, but there are also some tasks which it must perform every 2 s. Anyway, I am not sure if my code scheme is clear: my code is not expected to poll for tasks in an infinite, non-blocking loop. pthread_cond_timedwait is expected to block the thread (thus releasing the CPU) and continue when there is a signal from the main thread OR 2 s elapse. My code is expected to work exactly as you described, with one difference: even if there is no work from the main thread, the worker thread has to wake up every 2 s to perform some periodic tasks. My problem is that pthread_cond_timedwait does not behave as expected: it does not block for 2 s but returns immediately, so the thread eats CPU. Using a non-timed wait on the condition variable would not solve my problem if pthread_cond_wait refused to block as well: the thread would then poll for tasks in a non-blocking, infinite loop, thus eating CPU.

How are you setting up the timespec 'ts' variable? Is it set up within the while-loop, or before entering the loop?

glukoz.ziewa · 11-30-2011, 02:40 AM

Quote:

How are you setting up the timespec 'ts' variable? Is it set up within the while-loop, or before entering the loop?

Here is the updated scheme, which answers your question:

Code:

void *thread_func(void *ptr)
{
   /* Some initializations here... */

   while (1) {

      /* Some thread work here ... */

      pthread_mutex_lock(&mutex);

      if (!more_work_to_do) {

         print("There is no more work to do at this moment, going to sleep for 2 s.");

         ts = get_wait_time(); /* returns 'now' + 2 s */
         pthread_cond_timedwait(&cond, &mutex, &ts);

      }

      pthread_mutex_unlock(&mutex);
   }

}

glukoz.ziewa · 11-30-2011, 04:06 AM

OK, I found the reason for my problem. The pthread_cond_timedwait() function did not block (and returned the EINVAL error immediately) when the tv_nsec field in the timespec structure was greater than or equal to 1 000 000 000. It seems that such a value is treated as an invalid argument. All I had to do was add the following code:

Code:

while (ts.tv_nsec >= 1000 * 1000 * 1000) {
   ts.tv_sec++;
   ts.tv_nsec -= 1000 * 1000 * 1000;
}

And now blocking works correctly.

dwhitney67 · 11-30-2011, 05:51 AM

Alternatively,

Code:

#include <time.h>
...
struct timespec ts;
...
clock_gettime(CLOCK_REALTIME, &ts);

ts.tv_sec += 2;   // 2-seconds from "now"

clock_gettime() requires linking with the "real-time" library, or librt.so. This library also pulls in libpthread.so.

glukoz.ziewa · 11-30-2011, 06:58 AM

Quote:

clock_gettime() requires linking with the "real-time" library, or librt.so. This library also pulls in libpthread.so.

In fact, I used '2 s' in my scheme for simplicity. In the real code, the time interval is configured by a macro and is given in ms. It does not have to be aligned to a second boundary. Now I know that one should always present a problem precisely, because the issue may lie in a really tiny detail.

sundialsvcs · 11-30-2011, 10:46 PM

The easiest way to handle that is to set up a "watchdog" thread that goes to sleep for two seconds, (say...) increments a count of the number of timer-pops that have occurred so far, and then asserts the condition variable to make sure that somebody out there is awake.

Or the timer-dedicated thread might actually do the work. It would wake up, grab any mutexes needed to synchronize itself with what everyone else out there is doing, do the work and go back to sleep again. (Until it notices that some flag has been set that says: "we're shutting-down now, so everybody please die.")
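The first variant (a watchdog thread that counts timer pops and wakes the others) might look roughly like the sketch below. The `watchdog` structure, its fields and the function names are assumptions made up for illustration, and the interval is passed in milliseconds so that it does not have to be a whole number of seconds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared state; all names are illustrative only. */
struct watchdog {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    unsigned long   ticks;          /* timer pops so far                */
    bool            shutting_down;  /* set by the main thread on exit   */
    long            interval_ms;    /* set before the thread is started */
};

/* Timer thread: sleep, count a tick, wake every waiter, repeat. */
static void *watchdog_thread(void *arg)
{
    struct watchdog *w = arg;
    struct timespec nap;
    bool stop;

    assert(w->interval_ms > 0);
    nap.tv_sec  = w->interval_ms / 1000;
    nap.tv_nsec = (w->interval_ms % 1000) * 1000000L;

    for (;;) {
        /* Relative sleep; an early return caused by a signal only
         * produces an early tick, which this sketch tolerates. */
        nanosleep(&nap, NULL);

        pthread_mutex_lock(&w->mutex);
        stop = w->shutting_down;
        if (!stop) {
            w->ticks++;
            pthread_cond_broadcast(&w->cond);
        }
        pthread_mutex_unlock(&w->mutex);

        if (stop)
            break;
    }
    return NULL;
}

/* Worker side: block until the tick count differs from 'seen'
 * (or shutdown was requested) and return the current count. */
static unsigned long wait_for_tick(struct watchdog *w, unsigned long seen)
{
    unsigned long now;

    pthread_mutex_lock(&w->mutex);
    while (w->ticks == seen && !w->shutting_down)
        pthread_cond_wait(&w->cond, &w->mutex);
    now = w->ticks;
    pthread_mutex_unlock(&w->mutex);
    return now;
}
```

In a real worker the wait predicate would probably also include "the work queue is not empty", so that the same condition variable covers both new work and the periodic tick.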