How to have a FIFO thread mutex in C
Hello,
I want to write a multi-threaded program whose threads access a shared resource, and I want a FIFO mutex on that resource: whichever thread blocks on the mutex first should be unblocked first. Is this possible with pthreads? |
I don't think it's directly possible with pthread mutexes; however, you can fake it with some pthread barriers, conditions, and mutexes. I was curious about the problem, so I came up with a working solution.
Code:
#include <stdio.h>
Kevin Barry |
I believe futexes (mutexes provided by the Linux kernel) work that way by default -- the futex is granted in the same order it was requested.
You can also build a FIFO mutex using pthreads, if you really need such a type. (Why would you, really?) You'll need a normal mutex, a condition variable (to wake up all waiters via broadcast), and two counters. In C: Code:
typedef struct fifo_mutex fifo_mutex_t;
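/* Sketch of the scheme described above -- a "ticket lock" built from one
 * mutex, one condition variable, and two counters. The field and function
 * names here are illustrative, not the original listing. */
#include <pthread.h>

struct fifo_mutex {
    pthread_mutex_t mutex;    /* protects the two counters below */
    pthread_cond_t  wait;     /* broadcast on every unlock */
    unsigned long   next;     /* next ticket number to hand out */
    unsigned long   serving;  /* ticket number allowed to proceed */
};

#define FIFO_MUTEX_INITIALIZER \
        { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0UL, 0UL }

int fifo_mutex_lock(fifo_mutex_t *const f)
{
    pthread_mutex_lock(&f->mutex);
    const unsigned long ticket = f->next++;      /* take a place in line */
    while (ticket != f->serving)                 /* not our turn yet? */
        pthread_cond_wait(&f->wait, &f->mutex);  /* atomically unlock + sleep */
    return pthread_mutex_unlock(&f->mutex);
}

int fifo_mutex_unlock(fifo_mutex_t *const f)
{
    pthread_mutex_lock(&f->mutex);
    f->serving++;                                /* admit the next ticket */
    pthread_cond_broadcast(&f->wait);            /* wake everyone to re-check */
    return pthread_mutex_unlock(&f->mutex);
}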
The above is not susceptible to thread starvation, although the "cost" of acquiring and releasing the mutex will increase as the number of threads blocking on it increases. (If heavily contended, it does suffer from the "thundering herd" problem, since every thread blocking on it will have to be briefly woken up to find the next thread, whenever the mutex changes state.) If uncontended, the above is just two additions "heavier" than a normal mutex.

If a thread is cancelled while blocking on the mutex (waiting on the condition variable), the mutex will end up deadlocking. You can avoid that by changing the thread cancelability state temporarily, before starting to wait on the condition variable, and restoring the state when the mutex has been grabbed.

There are of course totally different approaches, for example a queue or chain of threads, where each thread releasing the mutex will wake up only the next waiting thread. These tend to be much more complex to code correctly than the above, however. |
@ta0kira: No, but the glibc pthread library uses futexes by default on Linux. But see the end of this post for a serious caveat.
Here's the implementation of fifo_mutex_t I was thinking about. Save this as fifo-mutex.h for example: Code:
#ifndef FIFO_MUTEX_H
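#define FIFO_MUTEX_H
/* A skeleton of what this header would contain, based on the thread;
 * the definitions match the lock/unlock functions sketched earlier,
 * and the names are illustrative, not the original listing. */
#include <pthread.h>

typedef struct fifo_mutex {
    pthread_mutex_t mutex;    /* protects the two counters below */
    pthread_cond_t  wait;     /* broadcast on every unlock */
    unsigned long   next;     /* next ticket number to hand out */
    unsigned long   serving;  /* ticket number allowed to proceed */
} fifo_mutex_t;

#define FIFO_MUTEX_INITIALIZER \
        { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0UL, 0UL }

int fifo_mutex_lock(fifo_mutex_t *);
int fifo_mutex_unlock(fifo_mutex_t *);

#endif /* FIFO_MUTEX_H */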
And here is the test program I used: Code:
#include <pthread.h>

The volatile int working flag is used to synchronize the creation of the threads (and the first printf in each worker, so they complete in the order the threads are created), and thus the order in which they try to lock the fifo_mutex_t lock. As you can see in the worker code, the other printfs in the worker are done in the order the threads obtain the fifo_mutex_t lock. In other words, the printfs should be reliable indicators of the locking order. (The main thread will initially keep the lock locked, to make the race start from the moment it releases the lock.)

Each worker thread will take the mutex LOOPS times, in a tight loop. You can see in the code that the unlock and lock are right next to each other. If the fifo_mutex_t works correctly (and the workers have enough work to do to keep at least one thread blocking on it at all times), the same order should be repeated on each loop.

If you define the preprocessor macro USE_PTHREAD_MUTEX_INSTEAD, for example using -DUSE_PTHREAD_MUTEX_INSTEAD when compiling the test program, then the fifo_mutex_t type is replaced with pthread_mutex_t (and the fifo_ functions with the pthread_ equivalents).

On all of the machines I tried (various Linux and one Solaris 10), my fifo_mutex_t worked correctly: the fifo_mutex_t lock was handed to each thread in its calling order (i.e. FIFO, first in, first out), even when unlocking and locking the fifo_mutex_t in a tight loop.

On the Solaris 10 machine, the pthread_mutex_t lock was always handed over in a scrambled order. This means that you cannot rely on FIFO behaviour for pthread_mutex_t on non-Linux machines at all. On the Linux machines I tested with LOOPS=0, the pthread_mutex_t lock was handed to each thread in the calling order (FIFO). However, if the same thread tries to re-acquire the pthread_mutex_t, i.e. LOOPS>0 in the example program above, the order gets scrambled. This means that you can rely on FIFO order for pthread_mutex_t locks on Linux only the first time a thread acquires that mutex (since the last time the mutex was free, with no threads blocking on it).

I hope you find this useful, |
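The shape of the test program, reconstructed from the description above (the thread count, the handshake details, and the messages are guesses; the USE_PTHREAD_MUTEX_INSTEAD switch is omitted, and the lock/unlock definitions come from the earlier sketch): Code:
#include <stdio.h>
#include <pthread.h>
#include "fifo-mutex.h"            /* the header sketched above */

#define THREADS 4                  /* assumed thread count */
#define LOOPS   10                 /* re-acquisitions per worker */

static fifo_mutex_t lock = FIFO_MUTEX_INITIALIZER;
static volatile int working = 0;   /* worker-creation handshake flag */

static void *worker(void *arg)
{
    const long id = (long)arg;

    printf("Worker %ld created.\n", id);  /* ordered via the handshake */
    working = 1;                          /* let main() create the next one */

    /* main() still holds the lock here, so the workers queue up in
     * creation order; each then re-takes the lock in a tight loop, with
     * the unlock and the next lock right next to each other. */
    for (long i = 0; i <= LOOPS; i++) {   /* LOOPS=0 still locks once */
        fifo_mutex_lock(&lock);
        printf("Worker %ld has the lock.\n", id);
        fifo_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t thread[THREADS];

    fifo_mutex_lock(&lock);        /* keep workers queued until all exist */

    for (long i = 0; i < THREADS; i++) {
        working = 0;
        pthread_create(&thread[i], NULL, worker, (void *)i);
        while (!working)           /* wait until worker i has announced */
            ;
    }

    fifo_mutex_unlock(&lock);      /* release the lock: the race starts */

    for (int i = 0; i < THREADS; i++)
        pthread_join(thread[i], NULL);

    return 0;
}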
Have you looked at my solution? I'm not saying it's better, but I'd like to hear your comments. I'm sure some combination of our solutions would be ideal.
Kevin Barry |
Code:
#include <stdio.h>
Kevin Barry |
Each thread hitting pthread_cond_wait(cond, mutex) will atomically unlock the mutex and then wait on the condition variable. (The "atomic" just means it is not possible for a condition to slip in between unnoticed, as long as the condition signaler or broadcaster holds that same mutex.) When the condition variable is broadcast, each and every thread waiting on it is woken up, internally automatically re-acquiring the mutex in turn. With futex-based pthread mutexes, that will happen in FIFO order, as long as none of the threads tries to re-acquire the mutex (and there are no other threads blocking on the mutex).

In my implementation, each woken-up thread will check if it is its turn, and if not, go back to waiting on the condition variable, again atomically releasing the mutex. Those threads that were woken up but did not get the mutex before the correct thread will keep on blocking on the mutex until the correct thread releases it, and they get to run. They will eventually get to run, because the scheduler will give each thread some CPU time, eventually. If it is then still not their turn, they will go back to waiting on the condition variable (again, releasing the mutex atomically).

Therefore, there may be any number of threads, with some blocking on the mutex and others waiting on the condition variable. There are no limits as far as I can see. I did run tests with 300 threads (on both 32-bit and 64-bit unicore and multicore CPUs); for more threads I'd need to set a smaller per-thread stack.
It does look interesting, especially the queue approach. Is there a specific feature I should have noticed? I dislike arbitrary limits, though. (But I do think it is possible to change the array to a dynamically grown one if necessary.)

Perhaps a third scheme would work better? Use a queue (a circular buffer?) of mutexes, so that each thread blocks on the mutex belonging to its predecessor, with indices either accessed using atomic builtins or protected by a separate mutex. Each thread will first lock its own (new) mutex. If there is no predecessor mutex, the thread then owns the FIFO lock. Otherwise, the thread will lock the predecessor mutex, blocking on it. When the thread obtains that mutex, it will unlock and discard it. Then it will own the FIFO lock. To release the FIFO lock, the thread will just unlock its own mutex.

At any point, there will be at most one thread blocking on each mutex. The correct thread is thus always woken up when the holding thread releases its mutex. (The locking and unlocking order in the code is critical, to avoid the possibility of deadlock.)

If the mutexes are stored in a circular buffer, the buffer cannot be reallocated while there are mutexes in it (because on Linux, the kernel depends on the address when there is a thread blocked on a futex). A linked list would allow dynamic growth, but to eliminate malloc() overhead, one would have to use allocation pools; thus more complex code.

This third scheme should be able to rely only on mutexes, so it should be usable wherever mutexes are. Condition variables cannot be used in signal handlers, for example. None of the pthread_mutex functions are cancellation points, so this third scheme would have no pthread cancellation points either. (Both our implementations currently have cancellation points, although by default pthread cancellations are deferred, so it should not matter.)

Unlike my implementation, this scheme should not suffer from the thundering herd problem: each mutex is only blocked on by a single thread. The "cost" should also be fixed, not dependent on the number of threads blocking on the structure. I believe it would be more efficient in general than either of our implementations. |
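A minimal sketch of that third scheme, using individually allocated list nodes instead of a circular buffer (the names are illustrative and error handling is omitted; this is not code from this thread): Code:
#include <pthread.h>
#include <stdlib.h>

struct fifo_node {
    pthread_mutex_t   mutex;      /* the successor blocks on this */
    struct fifo_node *prev;       /* our predecessor in the wait chain */
};

struct fifo_lock {
    pthread_mutex_t   tail_lock;  /* protects 'tail' */
    struct fifo_node *tail;       /* most recent waiter; NULL when free */
};
/* e.g. struct fifo_lock lock = { PTHREAD_MUTEX_INITIALIZER, NULL }; */

/* Acquire the FIFO lock; returns our node, needed for fifo_release(). */
struct fifo_node *fifo_acquire(struct fifo_lock *const f)
{
    struct fifo_node *const node = malloc(sizeof *node); /* check omitted */
    pthread_mutex_init(&node->mutex, NULL);
    pthread_mutex_lock(&node->mutex);        /* lock our own mutex first */

    pthread_mutex_lock(&f->tail_lock);
    node->prev = f->tail;                    /* note our predecessor */
    f->tail = node;                          /* we are now last in line */
    pthread_mutex_unlock(&f->tail_lock);

    if (node->prev) {
        pthread_mutex_lock(&node->prev->mutex);   /* block until released */
        pthread_mutex_unlock(&node->prev->mutex); /* then discard it */
        pthread_mutex_destroy(&node->prev->mutex);
        free(node->prev);
    }
    return node;                             /* we now own the FIFO lock */
}

void fifo_release(struct fifo_lock *const f, struct fifo_node *const node)
{
    pthread_mutex_lock(&f->tail_lock);
    const int last = (f->tail == node);      /* nobody waiting after us? */
    if (last)
        f->tail = NULL;                      /* the lock becomes free */
    pthread_mutex_unlock(&f->tail_lock);

    pthread_mutex_unlock(&node->mutex);      /* wakes our successor, if any */
    if (last) {                              /* no successor will free us */
        pthread_mutex_destroy(&node->mutex);
        free(node);
    }
}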
You might as well just add this to the lock/unlock functions in your first solution and call it good: Code:
//first line of function -->

PS: This might be nitpicking, but I'm wondering about idiosyncrasies related to possible integer overflow in the indexing used in your first solution. I haven't used an architecture that doesn't wrap in such cases, and in fact I don't even know of one. |
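In terms of the ticket-lock sketch earlier in the thread, that addition might look like this (illustrative only, not the thread's actual code): Code:
int fifo_mutex_lock(fifo_mutex_t *const f)
{
    int oldstate;

    /* First line of the function: disable cancellation, so that a
     * thread cancelled in pthread_cond_wait() cannot leave the
     * internal mutex locked (and the FIFO mutex deadlocked). */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);

    pthread_mutex_lock(&f->mutex);
    const unsigned long ticket = f->next++;
    while (ticket != f->serving)
        pthread_cond_wait(&f->wait, &f->mutex);
    pthread_mutex_unlock(&f->mutex);

    /* Restore the caller's cancelability before returning. */
    return pthread_setcancelstate(oldstate, NULL);
}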
On the other hand, I've personally never needed thread cancellation at all; my solutions tend to use thread pools that do a bit of work, and then either wait for new work or exit. In some cases it is useful to have many more threads than available CPU cores, in which case a FIFO mutex would likely be heavily contended. The suggested scheme should have all the benefits of my original solution, but not suffer from the "thundering herd" problem (where all threads must be briefly woken up whenever the mutex changes).

On the gripping hand, I really cannot imagine a real need for a FIFO mutex either. I suspect only Windows programmers might find it useful... but I may be wrong.

Quote:
Code:
struct mutex_list {
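    /* Sketch: the rest of the quoted listing is not shown; a minimal
     * node for a chain-of-waiters scheme might hold just: */
    pthread_mutex_t    mutex;   /* each waiter blocks on its predecessor's */
    struct mutex_list *next;    /* next waiter in the chain */
};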
On the other hand, if you use a circular buffer and atomic built-ins to modify the indices, you must either use a power-of-two buffer size (so that the index values can just be ANDed with (size-1) to get the actual indices), or do a load-adjust-compare-swap in a loop using e.g. __sync_bool_compare_and_swap(). It will still be faster than using a mutex.

(I once wrote an atomic addition function for double-precision floating point around __sync_bool_compare_and_swap(), for testing purposes. It has surprisingly little overhead on most x86 CPUs I tested. Measurable, but less than using a mutex.) |
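Such a function might look roughly like this (a sketch of the technique, not the original test code; it assumes GCC's __sync built-ins and that double and uint64_t have the same size): Code:
#include <stdint.h>

/* Atomically add 'value' to '*target' using a load-adjust-compare-swap loop. */
static double atomic_add_double(volatile double *const target, const double value)
{
    union { double d; uint64_t u; } old, new;
    do {
        old.d = *target;            /* load */
        new.d = old.d + value;      /* adjust */
    } while (!__sync_bool_compare_and_swap((volatile uint64_t *)target,
                                           old.u, new.u)); /* swap if unchanged */
    return new.d;
}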
Code:
#include <stdio.h>
Kevin Barry |
The idea is that the allocation "pool" is actually a chain of pools. Whenever an item is allocated, the first pool in the chain with free items is used. If there are no free items in the chain at all, an empty pool is prepended to the list. Whenever an item is deallocated, the pool is also pushed down the chain (unless it is the first or last pool in the chain). That will push sparse pools down the chain, and we should end up with mostly full pools with a few sparse ones.

... but some test coding indicates the above is not worth the effort. It gets pretty complex fast, but is not faster than malloc()/free().

I think it is more efficient to keep a separate linked list to reuse recently released items, with all items individually malloc()ed. When there are no items to reuse, just malloc() a new one. When releasing, just move the item to the separate list. After every N releases, one could free() excess elements in the separate list; note that N is also the maximum number of additional items the separate list could have. The code is very similar to your latest example, except with the reuse/unused list added; very little added complexity.

One could even let N vary based on contention (just add suitable counters to the structure): with low contention, the unused list would be kept small, but larger when there is more contention. I guess that would pretty much eliminate the malloc()/free() overhead, because in most situations the items would be just recycled endlessly, without any malloc()/free() calls. (malloc()/free() would get called only for "bursty" contention, i.e. whenever the contention on the lock changes.) |
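A sketch of that reuse list (a single caller-provided lock is assumed and omitted for brevity; the names and the fixed N are illustrative): Code:
#include <stdlib.h>

#define N 16                       /* trim the unused list every N releases */

struct item {
    struct item *next;
    /* payload goes here, e.g. pthread_mutex_t mutex; */
};

static struct item *unused = NULL; /* recently released items, for reuse */
static size_t       releases = 0;

static struct item *item_get(void)
{
    struct item *const i = unused;
    if (i) {
        unused = i->next;          /* reuse a recently released item */
        return i;
    }
    return malloc(sizeof (struct item));  /* nothing to reuse */
}

static void item_put(struct item *const i)
{
    i->next = unused;              /* just move to the reuse list */
    unused = i;
    if (++releases >= N) {         /* every N releases, free the excess */
        releases = 0;
        while (unused->next) {     /* one simple policy: keep just one */
            struct item *const extra = unused->next;
            unused->next = extra->next;
            free(extra);
        }
    }
}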
Here is an example implementation of what I described in my post above. I'm not sure it is even interesting.
Code:
#ifndef FIFO_MUTEX_H |
You've certainly put a lot of work into something you don't need! Overall it seems like a good design. I wouldn't really call it a FIFO mutex, but rather an access queue, because of the growth and maintenance associated with larger numbers of waiting threads. As such, I'm curious about the possibility of making it a priority queue. That would probably result in non-constant access time due to resorting, though. Your solution is right on one of those lines where it's either a robust version of something primitive (i.e. a fancy mutex) or a primitive version of what's normally more robust (i.e. a primitive queuing system), so it seems natural to want to promote it to the higher domain and bog it down with features.
One portability problem is (pthread_t)-1. pthread_t doesn't have to be an integral type, so this might not compile on all OSes. I know I couldn't compile a cast from pthread_t to unsigned int on FreeBSD (for informational output); however, I don't have FreeBSD available at the moment, and online references appear to show pthread_t as a pointer in FreeBSD. To get rid of all the warnings and errors I ended up with *(unsigned int*) (void*) &the_pthread_t.

It occurred to me earlier that the "cars entering an intersection" analogy for mutex usage is also a good example of when a FIFO mutex would be needed. Whenever I've seen the cars/intersection example, there was at most one car waiting to enter the intersection. If there were one to four cars waiting in real life, however, they should proceed in the order they arrived.

Kevin Barry |
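One portable way around the sentinel is to track ownership with a separate flag and compare thread IDs only via pthread_equal(); a minimal sketch (not code from this thread): Code:
#include <pthread.h>

struct owner {
    pthread_t thread;   /* meaningful only while 'owned' is nonzero */
    int       owned;    /* avoids casting pthread_t to an integer sentinel */
};

/* Nonzero if the calling thread currently owns the lock. */
static int owner_is_self(const struct owner *const o)
{
    return o->owned && pthread_equal(o->thread, pthread_self());
}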