Multithreaded process pausing but not deadlocking or crashing
Hi,
I am writing a heavily multithreaded Linux program (20-60 threads) on Fedora Core 2. I am using glibc version 2.3.3-27. In addition, I am using the Boost (boost.org) libraries (version 1.32.0) for my threading and locking.
My problem is that the process will suddenly cease activity for random lengths of time (1 sec to minutes). However, it never crashes or produces incorrect results. Also, I do not think that it is deadlocking because it always resumes its activity.
I have done some profiling of the locks, and it shows very strange behavior. For instance, threads will block for long lengths of time (the length of the inactivity) while no thread is holding the corresponding lock more than fractions of a second. When I explored this further, it appears that thread A is blocking on a mutex while thread B holds it. I am using boost::recursive_mutex::scoped_lock objects for the locking. The weird thing is that thread B pauses at the very end of the lock's scope, as though the attempt to unlock the mutex is not waking thread A and descheduling thread B for a long time.
I created a test program that spawns 30 threads that just do a bunch of locking of these boost scoped locks and yielding. This program, too, shows the same periods of inactivity (again without crashing or deadlocking), though less frequently (I suspect because the locking pattern is probably different from the one in my program).
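For anyone who wants to try reproducing this, a test program along those lines might look roughly like the sketch below. It is not the original test program. It assumes std::thread and std::recursive_mutex as stand-ins for the Boost 1.32.0 types, and the names StressResult and run_lock_stress are made up for illustration. Each thread records how long it waited to get the lock, so a stall shows up as a large worst-case wait.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch.
struct StressResult {
    std::uint64_t total_increments;        // expected: num_threads * iterations
    std::chrono::microseconds worst_wait;  // longest wait to acquire the lock
};

// Spawns num_threads workers that repeatedly lock a shared recursive mutex,
// bump a counter, and yield. std::recursive_mutex stands in for
// boost::recursive_mutex (an assumption, not the original test program).
StressResult run_lock_stress(std::size_t num_threads, int iterations) {
    assert(iterations >= 0);
    using clock = std::chrono::steady_clock;

    std::recursive_mutex mtx;
    std::uint64_t counter = 0;                // guarded by mtx
    std::chrono::microseconds worst_wait{0};  // guarded by mtx

    auto worker = [&]() {
        for (int i = 0; i < iterations; ++i) {
            const auto start = clock::now();
            {
                std::lock_guard<std::recursive_mutex> outer(mtx);
                const auto waited =
                    std::chrono::duration_cast<std::chrono::microseconds>(
                        clock::now() - start);
                if (waited > worst_wait) worst_wait = waited;

                // Re-lock on the same thread to exercise the recursive path.
                std::lock_guard<std::recursive_mutex> inner(mtx);
                ++counter;
            }
            // Give other threads a chance to run between lock attempts.
            std::this_thread::yield();
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();

    return StressResult{counter, worst_wait};
}
```

Calling run_lock_stress(30, 2000) and printing worst_wait.count() gives the longest lock wait in microseconds. On a healthy system that should stay small; waits of a second or more would match the symptom described above.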
As far as I can tell, the boost libraries don't do much more than provide wrappers for pthread functionality, so I'm not sure whether this issue is a boost problem, a kernel problem, or my problem.
I was wondering if anyone has experienced similar behavior on Linux, or in using these Boost libraries? If anyone could offer some insight or guidance, it would be much appreciated. Thanks!
(Also, please let me know if there is a more appropriate forum for this issue).
"threads will block for long lengths of time (the length of the inactivity) while no thread is holding the corresponding lock more than fractions of a second. When I explored this further, it appears that thread A is blocking on a mutex while thread B holds it. I am using boost::recursive_mutex::scoped_lock objects for the locking. The weird thing is that thread B pauses at the very end of the lock's scope, as though the attempt to unlock the mutex is not waking thread A and descheduling thread B for a long time. "
I interpret what you said to mean that more than one mutex is being contended for. If each thread contends for several mutexes simultaneously, you can get interlocking conditions. To ensure that these interlocks do not result in deadlocks, you should follow one of the following two rules.
1. Any thread that locks a mutex locks every mutex it needs at the same time. This guarantees that you have no deadlocks, but it can be a performance killer.
2. All threads that lock multiple mutexes always do so in the same order. For example, if several threads lock 4 different mutexes (say a, b, k, and j), they all lock them in the same order (a, j, b, k for example).
You can also mix rules 1 and 2. You could set the rule that all threads lock a first and then later lock b, j, and k simultaneously.
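The two rules above can be sketched in C++ as follows. This is only an illustration under some assumptions: it uses std::mutex and std::lock rather than the Boost types from the original post, it reads "all at the same time" as std::lock's deadlock-avoiding acquisition, and it picks object address as the fixed order for rule 2. The Account type and the function names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical shared object with its own mutex.
struct Account {
    explicit Account(long initial) : balance(initial) {}
    std::mutex m;
    long balance;  // guarded by m
};

// Rule 1: take every needed mutex in one step. std::lock acquires both
// without deadlocking, whatever order other threads name them in.
void transfer_all_at_once(Account& from, Account& to, long amount) {
    assert(&from != &to);  // locking the same mutex twice is undefined
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Rule 2: every thread locks in the same global order (here: by address).
void transfer_ordered(Account& from, Account& to, long amount) {
    assert(&from != &to);
    Account* first = &from;
    Account* second = &to;
    if (std::less<Account*>()(second, first)) std::swap(first, second);
    std::lock_guard<std::mutex> g1(first->m);
    std::lock_guard<std::mutex> g2(second->m);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads move money in opposite directions, which would risk deadlock
// if each simply locked "from" and then "to". Returns the combined balance.
long run_opposing_transfers(void (*transfer)(Account&, Account&, long),
                            int rounds) {
    Account a(100);
    Account b(100);
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

With either function passed in, run_opposing_transfers should finish and return 200, since money is only moved between the two accounts.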
But deadlocks are not your problem. Inexplicably long waits are your problem. I suggest that you extend your analysis of lock combinations to multiple mutexes being locked by multiple threads. While you may not be violating my two anti-deadlock rules, you may be holding mutexes locked longer than you need to.
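As a rough illustration of holding a mutex no longer than needed, the sketch below contrasts doing all the work under the lock with copying the shared data under the lock and working on the copy afterwards. The SharedSamples type and both function names are hypothetical, and std::mutex is assumed in place of the Boost scoped locks.

```cpp
#include <cassert>
#include <mutex>
#include <numeric>
#include <vector>

// Hypothetical shared data protected by one mutex.
struct SharedSamples {
    std::mutex m;
    std::vector<int> values;  // guarded by m
};

// Holds the lock for the whole computation: other threads wait
// for as long as the arithmetic takes.
double mean_holding_lock(SharedSamples& s) {
    std::lock_guard<std::mutex> guard(s.m);
    assert(!s.values.empty());
    const long sum = std::accumulate(s.values.begin(), s.values.end(), 0L);
    return static_cast<double>(sum) / static_cast<double>(s.values.size());
}

// Holds the lock only long enough to copy the data, then computes
// on the private snapshot with the mutex already released.
double mean_with_short_lock(SharedSamples& s) {
    std::vector<int> snapshot;
    {
        std::lock_guard<std::mutex> guard(s.m);
        snapshot = s.values;
    }  // mutex released here
    assert(!snapshot.empty());
    const long sum = std::accumulate(snapshot.begin(), snapshot.end(), 0L);
    return static_cast<double>(sum) / static_cast<double>(snapshot.size());
}
```

The copy has its own cost, so the second form probably only pays off when the work done on the data clearly outweighs copying it.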
In case anyone is interested, Fedora Core 2 was the problem. We switched our OS to run a Rocks cluster, and everything works beautifully now. Quite strangely, Fedora Core 2 "forgets" about threads that want to run. If you run 'ps' with the threads option on a seemingly stalled process, it "reminds" the OS that the process wants to run.