LinuxQuestions.org
Old 09-02-2005, 11:20 AM   #1
writejus1
LQ Newbie
 
Registered: Sep 2005
Posts: 2

Rep: Reputation: 0
Multithreaded process pausing but not deadlocking or crashing


Hi,

I am writing a heavily multithreaded Linux program (20-60 threads) on Fedora Core 2, using glibc version 2.3.3-27. In addition, I am using the Boost (boost.org) libraries (version 1.32.0) for my threading and locking.

My problem is that the process will suddenly cease activity for random lengths of time (1 sec to minutes). However, it never crashes or produces incorrect results. Also, I do not think that it is deadlocking because it always resumes its activity.

I have done some profiling of the locks, and it shows very strange behavior. For instance, threads will block for long lengths of time (the length of the inactivity) while no thread is holding the corresponding lock more than fractions of a second. When I explored this further, it appears that thread A is blocking on a mutex while thread B holds it. I am using boost::recursive_mutex::scoped_lock objects for the locking. The weird thing is that thread B pauses at the very end of the lock's scope, as though the attempt to unlock the mutex is not waking thread A and descheduling thread B for a long time.
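For anyone who wants to reproduce this kind of lock profiling, here is a minimal sketch of one way to do it. It is not the code from my program: it assumes C++11 `std::recursive_mutex` and `std::chrono` as stand-ins for the Boost 1.32 types, and `LockTimings`, `TimedScopedLock`, and `unlock_looks_stalled` are hypothetical names made up for this example. It records how long a thread waited for the mutex, how long it held it, and how long the unlock call itself took, which is the interval where thread B appeared to pause.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

using Clock = std::chrono::steady_clock;

// Hypothetical record of one lock/unlock cycle, filled in by TimedScopedLock.
struct LockTimings {
    Clock::duration waited{};     // time spent blocked in lock()
    Clock::duration held{};       // time between acquiring and starting unlock()
    Clock::duration unlocking{};  // time spent inside unlock() itself
};

// Hypothetical scoped lock that times acquisition, hold, and release.
class TimedScopedLock {
public:
    TimedScopedLock(std::recursive_mutex& m, LockTimings& out)
        : mutex_(m), out_(out) {
        const auto start = Clock::now();
        mutex_.lock();
        acquired_ = Clock::now();
        out_.waited = acquired_ - start;
    }

    ~TimedScopedLock() {
        const auto before_unlock = Clock::now();
        mutex_.unlock();
        const auto after_unlock = Clock::now();
        // out_ is meant to be owned by the calling thread, so writing to it
        // after the mutex is released does not race with other threads.
        out_.held = before_unlock - acquired_;
        out_.unlocking = after_unlock - before_unlock;
    }

    TimedScopedLock(const TimedScopedLock&) = delete;
    TimedScopedLock& operator=(const TimedScopedLock&) = delete;

private:
    std::recursive_mutex& mutex_;
    LockTimings& out_;
    Clock::time_point acquired_;
};

// True when the unlock call took longer than the given threshold.
bool unlock_looks_stalled(const LockTimings& t, Clock::duration threshold) {
    assert(threshold.count() >= 0);
    return t.unlocking > threshold;
}
```

Each thread would keep its own `LockTimings` object and log it after the scope ends. A large `unlocking` value next to a small `held` value would match the symptom described above.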

I created a test program that spawns 30 threads that just repeatedly lock these Boost scoped locks and yield. This program shows the same stalls (again without crashing or deadlocking), though less frequently (I suspect because its locking pattern differs from that of my real program).
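A rough sketch of what such a test program might look like is below. It is not the original test code: it assumes C++11 `std::thread` and `std::recursive_mutex` in place of the Boost 1.32 equivalents, and the function name `run_lock_stress` and the shared counter are invented for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Spawns thread_count threads that repeatedly take a shared recursive mutex
// (twice, to exercise the recursive path), bump a counter, and then yield.
// Returns the final counter value once every thread has finished.
long run_lock_stress(int thread_count, int iterations) {
    assert(thread_count > 0 && iterations > 0);

    std::recursive_mutex mutex;
    long counter = 0;  // guarded by mutex

    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(thread_count));

    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&mutex, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                {
                    std::lock_guard<std::recursive_mutex> outer(mutex);
                    std::lock_guard<std::recursive_mutex> inner(mutex);
                    ++counter;
                }  // both guards release here, at the end of the scope
                std::this_thread::yield();
            }
        });
    }

    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

Calling `run_lock_stress(30, 1000)` mirrors the 30-thread setup described above. On a healthy system it should finish quickly with a counter equal to the thread count times the iteration count; visible pauses during the run would be the behavior in question.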

As far as I can tell, the boost libraries don't do much more than provide wrappers for pthread functionality, so I'm not sure whether this issue is a boost problem, a kernel problem, or my problem.

I was wondering if anyone has experienced similar behavior on Linux, or when using these Boost libraries? If anyone could offer some insight or guidance, it would be much appreciated. Thanks!

(Also, please let me know if there is a more appropriate forum for this issue).

Matt
 
Old 09-02-2005, 03:54 PM   #2
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,337

Rep: Reputation: 548
"threads will block for long lengths of time (the length of the inactivity) while no thread is holding the corresponding lock more than fractions of a second. When I explored this further, it appears that thread A is blocking on a mutex while thread B holds it. I am using boost::recursive_mutex::scoped_lock objects for the locking. The weird thing is that thread B pauses at the very end of the lock's scope, as though the attempt to unlock the mutex is not waking thread A and descheduling thread B for a long time. "

I interpret what you said to indicate that you have more than one mutex being contended for. If each thread is contending for several mutexes simultaneously you can get interlocking conditions. To ensure that you do not get interlocks which result in deadlocks you should follow one of the following two rules.

1. Any thread that locks on a mutex locks on every mutex that it needs all at the same time. This guarantees that you have no deadlocks but it can be a performance killer.

2. All threads that lock on multiple mutexes always do so in the same order. For example, if several threads lock on 4 different mutexes (say a, b, k and j) they all lock on the mutexes in the same order (a, j, b, k for example).

You can also have a mixture of rules 1 and 2. You could set the rule that all threads lock on a and then later they lock on b, j, and k simultaneously.
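The two rules can be sketched in code roughly as follows. This is only an illustration, assuming C++11 and two mutexes: `SharedState`, `update_all_at_once`, `update_in_fixed_order`, and `run_both` are hypothetical names, and `std::lock` is used as an approximation of rule 1 (it acquires the whole set with a deadlock-avoidance algorithm rather than in one atomic step).

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared data protected by two mutexes.
struct SharedState {
    std::mutex a;
    std::mutex b;
    int x = 0;  // guarded by a
    int y = 0;  // guarded by b
};

// Rule 1: take every mutex the thread needs in a single step.
void update_all_at_once(SharedState& s) {
    std::lock(s.a, s.b);  // acquires both without risking deadlock
    std::lock_guard<std::mutex> guard_a(s.a, std::adopt_lock);
    std::lock_guard<std::mutex> guard_b(s.b, std::adopt_lock);
    ++s.x;
    ++s.y;
}

// Rule 2: every thread takes the mutexes in the same agreed order (a, then b).
void update_in_fixed_order(SharedState& s) {
    std::lock_guard<std::mutex> guard_a(s.a);
    std::lock_guard<std::mutex> guard_b(s.b);
    ++s.x;
    ++s.y;
}

// Runs one thread per rule against the same state and returns x + y.
int run_both(int iterations) {
    assert(iterations >= 0);
    SharedState state;

    std::thread t1([&state, iterations] {
        for (int i = 0; i < iterations; ++i) update_all_at_once(state);
    });
    std::thread t2([&state, iterations] {
        for (int i = 0; i < iterations; ++i) update_in_fixed_order(state);
    });

    t1.join();
    t2.join();
    return state.x + state.y;
}
```

Both functions keep the critical section as short as possible, which also bears on the long-wait problem: the less time a mutex is held, the less time other threads spend blocked on it.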

But deadlocks are not your problem. Inexplicably long waits are your problem. I suggest that you extend your analysis of lock combinations to multiple mutexes being locked by multiple threads. While you may not be violating my two anti-deadlock rules, you may be holding mutexes locked longer than you need to.

------------------------------------
Steve Stites

Last edited by jailbait; 09-02-2005 at 03:56 PM.
 
Old 09-15-2005, 01:54 PM   #3
writejus1
LQ Newbie
 
Registered: Sep 2005
Posts: 2

Original Poster
Rep: Reputation: 0
In case anyone is interested, Fedora Core 2 was the problem. We switched our OS to run a Rocks cluster, and everything works beautifully now. Quite strangely, Fedora Core 2 "forgets" about threads that want to run. If you 'ps' a seemingly stalled process using the threads option, it will "remind" the OS that the process wants to run.
 
Old 09-16-2005, 04:57 AM   #4
smurff
Member
 
Registered: Sep 2004
Location: England
Distribution: Mandriva 2005LE / Whitebox
Posts: 48

Rep: Reputation: 15
I found a similar thing on RHEL 2.1: the POSIX threads had issues, and now, after upgrading to RHEL 3, everything works well.
Regards
 
  

