Python: easiest way to wait for whichever child process completes first?
I'm guiding an intern, who is working in Python. I hardly know any Python myself.
The Python program currently launches child processes and immediately waits for each to complete. Instead, I want it to launch several and then loop: 1) wait for whichever completes first, 2) launch another one. The children are expected to complete in a very different order than they were launched. A Google search found many people asking the same question and a lot of answers implying this problem is difficult. The best I found was psutil.wait_procs, which seems to require complicated setup for an imperfect solution. Before I send the intern down that path, I'd like an opinion on whether it is the best path. BTW, we need both a Windows solution and a Linux solution, and would prefer that they be as similar to each other as practical. |
Setting aside the Python part of your question: you want to call wait() repeatedly to reap each child as it exits. Contrast with waitpid(), where you specify the pid you want to wait for.
There is a Python discussion at http://bytes.com/topic/python/answer...t-losing-child |
You could give each child process a pipe and spot termination with select().
|
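For reference, a minimal sketch of this pipe-plus-select() idea (the child commands and sleep times are invented for illustration). Each child gets a stdout pipe; when a child exits, its end of the pipe reaches EOF and select() reports the descriptor readable. Note this is POSIX-only: on Windows, select() accepts sockets, not pipes.

```python
import select
import subprocess
import sys

# Launch a few children, each with a pipe we can select() on.
procs = {}
for secs in ("0.3", "0.1", "0.2"):
    p = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(%s)" % secs],
        stdout=subprocess.PIPE)
    procs[p.stdout.fileno()] = p

finished = []
while procs:
    # Blocks until at least one pipe is readable.  These children never
    # write anything, so "readable" means EOF, i.e. the child has exited.
    readable, _, _ = select.select(list(procs), [], [])
    for fd in readable:
        p = procs.pop(fd)
        p.stdout.close()
        p.wait()               # child already exited; reap it
        finished.append(p)
```

The main thread sleeps inside select() rather than busy-polling, which addresses the efficiency concern raised later in this thread.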
I wonder about all the confusion I found when I did a Google search for this. I guess I need to test whether that simple method works on both Windows and Linux, or whether it is Linux-only. |
I need this Python program to work in both Linux and Windows. I'm pretty sure the polling method can do the same job in Windows with a bit more complexity and less efficiency, and I'll nail down those details soon. But is there a better way? |
Not a Python programmer myself, but perhaps one of the Python modules for concurrent execution is what you're looking for? The multiprocessing and subprocess modules look quite useful for doing what you described.
|
Since I don't know much Python, I'd appreciate comments (or suggested improvements) on the program I now have working in both Linux and Windows. I needed to use polling, because I did not find a better way to wait in Windows.
Code:
import os

Each process to be launched has a "cost"; most are 1, but some are larger. Don't worry about what cost is, but if you can't think of it as an abstraction, pretend it is a GB RAM requirement.

out is the total cost of all processes that have been launched but not completed. The processes are launched in a predetermined sequence, but may or may not be allowed to start before previous completions. A process with a cost >= max_out can only be started after all previous processes are finished. Other processes can be started as soon as the total cost, including that new one, would be <= max_out.

n_proc[] is the list of processes running now. n_cost[] is the cost of each running process. |
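A sketch of a polling launcher matching that description, using the names from the post (out, max_out, n_proc, n_cost); the job list, costs, and sleep times are invented for illustration:

```python
import subprocess
import sys
import time

max_out = 4                        # cost budget for running children
jobs = [(1, "0.2"), (2, "0.1"), (1, "0.3"), (4, "0.1")]  # (cost, sleep secs)
n_proc, n_cost = [], []            # running processes and their costs
out = 0                            # total cost launched but not completed

def reap():
    """Poll running children; remove any that finished, freeing their cost."""
    global out
    for i in range(len(n_proc) - 1, -1, -1):   # backwards, so del is safe
        if n_proc[i].poll() is not None:       # non-None returncode: exited
            out -= n_cost[i]
            del n_proc[i], n_cost[i]

for cost, secs in jobs:
    # A job with cost >= max_out must wait for all previous jobs to finish;
    # any other job waits until the budget can fit it.
    while (n_proc and cost >= max_out) or out + cost > max_out:
        reap()
        time.sleep(0.05)           # polling interval (busy-wait)
    n_proc.append(subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(%s)" % secs]))
    n_cost.append(cost)
    out += cost

while n_proc:                      # drain the remaining children
    reap()
    time.sleep(0.05)
```

The poll()-based loop works identically on Windows and Linux, at the price of the busy-wait that the next reply objects to.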
I don't believe that "busy waiting" should ever be necessary, not even in Windows. Also, I have serious doubts whether this implementation would prove to be 100% reliable in a production case where processes were taking unpredictable amounts of time and might even terminate at the same time. I'd suggest looking carefully at the source-code to the concurrent-execution modules that turtlei recently spoke of, to see if there are any cross-platform gleanings to be won from it.
If this actually works for you, then of course the proper thing to do might be to "move along," but on a skeptical peer review I would have to give this alternative a thumbs-down, because I would never be convinced that it does not have timing holes that would cause "vexing instability" under load. I think that a truly reliable cross-platform solution exists, and I'll predict that the writers of the published library modules probably found it. Then again, the platforms may differ enough that you really do have to write it two different ways; I don't know, I haven't looked closely. I do know that the Windows threading model doesn't exactly follow the "father reaps" model. It's more "DEC/PDP-ish" than that. :) |
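One cross-platform way to avoid busy-waiting, not taken from this thread but consistent with its goal, is to dedicate a watcher thread to each child: every thread makes the blocking wait() call for its own child and reports the finished process on a queue, so the main thread blocks on the queue and wakes as soon as any child exits. A sketch, with invented sleep times:

```python
import queue
import subprocess
import sys
import threading

done = queue.Queue()

def watch(p):
    p.wait()          # blocking wait for this one child
    done.put(p)       # report its completion to the main thread

children = []
for secs in ("0.3", "0.1", "0.2"):
    p = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(%s)" % secs])
    children.append(p)
    threading.Thread(target=watch, args=(p,), daemon=True).start()

order = []
for _ in children:
    order.append(done.get())   # blocks until whichever child finishes next
```

Because both Popen.wait() and queue.Queue behave the same on Windows and Linux, the same code serves both platforms with no polling.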
I don't see anything in this polling design that might not be stable. If you have a specific failure path in mind, I would want to hear it. Or if you have simpler code, I always prefer that, because simpler is easier to trust. But your expression of general mistrust of the simple code I posted really doesn't tell me anything. I still would like to hear specific suggestions from someone who knows Python better than I do. |
I'm not a python expert, but I have written a few small programs in it.
An obvious simplification would be to del a finished process from n_proc instead of setting it to None; this would save having to check for None. Also, it's a bit funny to use enumerate but then only use the index anyway (on the other hand, you have to use the index to reference the place in n_proc, so maybe it's better to use the index everywhere for uniformity?).
Code:
def wait4one(p):
Code:
from collections import namedtuple
multiprocessing's JoinableQueue or Pool classes could be useful to wait for multiple processes. I think you would need a BoundedSemaphore to implement the cost restriction. |
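A sketch of how a BoundedSemaphore could enforce the cost restriction on subprocess children (not the poster's code; the job list, sleep times, and the one-acquire-per-cost-unit scheme are my assumptions, since a semaphore only counts in units of one):

```python
import subprocess
import sys
import threading

max_out = 3
budget = threading.BoundedSemaphore(max_out)  # holds max_out cost units
finished = []

def watch(p, cost):
    p.wait()                        # block until this child exits
    finished.append(p)
    for _ in range(cost):           # return the child's cost units
        budget.release()

threads = []
for cost, secs in [(1, "0.2"), (2, "0.1"), (1, "0.1")]:
    for _ in range(cost):           # blocks until the budget can fit this job
        budget.acquire()
    p = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(%s)" % secs])
    t = threading.Thread(target=watch, args=(p, cost))
    t.start()
    threads.append(t)

for t in threads:
    t.join()
```

With a single launching thread this cannot deadlock; acquiring cost units one at a time from several launcher threads at once could, so the scheme would need more care in that case.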
Code:
def wait4one(p):

I don't understand the rules of the container involved, so I don't know what proc gets from enumerate in your code. I would expect it to be p[i], and I tried that before posting my code; it is some other kind of object, so proc.poll() does not work. Still not understanding the container: I don't know what del p[i] does, but anyway it also doesn't work as you seem to intend. |
Here is a working version of wait4one:
Code:
def wait4one(p): |
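The body of that code block is not preserved here. As a sketch only, a wait4one consistent with the discussion (polling with poll() and removing finished entries with del, as suggested above) might look like:

```python
import time

def wait4one(p):
    """Poll the running processes in list p until at least one has finished.

    Removes each finished process from p (del, per the earlier suggestion)
    and returns the list of processes that completed this round.
    """
    while True:
        done = []
        for i in range(len(p) - 1, -1, -1):   # iterate backwards so del is safe
            if p[i].poll() is not None:       # non-None returncode: child exited
                done.append(p[i])
                del p[i]
        if done:
            return done
        time.sleep(0.1)                       # polling interval
```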