The so-called "flaming arrow" approach (one thread is created for each request; it flies up into the air, processes the request, and then flames out) is fine for very light workloads, but do be aware that it will crumple badly under more serious loads.
Suppose you are suddenly deluged with 1,000 requests per second by someone attempting a denial-of-service attack. The attack will succeed, because the system dispatcher suddenly has 1,000 new threads to contend with, all of them fighting for the same resources, taking the same locks, and so on. Basically, they're all getting in each other's way... just as a thousand hungry people would if they all raced into the kitchen at once.
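For contrast, here is a minimal Python sketch of the flaming-arrow style (Python is my choice; the post names no language). The `handle` function and the `time.sleep` standing in for real work are hypothetical. What it shows is that nothing in the code limits how many requests are in flight: the peak simply tracks however many arrive.

```python
import threading
import time


def flaming_arrow(num_requests, work_seconds=0.05):
    """Thread-per-request ("flaming arrow") sketch.

    Every request gets its own brand-new thread and nothing caps how many run
    at once. Returns the peak number of requests observed in flight together.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def handle():
        # Hypothetical request handler.
        nonlocal in_flight, peak
        with lock:                 # every thread contends for the same lock...
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(work_seconds)   # ...while "working" (simulated here by sleeping)
        with lock:
            in_flight -= 1

    threads = [threading.Thread(target=handle) for _ in range(num_requests)]
    for t in threads:
        t.start()                  # one arrow launched per request
    for t in threads:
        t.join()                   # each thread flames out when its request is done
    return peak
```

Calling `flaming_arrow(1000)` would typically put close to a thousand threads in flight at the same moment; the code has no mechanism for pushing back.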
A different approach would have a pool of worker-threads waiting on a single queue of incoming requests. The workers dequeue a request, process it, then "clean up, change clothes, and take a quick shower," and go back to waiting for the next request to arrive.
The worker-threads don't die. There's usually a "monitor" thread that periodically checks to make sure that the workers are still alive and that they're not somehow "stuck" on a particular request. Yet another thread might do nothing but watch the monitor.
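A minimal sketch of such a pool and its monitor, using Python's standard `queue.Queue` and `threading` modules. The class name `WorkerPool`, the `handler` callback, the `None` shutdown sentinel, and the "stuck" threshold are all my own illustrative choices, not anything prescribed by the approach itself.

```python
import queue
import threading
import time


class WorkerPool:
    """A fixed pool of long-lived worker threads sharing one request queue (sketch)."""

    def __init__(self, num_workers, handler):
        self.requests = queue.Queue()   # the single queue every worker waits on
        self.handler = handler          # hypothetical function that processes one request
        self.lock = threading.Lock()
        self.started_at = {}            # worker name -> when its current request began
        self.workers = [
            threading.Thread(target=self._work, name=f"worker-{i}", daemon=True)
            for i in range(num_workers)
        ]
        for w in self.workers:
            w.start()

    def _work(self):
        name = threading.current_thread().name
        while True:                         # loop until told to shut down
            request = self.requests.get()   # block until a request arrives
            if request is None:             # sentinel, used only by shutdown()
                self.requests.task_done()
                return
            with self.lock:
                self.started_at[name] = time.monotonic()
            try:
                # An exception here ends this thread; check() will then report it dead.
                self.handler(request)
            finally:
                # "Clean up, change clothes": forget this request, then wait again.
                with self.lock:
                    self.started_at[name] = None
                self.requests.task_done()

    def check(self, stuck_after):
        """Report workers that have died or have spent too long on one request."""
        now = time.monotonic()
        problems = []
        with self.lock:
            for w in self.workers:
                began = self.started_at.get(w.name)
                if not w.is_alive():
                    problems.append((w.name, "dead"))
                elif began is not None and now - began > stuck_after:
                    problems.append((w.name, "stuck"))
        return problems

    def shutdown(self):
        for _ in self.workers:
            self.requests.put(None)
        for w in self.workers:
            w.join()


def monitor(pool, stop, interval=1.0, stuck_after=30.0):
    """Body of a monitor thread: poll the pool until the `stop` event is set."""
    while not stop.wait(interval):
        for name, problem in pool.check(stuck_after):
            print(f"monitor: {name} is {problem}")
```

A real monitor would do something more useful than print, such as replacing a dead worker or logging the request that a stuck worker is holding.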
There is a single input-thread which is running the accept() loop previously described, but instead of servicing each request itself, it places the incoming requests on that queue. (It is probably also watching the size of that queue so that the queue is not permitted to grow too large... some incoming requests might just have to be refused.)
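One way to sketch the input thread's "refuse when the backlog is too large" rule is a bounded `queue.Queue(maxsize=...)` plus the non-blocking `put_nowait`, which raises `queue.Full` instead of waiting. Here `incoming` is a stand-in for the accept() loop (in a real server each item would come from something like `socket.accept()`), and `refuse` is a hypothetical callback that might, for example, send an HTTP 503 and close the connection.

```python
import queue


def input_loop(incoming, requests, refuse):
    """Body of the single input thread (sketch).

    incoming: any iterable of new requests, standing in for the accept() loop.
    requests: a bounded queue.Queue shared with the worker pool.
    refuse:   hypothetical callback that turns a request away.
    """
    for request in incoming:
        try:
            requests.put_nowait(request)  # never let the input thread block
        except queue.Full:
            refuse(request)               # backlog too large: refuse this one
```

Because `put_nowait` never waits, the input thread keeps accepting (and refusing) at full speed even when every worker is busy.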
There might be yet another thread whose job is to finish up the requests: statistical logging and so forth.
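A sketch of such a finishing thread, assuming (hypothetically) that workers push a small `(status, elapsed_seconds)` record onto a "done" queue when they finish a request. Since only this one thread ever touches the statistics, they need no lock while it runs.

```python
import queue
import threading
from collections import Counter


class Finisher(threading.Thread):
    """Single thread that drains completed requests and keeps statistics (sketch)."""

    def __init__(self):
        super().__init__(daemon=True)
        self.done = queue.Queue()   # workers put (status, elapsed_seconds) here
        self.counts = Counter()     # status -> number of requests
        self.total_seconds = 0.0

    def run(self):
        while True:
            record = self.done.get()
            if record is None:      # sentinel: stop
                return
            status, seconds = record
            self.counts[status] += 1
            self.total_seconds += seconds
```

Read `counts` and `total_seconds` only after `join()` (or add a lock), since the finisher updates them from its own thread.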
At any moment, then, only a fixed maximum number of requests will be "in process." This makes the throughput of the system predictable... even with a flood of 1,000 requests, only (say) 20 requests will be active, so, regardless of the queue size at the moment, we can say that "this system has a worst-case throughput of (say) 200 requests/sec," and we know that the backlog will clear out within five seconds.
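The arithmetic behind those "(say)" figures, assuming every worker stays continuously busy while a backlog exists. Here $N$ is the number of workers, $X$ the throughput, and $S$ the average time a worker spends on one request (the symbols are mine):

```latex
t_{\text{drain}} = \frac{\text{backlog}}{X} = \frac{1000\ \text{requests}}{200\ \text{requests/s}} = 5\ \text{s}

X = \frac{N}{S} \quad\Longrightarrow\quad S = \frac{N}{X} = \frac{20}{200\ \text{requests/s}} = 0.1\ \text{s per request}
```

So 20 workers sustaining 200 requests/sec implies roughly 100 ms per request; if requests are slower, the worst-case throughput and drain time scale accordingly.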
There is, you see, a rather infamous characteristic of computer systems: the "knee-shaped curve," or "hitting the wall." (Smack!) Performance degrades more or less linearly up to a certain point, where it abruptly becomes exponentially bad. If you maintain a throttle on the amount of work that the system attempts to carry out simultaneously, you never hit that wall. Queues build up, but the work keeps moving.
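One textbook way to see the knee is the M/M/1 queueing model (random arrivals, random service times, one server). I am using it purely as an illustration; the post does not rely on any particular model. Its mean response time is R = S / (1 - ρ), where S is the service time and ρ the utilization. Strictly, that blow-up is hyperbolic rather than exponential, but it produces the same wall.

```python
def mm1_response_time(service_time, utilization):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho), for 0 <= rho < 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


# With a 100 ms service time, watch the response time as utilization climbs.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilization {rho:.2f}: {mm1_response_time(0.1, rho) * 1000:.0f} ms")
```

This prints roughly 200 ms, 500 ms, 1000 ms, 2000 ms, and 10000 ms: going from 50% to 90% busy quintuples the response time, and the last few percent multiply it again.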
A "request," in such a system, isn't a thread or a process or anything else known to the system dispatcher. It is a thing: an object.
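A sketch of what that "thing" might look like as plain data. The field names and the `queued` status are hypothetical; the point is that the dispatcher never sees this object, only the worker threads that pass it around.

```python
import itertools
import time
from dataclasses import dataclass, field

_next_id = itertools.count(1)


@dataclass
class Request:
    """A request is just data; the system dispatcher knows nothing about it."""
    payload: bytes
    request_id: int = field(default_factory=lambda: next(_next_id))
    received_at: float = field(default_factory=time.monotonic)
    status: str = "queued"   # hypothetical lifecycle: queued -> running -> done
```

Thousands of these can sit in a queue for the cost of a little memory each, whereas thousands of threads would each need a stack and a place in the dispatcher's run queue.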
Some systems are even more sophisticated, with a certain number of "job analysis" threads taking the initial request, deciding what stage(s) need to be performed to complete it, and then brokering the stages out to the threads (or even machines) dedicated to each stage. (What I'm now describing is a transaction-processing monitor; these are available off the shelf. I'm being loose with my terminology here.) It sounds fancy, but you see it being done at any fast-food restaurant, where we've got "the fry guy," "the drink guy," "mister burger-man," and so on.
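A toy version of the fast-food kitchen: one dedicated thread and queue per stage, with a "job analysis" step that brokers each order's stages out to them. For brevity the analysis runs in the calling thread rather than in its own pool, and the stage names are just the restaurant analogy. A real transaction-processing monitor does far more than this.

```python
import queue
import threading


def stage_worker(stage, inbox, outbox):
    """A thread dedicated to one stage: the fry guy, the drink guy, and so on."""
    while True:
        order_id = inbox.get()
        if order_id is None:              # sentinel: the kitchen is closing
            return
        outbox.put((order_id, stage))     # report "this stage is done for this order"


def run_kitchen(orders):
    """Toy staged pipeline. `orders` maps an order id to the stages it needs."""
    stages = {name: queue.Queue() for name in ("burger", "fries", "drink")}
    completed = queue.Queue()
    workers = [
        threading.Thread(target=stage_worker, args=(name, inbox, completed), daemon=True)
        for name, inbox in stages.items()
    ]
    for w in workers:
        w.start()

    # "Job analysis": decide which stages each order needs and hand each one
    # to the thread dedicated to that stage.
    pending = 0
    for order_id, needed in orders.items():
        for stage in needed:
            stages[stage].put(order_id)
            pending += 1

    # Collect completions; an order is finished once all of its stages report in.
    done = {order_id: [] for order_id in orders}
    for _ in range(pending):
        order_id, stage = completed.get()
        done[order_id].append(stage)

    for inbox in stages.values():
        inbox.put(None)
    for w in workers:
        w.join()
    return done
```

Each stage can then be given as many (or as few) workers as its share of the load requires, which is exactly what the restaurant does when it puts two people on fries.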