Programming
This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
You've certainly put a lot of work into something you don't need!
I'm addicted to solving problems.
Quote:
Originally Posted by ta0kira
I wouldn't really call it a fifo mutex, but rather an access queue, because of the growth and maintenance associated with larger numbers of waiting threads.
You're absolutely right.
Quote:
Originally Posted by ta0kira
As such, I'm curious about the possibility of making it a priority queue.
Perhaps, but I've never needed one myself; I'd have to see some real life use cases first. Certainly the reordering of the list would be feasible.
You might have to add a flag in the list item to denote defunct/errored/dead/etc. threads, so the cleanup does not throw away active items (in cases where the list gets reordered).
Quote:
Originally Posted by ta0kira
Your solution is right on one of those lines where it's either a robust version of something primitive (i.e. fancy mutex) or it's a primitive version of what's normally more robust (i.e. primitive queuing system), so it seems natural to want to promote it to the higher domain and bog it down with features.
Exactly. That is why I'm not sure this implementation is useful in either domain, especially since on Linux, futexes (the default pthread mutexes) provide mostly the same behaviour. Personally, I'd go as far as to say that futexes yield more useful patterns. (While the same thread is likely to get the futex if it unlocks and then re-locks it soon afterwards, I consider that a logic fault; one needs a condition variable or another programming pattern there, to make sure thread starvation does not occur.)
Quote:
Originally Posted by ta0kira
One portability problem is (pthread_t)-1
Good catch, I didn't think of checking that.
Fortunately, it is actually a remnant of some testing (logic checking) I did. It is perfectly okay to drop the assignments altogether, because the active pointer is NULL whenever the owner field is invalid: whenever owner == (pthread_t)-1, active is NULL, so the value of the owner field is irrelevant. Dropping the assignments will not affect program flow at all.
Quote:
Originally Posted by ta0kira
It occurred to me earlier that the "cars entering an intersection" analogy for mutex usage is also a good example of when a fifo mutex would be needed. Whenever I've seen the cars/intersection example there was at most 1 car waiting to enter the intersection. If there were 1-4 cars waiting in real life, however, they should proceed in the order they arrived.
I hope you don't mind me hijacking the analogy, but I'd argue the threads themselves should be the drivers, and the cars should be the work payload. In such an intersection, it would be better for the drivers to step out and have a break or a smoke, with drivers getting into random cars -- but the cars always starting off in the correct order.
I'm of course talking about thread pools.
Instead of trying to order the threads at such points in code, you can queue the work instead, and use a normal mutex to control the access to the queue. (Or even better, use __sync_bool_compare_and_swap() to implement a "take next piece of work if there is any" locklessly.) It does not matter which thread gets the mutex, since they will do/drive/run with the next piece of work anyway.
I do realize that switching from queuing threads to queuing the work the threads do is a significant change in program logic. There are side benefits, however. One is the possibility of dynamically resizing the driver (thread) pool. If you have threads doing a long chain of events, you do not even need a thread pool: at congested points the threads may simply swap work loads, and work order is still guaranteed. Another is better data encapsulation -- it is no longer about the thread that does the work, but about the data that gets processed. I've found that the latter approach is more likely to yield a design that maximises data throughput. For user interfaces (GUIs) this approach is not that useful, because small action latencies are more important to users than data throughput -- users want snappy applications.
Quote:
Originally Posted by Nominal Animal
Perhaps, but I've never needed one myself; I'd have to see some real life use cases first. Certainly the reordering of the list would be feasible.
I use a priority queue in an application that routes IPC between processes. Certain blocks of IPC have a higher priority, and therefore get routed ahead of other blocks in the queue.
Quote:
Originally Posted by Nominal Animal
I do realize that switching from queuing threads to queuing the work the threads do is a significant change in program logic. There are side benefits, however. One is the possibility of dynamically resizing the driver (thread) pool. If you have threads doing a long chain of events, you do not even need a thread pool: at congested points the threads may simply swap work loads, and work order is still guaranteed. Another is better data encapsulation -- it is no longer about the thread that does the work, but about the data that gets processed. I've found that the latter approach is more likely to yield a design that maximises data throughput. For user interfaces (GUIs) this approach is not that useful, because small action latencies are more important to users than data throughput -- users want snappy applications.
Do you favor thread pools for queuing the work in addition to doing the work? As an example, in the application mentioned above, each client process has a corresponding thread in the server process to parse incoming IPC blocks. I've done this so I can use blocking reads; if an IPC block is incomplete, the flex/bison parser will wait until it is complete. If a client process fails to send a complete IPC block, this will only compromise its own dedicated thread. After parsing is complete, the IPC block is queued and the routing thread picks it up in priority order (actually it picks up and routes a % of the queue each time around). I've contemplated using thread pools for both parsing and routing, but I haven't gotten around to either. My usage of the application has mostly been epistemic, so I haven't built any "real" systems around the framework yet (i.e. "addicted to solving problems").
Kevin Barry
Quote:
Originally Posted by ta0kira
Do you favor thread pools for queuing the work in addition to doing the work?
Yes, I do. Typically my tasks are simulations, where each worker can only work on a complete data block, so I tend to use asynchronous or nonblocking I/O to read all data, and let the threads in the pool pick the earliest completed data block.
Quote:
Originally Posted by ta0kira
As an example, in the application mentioned above, each client process has a corresponding thread in the server process to parse incoming IPC blocks. I've done this so I can use blocking reads; if an IPC block is incomplete, the flex/bison parser will wait until it is complete.
If you do not need the flex/bison parser to determine where each IPC block starts and ends, I would have a separate thread receiving the IPC blocks into per-client buffers. Whenever it completes an IPC block, it adds it to the parser work pool -- effectively hands it off to a random parser thread.
The difference between the two approaches is the number of concurrently active (allocated) parser data structures needed. Your approach uses up to one parser per client. My approach, given a sufficient number of incoming IPC blocks, will saturate the CPU with about one thread per core; there is no need to let the parser thread pool grow any larger than a fixed limit (about the number of CPU cores available).
While there should be no real difference in efficiency, my approach should require less memory, especially when there are many active clients. In my approach, the server has just a small receive buffer per client, and a fixed number of parsers. (Do you have safeguards against DoS? What happens if a lot of clients connect to your server, but only provide the initial part of an IPC block?)
I have not compared the code complexity between the two approaches, so it is possible that your approach is saner/better in real life. After all, all code has to be maintained, and more straightforward code usually means bugs are easier to detect, find, and fix.
Quote:
Originally Posted by ta0kira
After parsing is complete, the IPC block is queued and the routing thread picks it up in priority order (actually it picks up and routes a % of the queue each time around).
Scheduling is also an interesting problem there. You can have absolute priorities, with every higher priority block processed before any lower priority ones, or relative priorities like in (fair) process schedulers (CFS, BFS). The implementations are very interesting.
Quote:
Originally Posted by ta0kira
I've contemplated using thread pools for both parsing and routing, but I haven't gotten around to either. My usage of the application has mostly been epistemic, so I haven't build any "real" systems around the framework yet (i.e. "addicted to solving problems").
The work I've done with this stuff is with molecular dynamics simulations using classical potential models (i.e. simulating individual atoms, using simplified interaction models and Newtonian dynamics), especially developing methods to eliminate (yes, completely) network communication delay when running distributed on a cluster, and using C99 and pthreads (as opposed to Fortran and processes). Thus far I've attracted zero academic interest, so I've been doing this sparsely, in my free time.
Quote:
Originally Posted by Nominal Animal
If you do not need the flex/bison parser to determine where each IPC block starts and ends, I would have a separate thread receiving the IPC blocks into per-client buffers. Whenever it completes an IPC block, it adds it to the parser work pool -- effectively hands it off to a random parser thread.
The difference between the two approaches is the number of concurrently active (allocated) parser data structures needed. Your approach uses up to one parser per client. My approach, given a sufficient number of incoming IPC blocks, will saturate the CPU with about one thread per core; there is no need to let the parser thread pool grow any larger than a fixed limit (about the number of CPU cores available).
My first protocol used a parser I wrote myself, but it was fairly inflexible. I switched to flex/bison a while ago and the protocol is a lot cleaner now, but even if I didn't use flex/bison the lexer/parser contexts would need to be preserved somehow. It's not a problem keeping one context pair (lexer/parser) per client, and I suppose I could just use select and non-blocking reads from a pool of threads. There is also the context of each client, however, that tracks the client's permissions, violations, and errors. The parser instantiates a certain type of root node of a binary tree based on the first part of the IPC block and the root node assembles a binary tree below itself with the rest of the block (as it's parsed).
Quote:
Originally Posted by Nominal Animal
(Do you have safeguards against DoS? What happens if a lot of clients connect to your server, but only provide the initial part of an IPC block?)
It's a little misleading to call it a server in that clients don't connect to it; it forks all of its clients. It's an ongoing experiment of mine to see if I can come up with a good way to modularize an application into multiple processes using a paradigm I devised. The server has limits for anything that could possibly cause DOS. I spent a month outlining security before I even started coding it; the server is essentially the implementation of that security policy.
Quote:
Originally Posted by Nominal Animal
Scheduling is also an interesting problem there. You can have absolute priorities, with every higher priority block processed before any lower priority ones, or relative priorities like in (fair) process schedulers (CFS, BFS). The implementations are very interesting.
I'm basically balancing the trade-off of "push" vs. "pop" time with an n-ary tree where n can be set at compile time. Each waiting item gets additional "consideration" WRT its absolute priority based on how many times it's been passed over in the queue. This is at no added time complexity over absolute priority since only the location of the inserted element is changed, but it does come with a few extra additions and multiplications per comparison.
Kevin Barry