I am in the planning stages of a small modular server, and was wondering if there are legitimate reasons for using processes created by fork() instead of pthreads.
pthreads gives me mutexes, condition variables and (fairly easy) shared memory management.
Processes give me what?
I am asking because I know that big projects like Apache use processes over threads... Are there speed or stability issues at work here?
Unix/Linux processes are quite efficient, so there's no great speed or cost saving in using threads over forks.
I believe threads are more problematic in development, and there are issues of security and complexity with threads.
The impression I get is that threads are to be avoided if possible.
I think threads have been described as a hack to get around the poor IPC and inefficient process handling of other OSes.
But this is just what I've read. I've never actually used threads or fork in anger.
What methods of synchronization are available to processes?
As I mentioned, pthreads comes fully armed with mutexes, semaphores and condition variables. As far as I know, the only ways to do inter-process communication are pipes and shared memory.
Pipes and shared memory don't in themselves give you any means of synchronization. You would have to invent your own. Am I right? At a glance this looks like a big task...
It seems to me that to KISS would mean using pthreads... But when you say "security issues" I get chills about this :S
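For what it's worth, processes do not have to invent their own locking: a pthread mutex can be marked process-shared and placed in shared memory. Below is a minimal sketch of that idea, assuming Linux/glibc. The struct `shared_counter` and the function `count_with_processes` are names made up for this illustration, not part of any library.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the shared region: a lock plus the data it guards. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
};

/* Fork `nprocs` children that each increment the counter `iters` times.
 * Returns the final counter value, or -1 on error. */
long count_with_processes(int nprocs, int iters)
{
    /* Anonymous shared mapping: inherited by, and shared with, every child. */
    struct shared_counter *sc = mmap(NULL, sizeof *sc, PROT_READ | PROT_WRITE,
                                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sc == MAP_FAILED)
        return -1;

    /* The mutex must be explicitly marked as usable across processes. */
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&sc->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    sc->value = 0;

    int failed = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0) {                      /* child */
            for (int j = 0; j < iters; j++) {
                pthread_mutex_lock(&sc->lock);
                sc->value++;
                pthread_mutex_unlock(&sc->lock);
            }
            _exit(0);
        }
    }

    while (wait(NULL) > 0)                   /* reap every child */
        ;

    long result = failed ? -1 : sc->value;
    pthread_mutex_destroy(&sc->lock);
    munmap(sc, sizeof *sc);
    return result;
}
```

Calling `count_with_processes(4, 1000)` from `main` should give 4000, because every increment happens under the lock. Older glibc versions need `-pthread` when linking.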
Most OSes don't let you run individual threads under a different user/account. Security reasons may warrant forking off a process and switching it to another user/account before letting it do anything, for example.
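A rough sketch of the fork-then-switch-user pattern described above. `run_as_user` and `do_nothing_module` are hypothetical names for this example. Switching to a different account only works when the parent runs as root, and a real server would also reset the supplementary groups (`setgroups`) while it still has the privilege; that step is left out here.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `work` in a child process after switching the child to uid/gid.
 * Returns the child's exit status, or -1 on error. */
int run_as_user(uid_t uid, gid_t gid, int (*work)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                          /* child */
        /* Group first: once the user ID is dropped, the process may no
         * longer be allowed to change its group. */
        if (setgid(gid) != 0 || setuid(uid) != 0)
            _exit(127);                      /* refuse to run with the wrong identity */
        _exit(work());
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical module entry point, standing in for the real work. */
int do_nothing_module(void)
{
    return 0;
}
```

As an unprivileged user you can still try it with your own IDs, e.g. `run_as_user(getuid(), getgid(), do_nothing_module)`, which should return 0.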
bigearsbilly: If E.R. is talking about it there's probably something to it. Don't worry; I posted here to get all the opinions/comments I could.
gamehack: You've got a point with the account thing.
I was also thinking about performance, but I guess it would be quite hard to predict anything. Maybe I just have to pull myself together and write a small test suite (I was hoping that wouldn't be necessary...).
A small twist to the story might be that I thought I had come up with a legit reason for using processes over threads: processes would allow for SMP benefits. Digging around, I discovered that pthreads in Linux are actually special instances of processes, so on Linux this wouldn't matter.
I also found out that you have access to semaphores (System V) when using processes...
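To illustrate the System V semaphores mentioned above, here is a small sketch where the parent blocks on a semaphore until a forked child signals that it is ready. It assumes Linux, where the caller has to define `union semun` itself. `sem_change` and `wait_for_child_ready` are names invented for this example.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the caller must define this union for semctl(). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Add `delta` to semaphore 0 of the set; blocks while the result would be negative. */
static int sem_change(int semid, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Parent waits on a semaphore until the child posts it.
 * Returns 0 on success, -1 on error. */
int wait_for_child_ready(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    union semun arg = { .val = 0 };          /* start "not ready" */
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    if (pid == 0) {
        /* Child: initialisation work would go here, then signal the parent. */
        _exit(sem_change(semid, 1) == 0 ? 0 : 1);
    }

    int rc = sem_change(semid, -1);          /* blocks until the child has posted */
    waitpid(pid, NULL, 0);
    semctl(semid, 0, IPC_RMID);              /* SysV objects outlive processes: remove explicitly */
    return rc;
}
```

Note the explicit `IPC_RMID`: unlike a pthread mutex, a System V semaphore set stays around in the kernel until someone removes it.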
Threads are lightweight processes, so switching between them consumes fewer resources (e.g. the kernel doesn't have to switch memory page tables).
IMHO threads should be used when you plan to exchange data extensively between parts of your program (I use the word "part" to denote a flow of execution, be it a process or a thread), because it will be really painful with processes (you'll have to maintain shared memory, semaphores and so on... you'll just spend more time programming, with no guarantee that it is worth it). But if you plan to communicate less (e.g. pass some parameters to a process at its start and (possibly) get the return value), you can use processes. Keep in mind, though, that programming with threads requires more care, because all the threads share the same memory.
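The "pass some parameters at start and get the return value" case needs very little code with processes. A minimal sketch follows; `run_in_child` and `count_set_bits` are hypothetical names, and the result has to fit in an exit status (0 to 255).

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run job(arg) in a child process and hand back its result.
 * Only the low 8 bits survive the trip through the exit status.
 * Returns -1 on error. */
int run_in_child(int (*job)(int), int arg)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)                            /* child: the argument was copied by fork() */
        _exit(job(arg) & 0xff);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical job: count the bits that are set in x. */
int count_set_bits(int x)
{
    unsigned int u = (unsigned int)x;
    int n = 0;
    for (; u != 0; u &= u - 1)               /* clears the lowest set bit each round */
        n++;
    return n;
}
```

For example, `run_in_child(count_set_bits, 0xF0)` returns 4. Anything bigger than a small integer has to travel over a pipe or shared memory instead.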
Apache can use threads too. But Apache is an example of low internal communication (each part just has to serve a client's request, and it doesn't have to tell the other parts about it). AFAIK Apache prefers processes for (at least) these 2 reasons:
1. You can write modules for Apache that will be called from its processes, and they can leak memory. So from time to time Apache processes get killed to guarantee that the memory is freed.
2. On Linux, the performance of threads is only a bit better than that of processes (as opposed to Windows, where threads are much faster). But processes keep their memory separated, which rules out some possible errors.
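The memory separation in point 2 is easy to see in a small experiment: a write made by a child process never reaches the parent, while a write made by a thread does. A sketch, with function names made up for this example:

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 0;

static void *thread_body(void *unused)
{
    (void)unused;
    shared_value = 42;                       /* same address space as the caller */
    return NULL;
}

/* Returns what the parent sees after a child *process* wrote 42. */
int value_after_child_process(void)
{
    shared_value = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_value = 42;                   /* modifies the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return shared_value;
}

/* Returns what the caller sees after a *thread* wrote 42. */
int value_after_thread(void)
{
    pthread_t t;
    shared_value = 0;
    if (pthread_create(&t, NULL, thread_body, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return shared_value;
}
```

`value_after_child_process()` returns 0 and `value_after_thread()` returns 42. The same property that protects the parent from a buggy module is what makes sharing data between processes more work.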
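Well, as a rule you should use either processes or threads in one program, without mixing them. fork() in a multithreaded program copies only the calling thread, so a lock held by any other thread stays locked in the child, and that can lead to unpredictable results.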
Quote:
- use threads in the networking core and for basic IO stuff
It depends on what you mean by "networking core". If you mean low-level sockets, you'd better make it single-threaded, as send()/recv()/select() will give you better performance.
Quote:
- use processes for plugins/modules
If you use plugins/modules as stream filters (as in Apache, or as intermediate programs in a pipeline, like 'cat | sed '...' | awk | your_module | bash'), and do not need to pass intermediate results to other processes of the program, then yes, use processes. However, that's just my opinion, and somebody may present other and better arguments.
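A module used as a stream filter can be sketched with two pipes and one fork(): the parent writes the input, the child transforms it and writes the result back. This is only an illustration under simple assumptions. `uppercase_filter` stands in for a real module, `filter_through_child` is a made-up name, and the input must be small enough to fit in the pipe buffer, otherwise parent and child could block on each other.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical module: upper-cases everything read from `in` and writes it to `out`. */
static void uppercase_filter(int in, int out)
{
    char buf[256];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            buf[i] = (char)toupper((unsigned char)buf[i]);
        if (write(out, buf, (size_t)n) != n)
            break;
    }
}

/* Send `input` through the filter process and store the result
 * (at most cap-1 bytes, NUL-terminated) in `output`.
 * Returns 0 on success, -1 on error. Small inputs only. */
int filter_through_child(const char *input, char *output, size_t cap)
{
    int to_child[2], from_child[2];
    if (cap == 0 || pipe(to_child) != 0)
        return -1;
    if (pipe(from_child) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }
    if (pid == 0) {                          /* child: keep only the ends it uses */
        close(to_child[1]);
        close(from_child[0]);
        uppercase_filter(to_child[0], from_child[1]);
        _exit(0);
    }

    close(to_child[0]);
    close(from_child[1]);

    size_t len = strlen(input);
    ssize_t written = write(to_child[1], input, len);
    close(to_child[1]);                      /* EOF tells the filter the input is complete */

    size_t used = 0;
    ssize_t n;
    while (used + 1 < cap &&
           (n = read(from_child[0], output + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    output[used] = '\0';

    close(from_child[0]);
    waitpid(pid, NULL, 0);
    return written == (ssize_t)len ? 0 : -1;
}
```

With a 32-byte buffer `out`, `filter_through_child("hello", out, sizeof out)` leaves "HELLO" in `out`. If the module crashes or leaks memory, only the child is affected.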
Quote:
It depends on what you mean by "networking core". If you mean low-level sockets, you'd better make it single-threaded, as send()/recv()/select() will give you better performance.
My plans were something along the lines of the following:
- having a dictator thread controlling a herd of io-threads
- the dictator thread dynamically sorts all connections into groups having X KB/second of traffic
- each io-thread uses select() on its assigned groups to process IO
- output from the io-threads is relayed to some plugin/module
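One possible shape for a single io-thread from the plan above, as a sketch only. `struct io_group`, `io_thread` and `demo_io_thread` are hypothetical names, two pipes stand in for network connections, and the dictator thread and the regrouping by traffic are not shown. Every descriptor passed to select() must be below FD_SETSIZE.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

#define GROUP_MAX 8

/* Hypothetical connection group handed to one io-thread. */
struct io_group {
    int fds[GROUP_MAX];
    int count;
    long bytes_read;
};

/* Serve one group with select() until every descriptor reaches EOF. */
static void *io_thread(void *arg)
{
    struct io_group *g = arg;
    int open_fds = g->count;

    while (open_fds > 0) {
        fd_set readable;
        int maxfd = -1;

        FD_ZERO(&readable);                  /* select() modifies the set: rebuild each round */
        for (int i = 0; i < g->count; i++) {
            if (g->fds[i] < 0)
                continue;
            FD_SET(g->fds[i], &readable);
            if (g->fds[i] > maxfd)
                maxfd = g->fds[i];
        }

        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            break;

        for (int i = 0; i < g->count; i++) {
            int fd = g->fds[i];
            if (fd < 0 || !FD_ISSET(fd, &readable))
                continue;

            char buf[512];
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0) {
                g->bytes_read += n;          /* a real server would relay buf to a module here */
            } else {                         /* EOF or error: drop the connection */
                close(fd);
                g->fds[i] = -1;
                open_fds--;
            }
        }
    }
    return NULL;
}

/* Demo driver: returns the number of bytes the io-thread read, or -1 on error. */
long demo_io_thread(void)
{
    int a[2], b[2];
    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    int ok = write(a[1], "hello", 5) == 5 && write(b[1], "world!", 6) == 6;
    close(a[1]);                             /* closing the write ends produces EOF for the reader */
    close(b[1]);

    struct io_group g = { .fds = { a[0], b[0] }, .count = 2, .bytes_read = 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, io_thread, &g) != 0) {
        close(a[0]);
        close(b[0]);
        return -1;
    }
    pthread_join(t, NULL);
    return ok ? g.bytes_read : -1;
}
```

`demo_io_thread()` returns 11 (5 + 6 bytes). Since each thread owns its group exclusively while it runs, no locking is needed inside the loop; the dictator would only need to synchronize when it moves a connection between groups.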
I have the book "Advanced Linux Programming" which provides an excellent introduction to pthreads and processes as such, but doesn't provide deeper information on the implementation/performance issues regarding these...
EDIT: The site does seem to provide useful info on many things though... Interesting
If I have a core library using pthreads and my modules use processes, and my modules aren't even dynamically linked to the core (and the core isn't linked to the modules), should intermixing pthreads and processes still give me headaches?