pthreads vs processes

kamstrup · 12-13-2004, 04:03 AM

I am in the planning stages of a small modular server, and was wondering if there were legitimate reasons for using processes created by fork() instead of pthreads?

pthreads gives me mutexes, condition variables and (fairly easy) shared memory management.

What do processes give me?

I am asking because I know that big projects like Apache use processes over threads... Are there speed or stability issues at work here?

bigearsbilly · 12-13-2004, 05:32 AM

From what I've read:

Unix/Linux processes are quite efficient so there's no great speed/cost saving
in using threads over forks.

I believe threads are more problematic in development and there are issues of security and complexity with threads.

The impression I get is threads are to be avoided if possible.
I think threads have been described as a hack to get around the poor
IPC and inefficient processing of other OSs.

But this is just what I've read. Never actually used threads or fork in anger.

KISS (keep it simple)

regards, billy

kamstrup · 12-13-2004, 07:47 AM

What methods of synchronization are available to processes?

As I mentioned, pthreads comes fully armed with mutexes, semaphores and condition variables. As far as I know the only ways to do inter-process communication are pipes and shared memory.

Pipes and shared memory don't per se give any means of synchronization. You would have to invent your own. Am I right? At a glance this looks like a big task...

It seems to me that to KISS would mean using pthreads... But when you say "security issues" I get chills about this :S

Cheers

bigearsbilly · 12-13-2004, 07:59 AM

I'm just going by a book by Eric Raymond, The Art of Unix Programming.
(best unix philosophy book I've ever read)

but seriously, you should ask someone who knows what they are talking about.
As I said I have never used threads.

I was just giving a useless opinion!

regards, billy

gamehack · 12-13-2004, 08:09 AM

Most OSes don't let you run threads under a different user/account. Security reasons may warrant forking off a process and setting it to another user/account before letting it do anything, for example.

Cheers,
gamehack
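
A minimal sketch of the fork-then-switch-user idea from the post above. The names run_as and example_job are hypothetical, made up for illustration. It assumes a POSIX system; switching to a different user only works when the parent runs as root.

```c
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical example job; its return value becomes the child's exit status. */
int example_job(void)
{
    return 42;
}

/* Fork a child, switch it to uid/gid, run job() there, and return the
 * job's exit status (0..255), or -1 if anything fails. Status 127 is
 * reserved here to mean "could not switch identity". */
int run_as(uid_t uid, gid_t gid, int (*job)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: when going from root to a non-root user, drop the
         * supplementary groups first, then the group ID, then the user ID.
         * The order matters: after setuid() the process may no longer be
         * allowed to change its groups. */
        if (geteuid() == 0 && uid != 0 && setgroups(0, NULL) != 0)
            _exit(127);
        if (setgid(gid) != 0 || setuid(uid) != 0)
            _exit(127);
        _exit(job() & 0xff);
    }

    /* Parent: keeps its own identity and just collects the result. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 127 ? -1 : WEXITSTATUS(status);
}
```

Calling run_as(getuid(), getgid(), example_job) works without privileges and only exercises the fork/wait path. A real server would look up the target account (for example with getpwnam()) and pass its IDs instead.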

kamstrup · 12-13-2004, 08:43 AM

bigearsbilly: If E.R. is talking about it there's probably something to it. Don't worry; I posted here to get all the opinions/comments I could.

gamehack: You've got a point with the account thing.

I was also thinking about performance, but I guess it would be quite hard to predict anything. Maybe I just have to grab myself by the roots and write a small testing suite (I was hoping that wouldn't be necessary...).

bigearsbilly · 12-13-2004, 08:46 AM

too true!
we all know that's the only way to find out for sure!

good luck.

kamstrup · 12-13-2004, 03:01 PM

A small twist to the story might be that I thought I had come up with a legit reason for using processes over threads: processes would allow for SMP benefits. Digging around, I discovered that pthreads in Linux are actually special instances of processes, so on Linux this wouldn't matter.

I also found out that you have access to semaphores (SysV) when using processes...

shy · 12-14-2004, 12:36 AM

Threads are lightweight processes, and so switching between them is less resource consuming (e.g. you don't have to switch memory page tables in the kernel).

IMHO threads should be used when you plan to extensively exchange data between parts of your program (I use the word part to denote a flow of execution, be it a process or a thread), because it will be really painful with processes (you'll have to maintain shared memory, semaphores and so on... you'll just spend more time programming, and you won't be guaranteed that it is worth it). But if you plan to communicate less (e.g. pass some parameters to a process on its start and (possibly) get the return value), you can use processes. But keep in mind that thread programming requires more care, because the memory is the same for all the threads.
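
To illustrate the thread side of this trade-off, here is a small sketch of two threads exchanging data through an ordinary struct guarded by a mutex and a condition variable. The one-slot mailbox and all names in it (mailbox, sum_via_mailbox, and so on) are hypothetical, chosen only for the example.

```c
#include <pthread.h>
#include <stdbool.h>

/* One-slot mailbox living in ordinary memory that both threads can see. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    bool full;
    int value;
};

static void mailbox_put(struct mailbox *mb, int v)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->full)                       /* wait until the slot is free */
        pthread_cond_wait(&mb->changed, &mb->lock);
    mb->value = v;
    mb->full = true;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
}

static int mailbox_get(struct mailbox *mb)
{
    pthread_mutex_lock(&mb->lock);
    while (!mb->full)                      /* wait until something arrives */
        pthread_cond_wait(&mb->changed, &mb->lock);
    int v = mb->value;
    mb->full = false;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
    return v;
}

struct producer_args {
    struct mailbox *mb;
    int count;
};

/* Producer thread: sends 1, 2, ..., count. */
static void *producer(void *p)
{
    struct producer_args *a = p;
    for (int i = 1; i <= a->count; i++)
        mailbox_put(a->mb, i);
    return NULL;
}

/* Send 1..count through the mailbox and return the sum the consumer saw,
 * or -1 if the producer thread could not be started. */
long sum_via_mailbox(int count)
{
    struct mailbox mb = { .full = false, .value = 0 };
    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.changed, NULL);

    struct producer_args args = { &mb, count };
    pthread_t tid;
    long sum = -1;
    if (pthread_create(&tid, NULL, producer, &args) == 0) {
        sum = 0;
        for (int i = 0; i < count; i++)
            sum += mailbox_get(&mb);
        pthread_join(tid, NULL);
    }

    pthread_cond_destroy(&mb.changed);
    pthread_mutex_destroy(&mb.lock);
    return sum;
}
```

The same exchange between processes would need the struct placed in shared memory and process-shared or SysV synchronization objects, which is the extra work the post refers to.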

Apache can use threads too. But Apache is an example of low internal communication (each part just has to serve a client's request, and it doesn't have to tell the other parts about it). AFAIK Apache prefers processes for (at least) 2 reasons:
1. You can write modules for Apache that will be called from its processes, and they can leak memory. So from time to time Apache processes get killed to guarantee that the memory is freed.
2. In Linux, threads' performance is only a bit better than that of processes (as opposed to Windows, where threads are much faster). But processes keep their memory separated, thus reducing some possible errors.
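
The memory-separation point can be seen in a few lines. This is only a sketch; counter_after_thread and counter_after_fork are made-up names.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter;   /* plain global, no shared-memory setup */

static void *bump(void *unused)
{
    (void)unused;
    counter++;
    return NULL;
}

/* Increment the global from a second thread; the caller sees the change
 * because both threads share one address space. */
int counter_after_thread(void)
{
    pthread_t tid;
    counter = 0;
    if (pthread_create(&tid, NULL, bump, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);
    return counter;
}

/* Increment the global in a forked child; the child only changes its own
 * copy, so the parent's value is untouched. */
int counter_after_fork(void)
{
    counter = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter++;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;
}
```

counter_after_thread() returns 1 and counter_after_fork() returns 0. A stray write in a buggy module behaves the same way: in a thread it lands in everybody's memory, in a child process it stays in the child.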

kamstrup · 12-14-2004, 04:50 AM

Thank you, that was very insightful...

So regarding a small modular server, a decent plan might be:

- use threads in the networking core and for basic IO stuff
- use processes for plugins/modules

am I right?

shy · 12-14-2004, 05:03 AM

Well, you should either use processes or threads alone in one program, without mixing them. Their simultaneous usage leads to unpredictable results.

Quote:

- use threads in the networking core and for basic IO stuff

It depends on what you mean by the words 'networking core'. If you mean low-level sockets, you'd better make it single-threaded, as send()/recv()/select() will give you better performance.
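
A sketch of the single-threaded approach: one select() loop servicing several descriptors. The function name drain_all is hypothetical, and read() is used instead of recv() so that the sketch also works on pipes; EINTR handling is omitted.

```c
#include <stdbool.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Read from every descriptor in fds[] until each reaches end-of-file,
 * using one thread and one select() loop. Returns the total number of
 * bytes read, or -1 on error. Descriptors must be below FD_SETSIZE. */
long drain_all(const int *fds, int nfds)
{
    bool still_open[FD_SETSIZE];
    char buf[4096];
    long total = 0;

    if (nfds < 0 || nfds > FD_SETSIZE)
        return -1;
    for (int i = 0; i < nfds; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;
        still_open[i] = true;
    }

    int remaining = nfds;
    while (remaining > 0) {
        /* select() modifies the set, so rebuild it on every pass. */
        fd_set readable;
        int maxfd = -1;
        FD_ZERO(&readable);
        for (int i = 0; i < nfds; i++) {
            if (!still_open[i])
                continue;
            FD_SET(fds[i], &readable);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        /* Block until at least one descriptor has data or hits EOF. */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            return -1;

        for (int i = 0; i < nfds; i++) {
            if (!still_open[i] || !FD_ISSET(fds[i], &readable))
                continue;
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n < 0)
                return -1;
            if (n == 0) {                 /* peer closed its end */
                still_open[i] = false;
                remaining--;
            } else {
                total += n;               /* a real server would process buf here */
            }
        }
    }
    return total;
}
```

Nothing here needs a lock, since only one thread ever touches the connection state.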

Quote:

- use processes for plugins/modules

If you use plugins/modules as stream filters (as in Apache, or as intermediate programs in a pipeline -- like 'cat | sed '...' | awk | your_module | bash'), and do not need to pass intermediate results to other processes of the program, then 'yes, use processes'. However, it's just my opinion, and somebody may present other, better arguments.

kamstrup · 12-14-2004, 05:16 AM

Quote:

It depends on what you mean by the words 'networking core'. If you mean low-level sockets, you'd better make it single-threaded, as send()/recv()/select() will give you better performance.

My plans were something along the lines of the following:

- having a dictator thread controlling a herd of io-threads
- the dictator thread dynamically groups all connections into groups
having X KB/second traffic
- each io-thread uses select() on its assigned group to process IO
- output from io-threads is relayed to some plugin/module
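
A very rough sketch of the plan above, under loud assumptions: the grouping is a static round-robin deal rather than the dynamic traffic-based grouping described, the "relay to a module" step is reduced to counting bytes, and every name and limit (dictate, io_group, MAX_GROUPS, MAX_PER_GROUP) is made up for illustration.

```c
#include <pthread.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_GROUPS    8    /* hypothetical limits, for the sketch only */
#define MAX_PER_GROUP 16

struct io_group {
    int fds[MAX_PER_GROUP];   /* descriptors must be below FD_SETSIZE */
    int nfds;
    long bytes;               /* filled in by the io-thread */
};

/* io-thread: select() over its own group until every descriptor hits EOF. */
static void *io_thread(void *p)
{
    struct io_group *g = p;
    char buf[4096];
    int open_count = g->nfds;

    while (open_count > 0) {
        fd_set readable;
        int maxfd = -1;
        FD_ZERO(&readable);
        for (int i = 0; i < g->nfds; i++) {
            if (g->fds[i] < 0)
                continue;
            FD_SET(g->fds[i], &readable);
            if (g->fds[i] > maxfd)
                maxfd = g->fds[i];
        }
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            break;
        for (int i = 0; i < g->nfds; i++) {
            if (g->fds[i] < 0 || !FD_ISSET(g->fds[i], &readable))
                continue;
            ssize_t n = read(g->fds[i], buf, sizeof buf);
            if (n > 0) {
                g->bytes += n;    /* a real server would relay buf to a module */
            } else {
                g->fds[i] = -1;   /* EOF or error: stop watching this one */
                open_count--;
            }
        }
    }
    return NULL;
}

/* Dictator: deal descriptors round-robin into ngroups groups, start one
 * io-thread per group, and return the total bytes read (-1 on error). */
long dictate(const int *fds, int nfds, int ngroups)
{
    if (ngroups < 1 || ngroups > MAX_GROUPS ||
        nfds < 0 || nfds > ngroups * MAX_PER_GROUP)
        return -1;

    struct io_group groups[MAX_GROUPS] = { 0 };
    pthread_t tids[MAX_GROUPS];

    for (int i = 0; i < nfds; i++) {
        struct io_group *g = &groups[i % ngroups];
        g->fds[g->nfds++] = fds[i];
    }

    int started = 0;
    for (; started < ngroups; started++)
        if (pthread_create(&tids[started], NULL, io_thread, &groups[started]) != 0)
            break;

    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += groups[i].bytes;
    }
    return started == ngroups ? total : -1;
}
```

Each group is only ever touched by its own io-thread until it is joined, so no locking is needed here. Regrouping connections while the threads run, as the plan calls for, would need a mutex around each group or a hand-off queue.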

mayur · 12-14-2004, 05:19 AM

good web-site to learn about pthreads
http://www.llnl.gov/computing/tutori...eads/MAIN.html

kamstrup · 12-14-2004, 05:26 AM

I have the book "Advanced Linux Programming" which provides an excellent introduction to pthreads and processes as such, but doesn't provide deeper information on the implementation/performance issues regarding these...

EDIT: The site does seem to provide useful info on many things though... Interesting

kamstrup · 12-15-2004, 02:21 AM

Regarding mixing of pthreads and processes

If I have some core library using pthreads and my modules use processes, and my modules aren't even dynamically linked to the core (and the core isn't linked to the modules), should intermixing pthreads and processes still give me headaches?