Strange high system CPU usage in a multithreaded program
I'm running into a strange problem with a multithreaded program I just wrote. It does a lot of mmap'ed I/O on large files (~100-500 MB), with accesses distributed more or less randomly over the whole file. In a nutshell, it fetches some randomly distributed bytes, does a fairly long computation (~1000-1500 machine instructions), and writes the results back to the mapped pages.
Now the program doesn't scale well when it runs with multiple threads: it spends a lot of time in system CPU mode (40-60%), even though the running threads do not make a single system call (apart from the indirect I/O via the mapped pages). Each thread is assigned its own part of the input file, so there are no mutexes involved, just a POSIX semaphore on which the controlling thread waits for the other threads to complete. (I tried pthread_join instead, but the effect is exactly the same.)
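To make the setup concrete, here is a minimal sketch of the structure described above. It is not the actual program: the names `run_threads`, `worker`, `struct slice` and `next_random` are made up for illustration, the short multiply/add loop is only a placeholder for the real computation, and the xorshift generator merely stands in for the real access pattern.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical job descriptor: one disjoint slice of the shared mapping. */
struct slice {
    unsigned char *base;  /* first byte of this thread's slice */
    size_t len;           /* slice length in bytes, must be > 0 */
    uint32_t rng;         /* per-thread PRNG state, nothing shared */
    long iterations;      /* number of fetch/compute/write-back rounds */
    sem_t *done;          /* posted once when the slice is finished */
};

/* Small xorshift PRNG; a stand-in for the real access pattern. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

static void *worker(void *arg)
{
    struct slice *s = arg;
    assert(s->len > 0);
    for (long i = 0; i < s->iterations; i++) {
        size_t off = next_random(&s->rng) % s->len;
        unsigned char v = s->base[off];      /* read from a mapped page */
        for (int k = 0; k < 300; k++)        /* placeholder computation */
            v = (unsigned char)(v * 31u + 7u);
        s->base[off] = v;                    /* write back to the mapped page */
    }
    sem_post(s->done);                       /* the only call made by a worker */
    return NULL;
}

/* Maps the file once, splits it among nthreads workers and waits on a
 * semaphore for all of them. Returns 0 on success, -1 on failure. */
int run_threads(const char *path, int nthreads, long iterations)
{
    if (nthreads <= 0) return -1;
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < nthreads) { close(fd); return -1; }
    size_t size = (size_t)st.st_size;
    unsigned char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                               /* the mapping stays valid */
    if (map == MAP_FAILED) return -1;

    sem_t done;
    pthread_t *tids = calloc((size_t)nthreads, sizeof *tids);
    struct slice *jobs = calloc((size_t)nthreads, sizeof *jobs);
    int started = 0;
    if (tids && jobs && sem_init(&done, 0, 0) == 0) {
        size_t chunk = size / (size_t)nthreads;
        for (int i = 0; i < nthreads; i++) {
            jobs[i].base = map + (size_t)i * chunk;
            jobs[i].len = (i == nthreads - 1) ? size - (size_t)i * chunk : chunk;
            jobs[i].rng = 2463534242u + (uint32_t)i;
            jobs[i].iterations = iterations;
            jobs[i].done = &done;
            if (pthread_create(&tids[i], NULL, worker, &jobs[i]) != 0) break;
            started++;
        }
        for (int i = 0; i < started; i++)    /* controlling thread waits here */
            while (sem_wait(&done) != 0 && errno == EINTR) { }
        for (int i = 0; i < started; i++)
            pthread_join(tids[i], NULL);
        sem_destroy(&done);
    }
    free(tids);
    free(jobs);
    munmap(map, size);
    return started == nthreads ? 0 : -1;
}
```

Calling something like `run_threads("data.bin", 4, 100000000L)` on a large file (the file name is just an example) and watching the user/system split in `top` or `time` should show whether this skeleton behaves like the real program. Build with `-pthread`.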
The funny thing is: if I start it from the command line as several separate processes doing exactly the same work, there is almost no system CPU usage. All of the CPU time is available for user computation, and the computation finishes much faster.
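For comparison, the multi-process variant can be approximated in a single test program by forking, so that every worker sets up its own mapping in its own address space. Again this is only a sketch under assumptions: `run_processes` and `process_slice` are hypothetical names, the inner loop is a placeholder for the real computation, and forking from one parent is not exactly the same as starting separate commands from the shell.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Small xorshift PRNG; a stand-in for the real access pattern. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Work done by one process: map the file itself, touch only its own slice. */
static int process_slice(const char *path, int index, int nprocs, long iterations)
{
    assert(index >= 0 && index < nprocs);
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < nprocs) { close(fd); return -1; }
    size_t size = (size_t)st.st_size;
    unsigned char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                               /* the mapping stays valid */
    if (map == MAP_FAILED) return -1;

    size_t chunk = size / (size_t)nprocs;
    size_t start = (size_t)index * chunk;
    size_t len = (index == nprocs - 1) ? size - start : chunk;
    uint32_t rng = 2463534242u + (uint32_t)index;
    for (long i = 0; i < iterations; i++) {
        size_t off = start + next_random(&rng) % len;
        unsigned char v = map[off];          /* read from a mapped page */
        for (int k = 0; k < 300; k++)        /* placeholder computation */
            v = (unsigned char)(v * 31u + 7u);
        map[off] = v;                        /* write back to the mapped page */
    }
    munmap(map, size);
    return 0;
}

/* Forks nprocs children, one per slice, and waits for all of them.
 * Returns 0 if every child succeeded, -1 otherwise. */
int run_processes(const char *path, int nprocs, long iterations)
{
    if (nprocs <= 0) return -1;
    int started = 0, ok = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) break;
        if (pid == 0)                        /* child: do the work and exit */
            _exit(process_slice(path, i, nprocs, iterations) == 0 ? 0 : 1);
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status = 0;
        pid_t r;
        do { r = wait(&status); } while (r < 0 && errno == EINTR);
        if (r > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok == nprocs ? 0 : -1;
}
```

The work per slice is the same as in a threaded version; the difference is that the workers no longer share one address space, which is the difference between the two runs I am comparing.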
Has any of you encountered a similar problem, and if so, how did you eventually solve it?
Best regards, Stefan
Machine: Dual Opteron 270, running in 64-bit mode
Memory: 8 GB (so no swapping is necessary)
Last edited by Strahlemann; 09-03-2007 at 01:01 PM.