Hello everyone!
This is a message in a bottle...
I am currently encountering unexpected behavior on an embedded Linux system at work:
Context:
My application is multithreaded and uses condition variables and mutexes to synchronize its threads.
Some of the threads take part in a real-time operation (a task that must complete in less than 1 second).
These threads are configured with scheduler=SCHED_RR, priority=55.
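For reference, here is a minimal sketch of how threads like these can be given SCHED_RR at priority 55 with POSIX threads. It is illustrative, not our production code, and the helper names `init_rt_attr` and `start_rt_thread` are hypothetical. The detail worth double-checking is `PTHREAD_EXPLICIT_SCHED`.

```c
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Prepare `attr` so that a thread created with it runs under SCHED_RR at
 * `priority`. PTHREAD_EXPLICIT_SCHED matters: with the default
 * PTHREAD_INHERIT_SCHED, the policy and priority stored in the attribute
 * are ignored and the new thread inherits its creator's scheduling. */
int init_rt_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param;
    int err = pthread_attr_init(attr);
    if (err != 0)
        return err;

    memset(&param, 0, sizeof param);
    param.sched_priority = priority;

    /* Policy must be set before the priority, which is range-checked for it. */
    if ((err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0 ||
        (err = pthread_attr_setschedpolicy(attr, SCHED_RR)) != 0 ||
        (err = pthread_attr_setschedparam(attr, &param)) != 0) {
        pthread_attr_destroy(attr);
        return err;
    }
    return 0;
}

/* Hypothetical helper: start fn(arg) as a SCHED_RR thread at `priority`.
 * Returns 0, or an error number such as EPERM (missing privileges). */
int start_rt_thread(pthread_t *tid, int priority, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int err = init_rt_attr(&attr, priority);
    if (err != 0)
        return err;
    err = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return err;
}
```

Usage would be something like `start_rt_thread(&tid, 55, worker, NULL)` with a hypothetical `worker` function. Creating the thread fails with `EPERM` unless the process has `CAP_SYS_NICE` (for example, runs as root) or a sufficient `RLIMIT_RTPRIO`.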
What's going wrong?
It runs well most of the time.
But occasionally (about once every 2 hours under unchanged test conditions), most of the threads are "locked" for more than 1 second, and I cannot explain why.
With ftrace, I saw that during this "lock" period only a few threads/processes keep running, with long stretches of idle task time (> 20 ms) between each of them:
- 2 threads with the following properties: scheduler=SCHED_RR, priority=50
These threads perform network operations (using select()). I don't know exactly what they do (they belong to a closed-source third-party library we use).
- vsftpd (in our test, a PC periodically accesses an FTP directory on a CompactFlash device).
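To timestamp these stalls and line them up with the ftrace output, a small wake-up latency probe may help. This is a sketch in the spirit of cyclictest, not part of our application: the function names are hypothetical, and the `trace_marker` path assumes debugfs is mounted at /sys/kernel/debug. It would presumably be run in its own SCHED_RR thread at the same priority as the real-time threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Assumption: debugfs is mounted at /sys/kernel/debug. */
#define TRACE_MARKER "/sys/kernel/debug/tracing/trace_marker"

static long long ts_ns(const struct timespec *t)
{
    return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}

/* Best effort: write a line into the ftrace buffer so the stall is easy to
 * find in the trace. Does nothing if the marker file cannot be opened. */
static void trace_mark(const char *msg)
{
    int fd = open(TRACE_MARKER, O_WRONLY);
    if (fd < 0)
        return;
    if (write(fd, msg, strlen(msg)) < 0) {
        /* ignored: tracing is optional */
    }
    close(fd);
}

/* Sleep until absolute deadlines spaced `period_ns` apart (period_ns must be
 * below one second) and return the worst wake-up delay, in ns, seen over
 * `iterations` cycles. Delays above `report_ns` are logged. */
long long measure_max_wakeup_delay(long period_ns, int iterations, long long report_ns)
{
    struct timespec next, now;
    long long worst = 0;
    char msg[96];

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < iterations; i++) {
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        /* Restart the sleep if a signal interrupts it. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        clock_gettime(CLOCK_MONOTONIC, &now);

        long long delay = ts_ns(&now) - ts_ns(&next);
        if (delay > worst)
            worst = delay;
        if (delay > report_ns) {
            snprintf(msg, sizeof msg, "wakeup stall: %lld us late\n", delay / 1000);
            fputs(msg, stderr);
            trace_mark(msg);
        }
        /* After a long stall, resynchronise to "now" so one stall is
         * reported once instead of on every catch-up iteration. */
        if (delay > period_ns)
            next = now;
    }
    return worst;
}
```

For example, `measure_max_wakeup_delay(1000000L, 7200000, 20000000LL)` would probe every 1 ms for roughly 2 hours and log any wake-up that is more than 20 ms late.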
I found that I can reproduce this behavior by running a "dd" command that continuously writes files to flash.
I don't understand:
- why all the other tasks are "locked" for the whole duration (sometimes while requesting a mutex, sometimes while calculating a simple 16-bit CRC).
- why ftrace shows so much idle time (between sched events) during this period.
- why giving the application threads higher priorities doesn't solve the issue.
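Regarding the last point, one thing to rule out first is that the priorities are not actually in effect (for example if the threads were created with inherited rather than explicit scheduling attributes). A minimal check that can be called from inside each real-time thread; the function names are hypothetical:

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "other";
    }
}

/* Print the policy and priority the calling thread really runs with and
 * optionally return them. Returns 0 or an error number. */
int report_own_scheduling(int *policy_out, int *prio_out)
{
    struct sched_param param;
    int policy;
    int err = pthread_getschedparam(pthread_self(), &policy, &param);
    if (err != 0)
        return err;

    printf("policy=%s priority=%d\n", policy_name(policy), param.sched_priority);
    if (policy_out)
        *policy_out = policy;
    if (prio_out)
        *prio_out = param.sched_priority;
    return 0;
}
```

A thread that prints `policy=SCHED_OTHER priority=0` here is not running with the intended real-time settings, whatever was requested at creation.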
Since it seems to be I/O-related, I changed the I/O scheduler from deadline to CFQ.
Result: it seems slightly better, especially with the "dd" command, but the "locking" issue is still present.
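For anyone reproducing this: the I/O scheduler is selected per block device through sysfs (the equivalent of `echo cfq > /sys/block/<dev>/queue/scheduler`). A sketch of the same operation in C; the device name passed in (for example "sda" for the CompactFlash card) is a hypothetical placeholder, and writing the file requires root.

```c
#include <stdio.h>
#include <string.h>

/* Extract the active scheduler from the content of
 * /sys/block/<dev>/queue/scheduler, e.g. "noop deadline [cfq]" -> "cfq".
 * Returns 0 on success, -1 if no bracketed entry exists or it does not fit. */
int active_io_scheduler(const char *line, char *out, size_t outlen)
{
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    size_t len;

    if (!lb || !rb)
        return -1;
    len = (size_t)(rb - lb - 1);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}

/* Select the I/O scheduler `name` for block device `dev`. Needs root.
 * Returns 0 on success, -1 on failure. */
int set_io_scheduler(const char *dev, const char *name)
{
    char path[128];
    FILE *f;
    int ok;

    snprintf(path, sizeof path, "/sys/block/%s/queue/scheduler", dev);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fputs(name, f) >= 0;
    if (fclose(f) != 0) /* the kernel reports an unknown name on flush */
        ok = 0;
    return ok ? 0 : -1;
}
```

Reading the same file back and passing its content to `active_io_scheduler` confirms which scheduler is really in use, since the file lists every compiled-in scheduler with the active one in brackets.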
I also added a call to fsync(), because about 10 seconds before the issue, 1 MB of files had been copied to flash (fsync is called right after these files are copied).
Result: better, but not fixed (fewer occurrences).
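One variation that could be tried (untested here, and based on the assumption that smaller write-back batches produce shorter stalls): instead of copying the whole 1 MB and calling fsync() once, write and flush in smaller batches. The helper names and the `flush_every` size are hypothetical and would need tuning.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of `buf` to `fd`, at most `flush_every` bytes per write(), and
 * fsync() after each batch so dirty pages reach the flash in small pieces
 * instead of one large write-back. Returns 0 on success, -1 on error. */
static int write_in_synced_batches(int fd, const char *buf, size_t len, size_t flush_every)
{
    size_t done = 0;

    while (done < len) {
        size_t chunk = len - done;
        if (flush_every > 0 && chunk > flush_every)
            chunk = flush_every;

        ssize_t n = write(fd, buf + done, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
        if (fsync(fd) != 0)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: store `buf` in `path` (e.g. a file on the flash). */
int save_to_flash(const char *path, const char *buf, size_t len, size_t flush_every)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = write_in_synced_batches(fd, buf, len, flush_every);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The trade-off is more fsync() calls and a slower copy overall, in exchange for never having a large amount of dirty data queued for the flash at once.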
My question:
I suspect something related to I/O management in the kernel, as if the kernel preempts every non-I/O thread in order to do all the I/O-related work (network, files, ...).
Is that right?
Does the kernel implement this behavior by default?
If so, how does it work?
How can I fix or change it to solve my issue?
My kernel settings:
- Linux kernel version 2.6.39
- Preempt option enabled
- Tickless
- HZ=1000
- CFQ I/O scheduler (default settings)