Old 03-16-2015, 03:58 PM   #1
dralasimer
LQ Newbie
 
Registered: Mar 2015
Posts: 5

Kernel - scheduling of thread with IO - how does it work?


Hello everyone !

This is a message in a bottle...

I'm currently seeing unexpected behavior on my embedded Linux system at work:

Context:
My software application is multithreaded (it uses condition variables and mutexes for synchronization between threads).
Some of the threads participate in a real-time operation (a task that must complete in less than 1 second).
These threads are set with the following properties: scheduler=SCHED_RR, priority=55
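For reference, these threads are configured roughly like this (a minimal sketch, not our exact code; it needs root or CAP_SYS_NICE):

Code:
/* Minimal sketch: give a thread the properties described above
 * (SCHED_RR, priority 55). Error handling reduced to a message. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static void make_realtime(pthread_t thread)
{
    struct sched_param param;
    memset(&param, 0, sizeof(param));
    param.sched_priority = 55;   /* the priority our RT threads use */

    int err = pthread_setschedparam(thread, SCHED_RR, &param);
    if (err != 0)
        fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
}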


What's going wrong ?
It runs well most of the time.
But sometimes (about once every 2 hours, under the same test conditions the whole time), most of the threads are "locked" for more than 1 second, and I cannot explain why.


With ftrace I saw that during this "lock" time only a few threads/processes are still running, with a lot of "idle" task time (> 20 ms) between each of them:

- 2 threads with the following properties: scheduler=SCHED_RR, priority=50
These threads do some network operations (with a select()). I don't know exactly what they do (they belong to a closed-source third-party library we use).

- vsftpd access (in our test, a PC periodically performs FTP accesses to an FTP directory on a compact flash device).


I found that I can reproduce this behavior by executing a "dd" command that writes files to flash continuously.


I don't understand:
- why all the other tasks are "locked" during all this time (sometimes while requesting a mutex, sometimes while calculating a simple 16-bit CRC),
- why so much idle time shows up in ftrace (between sched events) during this period,
- why raising the application thread priorities doesn't solve the issue.


Since it seems linked to IO, I changed the IO scheduler from deadline to cfq.
Result: it seems a little better, especially with the "dd" command, but the "locking" issue is still present.

I added a call to fsync() because 10 seconds before the issue, 1 MB of files had been copied to flash (fsync() is now called just after the copy of these files).
Result: better but not fixed (it reduced the number of occurrences).
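The copy path now looks roughly like this (a minimal sketch; write_and_flush() is an illustrative name, not our actual code):

Code:
/* Minimal sketch: write a file and flush it to the flash device
 * immediately, instead of leaving the dirty pages to the kernel's
 * write-back. A short write is treated as failure. */
#include <fcntl.h>
#include <unistd.h>

static int write_and_flush(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int ok = (write(fd, buf, len) == (ssize_t)len)  /* full write...  */
          && (fsync(fd) == 0);                      /* ...then flush  */
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}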


My question:
I suspect something linked to IO management in the kernel, as if the kernel preempts every non-IO thread in order to do all the work relating to IO (network, files, ...).

Is that right?

Does the kernel implement this behavior by default?

If yes, how does it work?

How can I fix or change it to solve my issue?



My kernel settings:
Linux kernel version 2.6.39
Preempt option enabled
tickless
HZ=1000
CFQ scheduler (Default settings)
 
Old 03-20-2015, 02:58 AM   #2
dijetlo
Senior Member
 
Registered: Jan 2009
Location: RHELtopia....
Distribution: Solaris 11.2/Slackware/RHEL/
Posts: 1,491
Blog Entries: 2

Quote:
most of the threads are "locked" for more than 1 second, and I cannot explain why
They are being blocked by the threads that aren't locked.
Or, to think of it another way: a thread with a higher priority is in a loop and forcing the rest of your threads to wait, or its scheduling policy is "batch", for example.

Quote:
- 2 threads with the following properties: scheduler=SCHED_RR, priority=50
These threads do some network operations (with a select()). I don't know exactly what they do (they belong to a closed-source third-party library we use).
These seem likely candidates...
 
Old 03-20-2015, 03:53 PM   #3
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

Hi

Sorry if this doesn't help, as I have no clue about "realtime" stuff - but I do fiddle around with threads from time to time, so I'll just dump what I have in mind:
1)
This...
Quote:
I found that I can reproduce this behavior by executing a "dd" command that writes files to flash continuously.
...is probably an important hint, right?

2)
Quote:
(about once every 2 hours, under the same test conditions the whole time), most of the threads are "locked" for more than 1 second, and I cannot explain why
Do you mean that they're waiting to grab one of your mutexes, or are they basically just "frozen" for no apparent reason? Did you already add, e.g., "sleep" calls in the methods/functions that are called by the threads, to exclude that it's the organization of your mutexes that blocks them?
Maybe your problem is a mixed one - on one side your code is not optimal, and on the other side something else is going on?

3)
Quote:
- why raising the application thread priorities doesn't solve the issue.
I don't think thread/process priorities solve anything when you have to deal with locked mutexes - once a mutex is locked, the thread that locked it has to finish its critical section. Actually, raising the priority of the threads waiting for the mutex might make the problem worse: CPU cycles get taken away from the thread that already holds the mutex, making that thread slower, and therefore making the waiting threads wait even longer than usual. Or am I wrong? (See the sketch at the end of this post for the usual mitigation.)

4)
Btw., which kernel version are you using?
There was a big patch some time ago (for sure before kernel 3.17) that makes the flush of dirty buffers not lock all processes => maybe that might help?
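Regarding point 3: if it really is such an inversion, the textbook mitigation (as far as I know - realtime is not my area) is a priority-inheritance mutex, where the holder is temporarily boosted to the priority of the highest-priority waiter. A minimal sketch, assuming your libc and kernel support PTHREAD_PRIO_INHERIT:

Code:
/* Minimal sketch: a mutex created with the priority-inheritance
 * protocol, so a low-priority holder is boosted while a higher-priority
 * thread waits on it. */
#include <pthread.h>

static pthread_mutex_t lock;

static int init_pi_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);

    int err = pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}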
 
1 member found this post helpful.
Old 03-21-2015, 05:47 AM   #4
dralasimer
LQ Newbie
 
Registered: Mar 2015
Posts: 5

Original Poster
Thanks for your responses.


I can easily reproduce a similar behavior with dd (about 5 minutes after starting dd), but this behavior can also occur during nominal application use (without any dd):
- when one of my two network cameras is disconnected (these 2 cameras are managed by a binary library, "pylon" from Basler, with two threads at SCHED_RR priority 50). But I don't know exactly what this library does...
- when something else occurs (I suspect a log flush to flash, but I don't have proof).


I'm wondering whether all of this has the same root cause...

Sometimes the last ftrace mark I see before the "lock" is a boost mutex lock.
But sometimes the "lock" occurs during a simple CRC calculation (no mutex lock request).
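By the way, the custom MARKER lines you can see in the trace are written from userspace roughly like this (a minimal sketch; our real helper differs, and it assumes debugfs is mounted at /sys/kernel/debug):

Code:
/* Minimal sketch: print a custom mark into the ftrace buffer through
 * the trace_marker file (this is what shows up as tracing_mark_write /
 * "print: ... MARKER" events in the trace). */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int trace_fd = -1;

static void trace_mark(const char *file, int line)
{
    if (trace_fd < 0)
        trace_fd = open("/sys/kernel/debug/tracing/trace_marker", O_WRONLY);
    if (trace_fd < 0)
        return;

    char buf[128];
    int n = snprintf(buf, sizeof(buf), "MARKER - %s - %d", file, line);
    if (n > 0)
        write(trace_fd, buf, (size_t)n);
}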


My kernel version is 2.6.39.
About the flush patch: that's interesting information, thank you.


If you are interested, I have shared the ftrace file (to be viewed with kernelshark) from a run where I reproduced the issue with an external dd command:
https://drive.google.com/file/d/0B6p...ew?usp=sharing

Maybe it can help you see what is really happening, at least for the "dd" case.

The "hole" ("freeze") - no more custom ftrace markers from my application - is at timestamps:

begin: 469.118370
end: 469.802940

Another little "hole"
begin: 469.807644
end: 469.952975
 
Old 03-22-2015, 10:05 AM   #5
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

I see this in your trace:

Quote:
begin: 469.118370
Code:
             sdq-1654  [001]   469.118369: kmalloc:              (tracing_mark_write+0x35) call_site=c015eede ptr=0xcd845540 bytes_req=78 bytes_alloc=128 gfp_flags=GFP_KERNELGFP_NOTRACK
             sdq-1654  [001]   469.118371: print:                00000000 MARKER - ../../../../import/boost_util/include/util/Concurrent_queue.h - 203
             sdq-1654  [001]   469.118372: kfree:                (tracing_mark_write+0x87) call_site=c015ef30 ptr=0xcd845540
             sdq-1654  [001]   469.118374: sys_exit_write:       0x4c
             sdq-1654  [001]   469.118375: sys_exit:             NR 4 = 76
             sdq-1654  [001]   469.118377: sys_enter_close:      fd: 0x001e
             sdq-1654  [001]   469.118378: sys_enter:            NR 6 (1e, 0, 8139ff4, 810d4b4, b4432817, b4432998)
             sdq-1654  [001]   469.118381: sys_exit_close:       0x0
             sdq-1654  [001]   469.118383: sys_exit:             NR 6 = 0
             sdq-1654  [001]   469.118388: sys_enter_futex:      uaddr: 0x8980734, op: 0x0080, val: 0x0a35, utime: 0x0000, uaddr2: 0x8980718, val3: 0x0a35
             sdq-1654  [001]   469.118389: sys_enter:            NR 240 (8980734, 80, a35, 0, 8980718, a35)
             sdq-1654  [001]   469.118395: sched_stat_runtime:   comm=sdq pid=1654 runtime=227712 [ns] vruntime=48967751264 [ns]
             sdq-1654  [001]   469.118401: sched_switch:         sdq:1654 [120] S ==> kworker/0:0:0 [120]
          <idle>-0     [000]   469.118411: irq_handler_entry:    irq=18 name=uhci_hcd:usb4
          <idle>-0     [000]   469.118423: kmem_cache_free:      (uhci_free_urb_priv+0x8c) call_site=c031a650 ptr=0xde769ee4
          <idle>-0     [000]   469.118428: kmem_cache_alloc:     (uhci_urb_enqueue+0x52) call_site=c031c54c ptr=0xde769ee4 bytes_req=28 bytes_alloc=28 gfp_flags=GFP_ATOMIC|GFP_ZEROGFP_NOTRACK
          <idle>-0     [000]   469.118434: irq_handler_exit:     irq=18 ret=handled
          <idle>-0     [000]   469.118443: irq_handler_entry:    irq=16 name=uhci_hcd:usb2
          <idle>-0     [000]   469.118450: kmem_cache_free:      (uhci_free_urb_priv+0x8c) call_site=c031a650 ptr=0xde76957c
          <idle>-0     [000]   469.118454: kmem_cache_alloc:     (uhci_urb_enqueue+0x52) call_site=c031c54c ptr=0xde76957c bytes_req=28 bytes_alloc=28 gfp_flags=GFP_ATOMIC|GFP_ZEROGFP_NOTRACK
          <idle>-0     [000]   469.118458: irq_handler_exit:     irq=16 ret=handled
etc...

Quote:
begin: 469.807644
Code:
        netcheck-1665  [001]   469.807643: kmalloc:              (tracing_mark_write+0x35) call_site=c015eede ptr=0xcd845540 bytes_req=78 bytes_alloc=128 gfp_flags=GFP_KERNELGFP_NOTRACK
        netcheck-1665  [001]   469.807645: print:                00000000 MARKER - ../../../../import/boost_util/include/util/Concurrent_queue.h - 203
        netcheck-1665  [001]   469.807645: kfree:                (tracing_mark_write+0x87) call_site=c015ef30 ptr=0xcd845540
        netcheck-1665  [001]   469.807647: sys_exit_write:       0x4c
        netcheck-1665  [001]   469.807649: sys_exit:             NR 4 = 76
        netcheck-1665  [001]   469.807651: sys_enter_close:      fd: 0x001e
        netcheck-1665  [001]   469.807652: sys_enter:            NR 6 (1e, 0, 8139ff4, 810d4b4, b3927817, b3927998)
        netcheck-1665  [001]   469.807655: sys_exit_close:       0x0
        netcheck-1665  [001]   469.807656: sys_exit:             NR 6 = 0
        netcheck-1665  [001]   469.807663: sys_enter_futex:      uaddr: 0xb5a869e4, op: 0x0080, val: 0x0273, utime: 0x0000, uaddr2: 0xb5a869c8, val3: 0x0273
        netcheck-1665  [001]   469.807664: sys_enter:            NR 240 (b5a869e4, 80, 273, 0, b5a869c8, 273)
        netcheck-1665  [001]   469.807680: sched_switch:         netcheck:1665 [44] S ==> kworker/0:0:0 [120]
          <idle>-0     [000]   469.808053: hrtimer_cancel:       hrtimer=0xdf002b34
          <idle>-0     [000]   469.808055: hrtimer_expire_entry: hrtimer=0xdf002b34 now=529383007503 function=tick_sched_timer/0x0
          <idle>-0     [000]   469.808057: softirq_raise:        vec=1 [action=TIMER]
          <idle>-0     [000]   469.808062: hrtimer_expire_exit:  hrtimer=0xdf002b34
          <idle>-0     [000]   469.808063: hrtimer_start:        hrtimer=0xdf002b34 function=tick_sched_timer/0x0 expires=529384000000 softexpires=529384000000
          <idle>-0     [000]   469.808069: irq_handler_entry:    irq=0 name=timer
          <idle>-0     [000]   469.808073: irq_handler_exit:     irq=0 ret=handled
          <idle>-0     [000]   469.808076: softirq_entry:        vec=1 [action=TIMER]
          <idle>-0     [000]   469.808077: softirq_exit:         vec=1 [action=TIMER]
          <idle>-0     [001]   469.808082: hrtimer_cancel:       hrtimer=0xdf202b34
          <idle>-0     [001]   469.808084: hrtimer_expire_entry: hrtimer=0xdf202b34 now=529383037045 function=tick_sched_timer/0x0
          <idle>-0     [001]   469.808090: softirq_raise:        vec=1 [action=TIMER]
          <idle>-0     [001]   469.808091: softirq_raise:        vec=9 [action=RCU]
          <idle>-0     [001]   469.808096: hrtimer_expire_exit:  hrtimer=0xdf202b34
etc...
I have no clue what the output means, but in any case I don't see anything interesting that is unique to these two blocks.

Still, taking into account that I know nothing about realtime processes/threads and am therefore walking on thin ice, I think your issue is complex because there are a lot of different things going on at the same time (CPU, I/O, network, RAM de/allocation).

Taking a step back, are we sure that your OS/HW is ready to handle realtime processes?

You could first of all start using a kernel version that does not globally block I/O when it has to flush dirty blocks / empty the writeback buffer to storage => have a look at the changelogs.

Then, I'm not sure if you use a special realtime kernel. In the Gentoo distribution there is a separate kernel for realtime (package "sys-kernel/rt-sources" - "Full Linux 3.14 kernel sources with the CONFIG_PREEMPT_RT patch").

Then I'm not sure if TICKLESS is good - it's for sure good for power saving, but I would guess it's not a great feature for realtime stuff (read e.g. here from the 2nd post onwards). You probably don't want your CPU to go to sleep.

Finally, I'm not sure if you're using power-saving options (I would disable them, including Intel's Turbo Boost), what your CPU is (e.g. AMD multicore CPUs are less "multicore" than Intel's - the cores still share components internally), or whether you're using hyperthreading (I would disable it). I would also check whether the spontaneous migration of processes/threads between CPU cores causes trouble for realtime threads, and if yes, see how to assign them always to the same CPU core (sketched below).
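Pinning a thread to a core would look roughly like this (a minimal sketch using the GNU-specific pthread_setaffinity_np()):

Code:
/* Minimal sketch: pin a thread to one CPU core so the scheduler cannot
 * migrate it. The core number is just an example. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static int pin_to_cpu(pthread_t thread, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}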

What do you think?
 
1 member found this post helpful.
Old 03-22-2015, 03:22 PM   #6
dijetlo
Senior Member
 
Registered: Jan 2009
Location: RHELtopia....
Distribution: Solaris 11.2/Slackware/RHEL/
Posts: 1,491
Blog Entries: 2

Quote:
I can easily reproduce a similar behavior with dd (about 5 minutes after starting dd), but this behavior can also occur during nominal application use (without any dd):
Have you tried monitoring IO activity (iostat -xp <device> 1) to see if your suspicion that it's IO-related is correct? I tend to agree with Pearlseattle's assessment that we're dealing with too many possibilities...
Quote:
start using a kernel version that does not globally block I/O when it has to flush dirty blocks / empty the writeback buffer to storage ... I'm not sure if TICKLESS is good
This may just be the trade-off 'tickless' has made. An easy test is to change platforms (to address the HW question and remove the SCHED_RR threads from the equation), install a kernel with a different thread-scheduling algorithm, and see if the problem persists.

Last edited by dijetlo; 03-22-2015 at 03:50 PM. Reason: needed moar letterz.. no wait... less letterz
 
Old 03-25-2015, 10:16 PM   #7
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

It sounds to me, simply, like there's a bug in that application or library. At some point, under load, the bug manifests itself ... some kind of tiny timing hole in the way it uses mutexes, maybe. Perhaps some I/O error or other condition that happens rarely. Or maybe the thread that has the resource locked is just waiting a lot longer than you think it should.

When a deadlock occurs, CPU utilization drops to near-zero, and fiddling with thread/process priorities won't help because none of them are "runnable." Looking at an ordinary ps display at that time should show you what the various processes and threads are waiting for.
 
Old 07-01-2015, 12:19 PM   #8
dralasimer
LQ Newbie
 
Registered: Mar 2015
Posts: 5

Original Poster
Hi everyone,

I'm updating the status of this topic: we think we found the root cause of the lock.

My company hired a Linux expert for 2 days. Thanks to him we found the following:

The locks were caused by the fact that all flash accesses are blocked while the kernel flushes dirty data to the device.

This especially affects our logger module (used for timestamping, ...), which calls the syslog() function.

But this syslog() call also blocks the calling process, even though the syslogd daemon is the process that actually accesses the flash...
(We suspect that the unix sockets used for syslog communication block until resources are available, just as a bash pipe '|' does when writing a lot of logs into a file located on flash.)


The solution was to separate all flash access from the real-time threads: log/flash access is now done in an isolated thread, with a non-blocking custom message queue as the communication mechanism (sketched below).
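The pattern looks roughly like this (a heavily simplified sketch of the idea, not our production code - the real queue and message format differ):

Code:
/* Minimal sketch of the fix: real-time threads push log messages into a
 * bounded queue without ever blocking; a dedicated thread drains the
 * queue and is the only caller of syslog(), so flash/flush stalls hit
 * that thread alone. */
#include <pthread.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>

#define QCAP 256                /* queue slots (one kept free to tell full/empty apart) */
#define QMSG 128                /* max message length */

static char q[QCAP][QMSG];
static unsigned q_head, q_tail; /* head: next read, tail: next write */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called from the real-time threads: never blocks. If the lock is
 * contended or the queue is full, the message is dropped. */
int log_try_push(const char *msg)
{
    if (pthread_mutex_trylock(&q_lock) != 0)
        return -1;                        /* contended: drop rather than block */

    int ret = -1;
    unsigned next = (q_tail + 1) % QCAP;
    if (next != q_head) {                 /* queue not full */
        strncpy(q[q_tail], msg, QMSG - 1);
        q[q_tail][QMSG - 1] = '\0';
        q_tail = next;
        ret = 0;
    }
    pthread_mutex_unlock(&q_lock);
    return ret;
}

/* The isolated logger thread: the only code path that may block on
 * flash, via syslog(). */
void *logger_thread(void *arg)
{
    (void)arg;
    for (;;) {
        char msg[QMSG];
        int have = 0;

        pthread_mutex_lock(&q_lock);      /* held only for the copy */
        if (q_head != q_tail) {
            memcpy(msg, q[q_head], QMSG);
            q_head = (q_head + 1) % QCAP;
            have = 1;
        }
        pthread_mutex_unlock(&q_lock);

        if (have)
            syslog(LOG_INFO, "%s", msg);  /* may block; only this thread */
        else
            usleep(10000);                /* queue empty: poll every 10 ms */
    }
    return NULL;
}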

And it seems that it works!
 
1 member found this post helpful.
Old 07-01-2015, 05:55 PM   #9
Pearlseattle
Member
 
Registered: Aug 2007
Location: Zurich, Switzerland
Distribution: Gentoo
Posts: 999

Thanks for the update
 
  

