Programming
This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
Where does the Linux scheduler (yeah, the kernel, but that's not what I'm asking) run within a multicore system? Is it in a shared location that all CPUs can access? Is there a single processor within a multicore system that runs the scheduler and decides "put process A on CPU 2", "now put process B on CPU 4", "now move process A off CPU 2 and put it on CPU 4"?
I know it's a basic question, but I just don't understand how the Linux kernel/scheduler is distributed across a multicore CPU system.
Is there a dedicated CPU running the scheduler that places processes on the different cores, or does each core run its own instance of the scheduler, with the target core for each process coordinated through shared memory?
The CFS scheduler design document in the Linux kernel documentation describes it quite well. (Also see Completely Fair Scheduler at Wikipedia.) Essentially, each task to be scheduled is kept in a time-ordered red-black tree, with the leftmost element always being the next to run. Each CPU simply grabs the currently leftmost task to find the one it should run. After running the task for a suitable time, the task is put back into the tree, its position depending on its priority and how long it ran.
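The "leftmost task" idea can be sketched in a few lines of Python. This is a toy model, not kernel code: CFS actually keeps tasks in a red-black tree keyed by virtual runtime (vruntime), but a min-heap gives the same "smallest vruntime runs next" behavior. The `Task` fields, the weights, and the weight-scaled vruntime update are illustrative assumptions.

```python
import heapq
import itertools

class Task:
    def __init__(self, name, weight=1.0):
        self.name = name
        self.weight = weight    # stand-in for priority/nice level
        self.vruntime = 0.0     # virtual runtime: grows while the task runs

counter = itertools.count()     # FIFO tie-breaker for equal vruntimes

def push(run_queue, task):
    heapq.heappush(run_queue, (task.vruntime, next(counter), task))

def schedule(run_queue, timeslice=10.0):
    """Pick the task with the smallest vruntime (the "leftmost" one),
    run it, then reinsert it with its vruntime advanced."""
    _, _, task = heapq.heappop(run_queue)
    # Higher-weight tasks accrue vruntime more slowly, so they get
    # picked more often.
    task.vruntime += timeslice / task.weight
    push(run_queue, task)
    return task.name

run_queue = []
push(run_queue, Task("editor", weight=2.0))
push(run_queue, Task("compiler", weight=1.0))

history = [schedule(run_queue) for _ in range(6)]
# The weight-2 task ends up picked twice as often as the weight-1 task.
```

Running this, `history` comes out as the weight-2 "editor" being chosen four times out of six, which is the proportional-share behavior CFS aims for.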
Run queues are the traditional approach. The overall principle with respect to multiple CPUs is the same with run queues: each CPU just grabs the next task from the top of the run queue, executes it for a while, then puts it back into the appropriate place in the run queue depending on its priority and how long it was run.
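For contrast, here is a minimal sketch of the traditional run-queue approach, assuming plain round-robin with a fixed quantum (a real run-queue scheduler reinserts by priority rather than always at the tail):

```python
from collections import deque

def run_queue_step(run_queue):
    """One scheduling step: the CPU grabs the task at the head of the
    run queue, runs it for a quantum, then puts it back."""
    task = run_queue.popleft()
    # ... the task executes on this CPU for one quantum ...
    run_queue.append(task)   # priority-based reinsertion omitted
    return task

rq = deque(["A", "B", "C"])
order = [run_queue_step(rq) for _ in range(5)]
# order is round-robin: A, B, C, A, B
```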
In an abstract sense, the number of CPUs does not matter, since each CPU just grabs the task that would be run next. In practice, there are concerns like locking/atomicity (so that two or more CPUs won't modify the same data structures at the same time in incompatible ways), scalability (working with a lot of tasks and/or CPUs), CPU affinity (preferring to keep a task on the same CPU for efficiency), hardware interrupt handling, non-uniform memory access (some memory being easier to reach than other memory for each CPU), and so on. Therefore, the data structures and implementation details tend to be extremely important to make sure each task gets scheduled at the right time, for the right amount of time, without overly long latencies (periods of not running in between).
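The locking point can be illustrated with a toy model: several threads play the role of CPUs, all pulling work from one shared queue, with a lock (loosely analogous to a runqueue lock in the kernel) keeping the pops atomic. Everything here, including the `cpu_loop` name, is illustrative and not kernel API:

```python
import threading
from collections import deque

run_queue = deque(range(20))        # 20 pending "tasks"
rq_lock = threading.Lock()          # guards the shared run queue
executed = {0: [], 1: []}           # records which "CPU" ran which task

def cpu_loop(cpu_id):
    while True:
        with rq_lock:               # two CPUs must not pop concurrently
            if not run_queue:
                return
            task = run_queue.popleft()
        executed[cpu_id].append(task)   # "run" the task outside the lock

cpus = [threading.Thread(target=cpu_loop, args=(i,)) for i in (0, 1)]
for t in cpus: t.start()
for t in cpus: t.join()

# Every task ran exactly once, on one CPU or the other.
```

Which tasks land on which "CPU" varies from run to run, but no task is lost or run twice; that invariant is exactly what the locking buys.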
But that describes what happens to the entity described by a task_struct, not the scheduler itself, which is what I think the OP is asking about.
Short answer is the scheduler runs everywhere. Often.
But that describes what happens to the entity described by a task_struct
The point was that the scheduler is not a program-like entity, but something that each CPU runs to select which task to run next. You can say it's just a selection algorithm, with a few bells and whistles to make it niftier. Asking where it runs is like asking where qsort() runs.
The whole kernel "is not a program-like entity". That is, in a system doing useful work, most of the time user tasks ("program-like entities") are running; when a task switch is to occur, kernel code takes over, analyzes the previously stored state and, based on it and on newly available information (interrupt requests to be serviced, how long the last user task ran, I/O requests, etc.), decides what to do next.
I know this post is pretty old, but I found it and thought I might add a source that has a pretty good explanation. If you look at section 2.2.1 of The Linux Scheduler: a Decade of Wasted Cores they have a fairly good explanation.
Cogent, relevant contributions always welcome.
I went to a similar presentation some years ago (before CFS), but the promised proceedings were never made public.
One sort-of-reasonable way to look at it is: "the scheduler is the thing that a CPU runs when it finds that it has nothing better to do!"
Its purpose is to decide "what to do next," and, since any CPU/core might be doing so at any moment (even, literally, "simultaneously"), to do it as efficiently as possible.
The trade-offs facing any scheduling algorithm are very complex, such that there is no "bright-line rule" to be found (and there never can be). Some pundits/humorists have aptly characterized it as "a race between immediacy and utter starvation, and hurry up, willya?"