On which core is Linux kernel running on Dual-core machine?
Linux - Hardware: This forum is for Hardware issues.
Look, you have maybe 4 cores; that means you have 4 CPUs, so 4 processors packed together in one case, with some additional chips that would usually be placed on a multi-CPU motherboard.
That means each of those silicon CPU plates in a multicore CPU is a real one-core CPU, with cache, pipelining, controllers and buses. That means the register rax exists 4 times in that case, but on totally different chips. Only a central controller with pipelining and cache can split code in such a way that the same program can be executed on all chips where possible, but sometimes surely not, too.
That also means it is up to the programmers whether their source code can be executed fast on a multicore CPU!
O.K., such a CPU silicon plate isn't really one plate; it itself consists of a lot of plates: CPU plate, FPU plate, controller-bus-pipelining plate, cache1 and cache2 plate.
And you are right, you can't say on which register you have the value you need, but your program code works as usual and the multicore controller will only split code where it is possible. So functions won't be split, only if there is more than one FLOP in them and they don't depend on each other; say x = 5.48*2.31/3.44*22.50 is split into a = 5.48*2.31, b = 3.44*22.50, x = a/b, and a/b is done on the FPU holding a. That means on a highly integrated multicore CPU, say 4 cores, there are not only 4 FPU silicon plates glued in but 6 or more.
But if you really want to know where your code is executed, you can hard-debug it in an assembler debugger with the core debug registers. At the moment I think only the non-core debug registers are supported, but have a look at the NASM and GCC pages. It may also be possible that those registers are not accessible from inside the CPU but only on external pins via I2C.
SMP kernels (anything recent should be SMP, except for oddball things that are specifically low-resource) split the load over the two cores (err, sometimes with a load of zero for one core, if the load is low).
The way I understand it, the application has to be multithreaded in the first place for this.
If you encode a video on a dual core using only one thread, it uses 100% of one core and switches between the cores, but never 2 cores at the same time.
Also got my doubts that the kernel runs on any one specific core. What would be the purpose of that?
That would basically mean that if cores 1 to 3 are busy and the kernel is restricted to run on core 1, things would slow down for all kernel events while core 4 just sits there.
I'd venture to think that in such a case things (kernel or not) just run on whatever core is less busy.
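One quick way to check this from a shell: /proc/stat keeps one "cpuN" line per core, and the system (kernel-mode) time column ticks up on every core, not just the first one. A minimal sketch, using only standard procfs:

```shell
# One line per core; the fields after "cpuN" are jiffies spent in
# user, nice, system, idle, ... - "system" is time spent running kernel code.
grep '^cpu[0-9]' /proc/stat
```

If the kernel were pinned to one core, only that core's system column would ever grow; in practice it grows on all of them.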
Quote:
The way I understand it, the application has to be multithreaded in the first place for this. [...]
The point is, of course, that the kernel is compiled as a multi-thread application. For example, when a shell starts a process, that process is in its own thread.
The process may, itself, start other processes or, internally, threads, and they will be scheduled as necessary on any available processor. You can see a (semi) graphical representation of the current process hierarchy by entering the pstree -Gc command in a terminal window. (There are, of course, other options to the pstree command. See man pstree for details.)
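As a companion to pstree, ps can show which CPU each task last ran on; its PSR column is the per-process "which core" answer the thread title is asking about. A sketch, assuming the standard procps ps:

```shell
# PSR = the processor a task last ran on. Run it repeatedly and you can
# watch the scheduler migrate tasks between cores.
ps -eo pid,psr,comm | head -n 10
```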
I think the process splitting is done by the cores' pipelining system, meaning that software has no way to choose where it is executed.
I apologise; when I first read this, I thought it was rubbish. Having re-read it, I realise that you are describing hyperthreading rather than multi-core. OTOH, hyperthreading isn't what this thread was about... and it certainly isn't what you quote in the AMD links.
Quote:
Look, you have maybe 4 cores; that means you have 4 CPUs, so 4 processors packed together in one case, with some additional chips that would usually be placed on a multi-CPU motherboard.
OK, I am prepared to take 4 as an example, if you want.
Quote:
That means each of those silicon CPU plates in a multicore CPU is a real one-core CPU, with cache, pipelining, controllers and buses.
Do you mean a die, or something else? In any case, no. A multi-core CPU has multiple cores (which is more than "one real core"). So, that is more than one core, or it wouldn't be multi, would it?
Quote:
that means the register rax exists 4 times in that case, but on totally different chips.
Now I really can't guess your meaning. With four cores, there must be four copies of each register, but I cannot comprehend why you claim that they would be on separate chips, when the cores are on the same chip (not that this applies to all multi-core chips).
Quote:
Only a central controller with pipelining and cache can split code in such a way that the same program can be executed on all chips where possible, but sometimes surely not, too.
No, again. Each core behaves like a CPU. It has a program counter. The default behaviour is to fetch the next instruction after the current one (unless modified by jump instructions). This has nothing to do with pipelining; it's just the way that microprocessors work. Running individual tasks, of course, the micro can do all of the multitasking stuff it would normally do (you know, 'pseudo-simultaneous' execution of programs or threads), but better for having more cores available.
What does have something to do with cache and pipelining is performance. Performance will be higher if the cache has the right data in it; when you want some data that isn't locally available, you have to wait for it, so the sooner you start to fetch it the better, because pipeline stalls can be a real performance killer. (Which is where hyperthreading comes in, of course: it does allow you to do something else useful while you are waiting.)
Quote:
That also means it is up to the programmers whether their source code can be executed fast on a multicore CPU!
O.K., such a CPU silicon plate isn't really one plate; it itself consists of a lot of plates: CPU plate, FPU plate, controller-bus-pipelining plate, cache1 and cache2 plate.
Plates? My best guess is that you meant something like a functional area on the floorplan of the chip.
Quote:
So functions won't be split,
It won't split your function for ordinary, integer code (still talking about multi-core). You might, but it won't. Of course, this drags in the usual parallel programming issues, but you weren't expecting a free lunch, were you?
Quote: Originally Posted by crashmeister
Quote: Originally Posted by salasi
SMP kernels (anything recent should be SMP, except for oddball things that are specifically low-resource) split the load over the two cores (err, sometimes with a load of zero for one core, if the load is low).
The way I understand it, the application has to be multithreaded in the first place for this.
Should have been more explicit... I meant that the kernel itself will use more than one of the CPUs. If you have one large monolithic task, that doesn't necessarily help as much as you would like.
If you also want to split the application load over the cores, you either have to ensure that the application is threaded or have a lot of small applications rather than one large app. If your natural application load is many small applications, you might wonder what all the fuss is about, because it just works. Otherwise, you might find it just doesn't.
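For completeness: software can also constrain where it runs, via CPU affinity. The allowed-CPU set for any task is visible in /proc, and taskset (from util-linux) changes it; the filename in the comment below is a placeholder. A sketch:

```shell
# Which CPUs is the current shell allowed to run on?
grep Cpus_allowed_list /proc/self/status

# To pin a single-threaded job to core 0 (taskset is part of util-linux;
# "somefile" is a placeholder):
#   taskset -c 0 md5sum somefile
# With only one allowed CPU, it can never exceed 100% of one core,
# matching the video-encoding observation earlier in the thread.
```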
When booting on a multi-CPU system, Linux boots on the first CPU just as if it were booting on a single-CPU system. Once the kernel is established and set up, it then goes into multi-CPU mode by scheduling threads on all of the available CPUs.
Once the kernel reaches this stage it is no longer sitting on a "master" CPU scheduling "slave" CPUs. All CPUs are equal.
Each processor is not running Linux per se. Each processor is running the thread it is currently assigned, and executes application code and kernel code as the thread demands. Interrupts are processed by the CPU on which the interrupt-causing event was generated. Thus the kernel interrupt code runs on whichever CPU has the latest interrupt. Likewise, the scheduler runs on each CPU when that CPU reaches the next scheduling event.
So at any moment in time, after boot completes, you could have several CPUs running kernel code or one CPU running kernel code or no CPUs running kernel code.
O.K. salasi, you get it.
And I mean:
silicon plate = one part of a wafer.
CPU = such a wafer part that the CPU is on.
FPU = such a wafer part that the Floating Point Unit is on.
function = some instructions between CALL and RET, say a C++ function or Pascal procedure.
case = the plastic case all those wafer parts are encapsulated in, with leads, pins.
multicore CPU, one-core CPU = a case with leads, pins (multicore = x86-64 specific).
pipelining = a small memory the next instructions are held in, so the control logic can decide what to do with those instructions or work on them.
central pipelining unit = only one in a multicore CPU; splits the instructions in the described way. No program, kernel or BIOS can control that. (x86-64 specific only; on other platforms software has to take part in it - S390, SPARC. That's because of Intel's downward compatibility.)
Quote: Originally Posted by Steve Stites
When booting on a multi-CPU system, Linux boots on the first CPU just as if it were booting on a single-CPU system. Once the kernel is established and set up, it then goes into multi-CPU mode by scheduling threads on all of the available CPUs.
Once the kernel reaches this stage it is no longer sitting on a "master" CPU scheduling "slave" CPUs. All CPUs are equal.
Each processor is not running Linux per se. Each processor is running the thread it is currently assigned, and executes application code and kernel code as the thread demands. Interrupts are processed by the CPU on which the interrupt-causing event was generated. Thus the kernel interrupt code runs on whichever CPU has the latest interrupt. Likewise, the scheduler runs on each CPU when that CPU reaches the next scheduling event.
So at any moment in time, after boot completes, you could have several CPUs running kernel code, or one CPU running kernel code, or no CPUs running kernel code.
This is somewhat close to how I was thinking the kernel code executes. So basically, Linux kernel code runs on both CPUs/cores depending on need.
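One more way to see that conclusion on a running system: the kernel keeps per-CPU worker threads (e.g. ksoftirqd/N and migration/N, one per core), which ps can show; note they may be hidden inside containers, so this is only illustrative:

```shell
# Per-CPU kernel threads are named after the core they are bound to.
# "|| true" because some containerized environments hide kernel threads.
ps -eo pid,psr,comm | grep -E 'ksoftirqd|migration' || true
```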