[SOLVED] Understanding System Calls

jlliagre · 10-30-2016, 04:26 AM

Beware that the system call entry mechanism varies. Modern Linux releases on Intel architectures have replaced the legacy int 0x80 entry with newer dedicated instructions: sysenter/sysexit on 32-bit x86 and syscall/sysret on x86-64.

Note also that saying the kernel is "a program too" might be misleading. It is correct to state that the kernel is a specific program that starts at boot time and runs until the system shuts down or panics, but when talking about system calls, it is not *that kernel program* that performs the call but the process itself, which has switched from user mode to kernel mode and is then executing kernel code. The kernel code is still running in the context of that specific process.

linux4evr5581 · 10-30-2016, 04:37 AM

Quote:

Originally Posted by jlliagre

Beware that the system call entry mechanism varies. Modern Linux releases on Intel architectures have replaced the legacy int 0x80 entry with newer dedicated instructions: sysenter/sysexit on 32-bit x86 and syscall/sysret on x86-64.

Note also that saying the kernel is "a program too" might be misleading. It is correct to state that the kernel is a specific program that starts at boot time and runs until the system shuts down or panics, but when talking about system calls, it is not *that kernel program* that performs the call but the process itself, which has switched from user mode to kernel mode and is then executing kernel code. The kernel code is still running in the context of that specific process.

I learned a lot of useful things from this thread, but this part almost explains my OP.. Ok, may I ask what program that process came from? And does that process interact with the library?

jlliagre · 10-30-2016, 05:10 AM

All processes descend from a special process handcrafted by the kernel at boot time. This process is generally init or systemd. This initial process creates other processes by forking itself, first using the fork (or clone on Linux) system call and then the exec system call. This fork/exec sequence is how all processes are created, in a hierarchical way.
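As a rough illustration of the fork/exec sequence described above, here is a minimal Python sketch. The helper name `spawn` is made up for this example, and Python's `os.fork` and `os.execvp` are only thin wrappers around the underlying system calls, so treat it as a sketch rather than how init itself is written.

```python
import os
import sys


def spawn(argv):
    """Fork, exec argv in the child, and return the child's exit status."""
    pid = os.fork()                    # the child starts as a copy of this process
    if pid == 0:
        try:
            os.execvp(argv[0], argv)   # replace the child's program image
        except OSError:
            os._exit(127)              # exec failed: leave without parent cleanup
    _, status = os.waitpid(pid, 0)     # the parent waits for the child to finish
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    # Launch a second Python interpreter that exits with status 7.
    code = spawn([sys.executable, "-c", "raise SystemExit(7)"])
    print("child exited with", code)
```

The parent and child share the same code up to the `fork`; only the return value tells them apart, which is why the child branch is the one that calls `exec`.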

I'm not sure what you are asking with your last question. All processes, including init/systemd, interact with the standard library (usually glibc) by calling it.

linux4evr5581 · 10-30-2016, 05:24 AM

Quote:

Originally Posted by jlliagre

I'm not sure what you are asking with your last question. All processes, including init/systemd, interact with the standard library (usually glibc) by calling it.

I don't know why I thought a program spawned the process.. It's late... But this part explains it, thank you so much! But just to reiterate, are the opcodes of a program transferred to the library via this process? Or is this a particular process chosen by the OS depending on the opcodes of the program?

jlliagre · 10-30-2016, 07:33 AM

Opcodes are just bytes. They are normally stored on disk in executable binaries, libraries, modules, and kernel images, and they are loaded into memory, if not already there, when they need to be executed. There is no such thing as a "transfer of opcodes to a library". The opcodes do not decide which process is launched; they are just basic instructions. The arguments to the exec family of system calls tell which program needs to be launched. Which one it should be is either a user decision or an OS setting (daemons).

jpollard · 10-30-2016, 09:15 AM

Quote:

Originally Posted by linux4evr5581

I don't know why I thought a program spawned the process.. It's late... But this part explains it, thank you so much! But just to reiterate, are the opcodes of a program transferred to the library via this process? Or is this a particular process chosen by the OS depending on the opcodes of the program?

Well... all programs get spawned by another "program"...

This starts when the system is booted - the BIOS starts at a specified address, either forced on the CPU or built in, depending on the CPU hardware or hardware environment. I have used systems that started at address 0 when a hardware reset signal was given. But that hardware environment recognized that the first memory access operation following the reset was an instruction fetch. Since the CPU was about to interpret that instruction, the hardware replaced the address being accessed with the starting address of a ROM, which then did a jump/branch to the rest of the ROM, allowing the takeover of the CPU.

The BIOS then (after the other stuff it does) loads the boot block (or even an entire boot program, depending on version/style). I believe UEFI actually loads the entire program, as it has FAT support built in and loads the program from that. The program is defined to be started by the BIOS (usually with a jump instruction), and then the boot program takes over. It may set interrupt vectors, error traps.. and then loads a specified kernel at a specified starting address (I don't remember if the file specifies that, or if it had to be built into the boot program).

With Linux, the boot program is also used to copy the initrd following the kernel (next page boundary I think), and then does a jump to the starting address of the kernel. At this point the kernel code takes over. It will set up any needed interrupt vectors for CPU faults, MMU faults, the hardware clock, and other CPU related activity (memory sizing for instance, setting up a fake process 0 for the idle task). At this point the kernel is still "not quite" running - and it uses the idle task context to do the first "fork". At this point the "parent process" enables context switching, which allows the system to change what is considered the "current" process (which is pid 0). This NEW process (pid 1) then takes over; it is still running in kernel mode, with the memory map of the kernel.

The new kernel process (pid 1, still using the kernel privileges) then decompresses the data in the initrd, copying the result into a RAM-based filesystem (there are various forms available) and does a "mount" of this virtual filesystem (along with mounting /dev as a "devtmpfs", which links to the kernel device structure). This is now the "root" filesystem (and the initrd memory copy is deallocated for future use). The "pid 1" process can now do an exec system call (it has been running with kernel privileges up to this point) and the normal activity of the selected init program takes over.

Which program is run as init depends on several things - kernel command line parameters can specify /bin/bash (or another command interpreter, such as the "busybox" interpreter). By default (as in, no parameters) it will exec "/sbin/init" from the temporary root filesystem (or "/init" when the temporary root is an initramfs).
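To make the command line part concrete, here is a small Python sketch that picks the init program out of a kernel command line. It assumes the usual `init=` parameter name and the "/sbin/init" default mentioned above; `parse_cmdline` and `chosen_init` are made-up helper names, and the parsing is deliberately naive (no support for quoted values containing spaces).

```python
def parse_cmdline(text):
    """Split a kernel command line into a dict; bare flags map to True."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")   # split on the first '=' only
        params[key] = value if sep else True
    return params


def chosen_init(text, default="/sbin/init"):
    """Return the init= value if one was given, else the assumed default."""
    value = parse_cmdline(text).get("init", default)
    return default if value is True else value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:          # the running kernel's command line
            print("init program:", chosen_init(f.read()))
    except OSError:
        print("no /proc/cmdline on this system")
```

On a typical desktop there is no `init=` on the command line at all, so the sketch simply reports the default.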

The exec system call simply replaces the running executable with pages loaded from the specified binary.

Thus the kernel's first process is used to fork a new process, which then execs the init program. It is now the responsibility of the init program to complete things, spawning new processes. These processes can finish device scanning and loading drivers (still using the temporary root filesystem, and updating /dev).

One of the things that must be done is to identify the real root. This is usually done through the kernel parameter "root=" passed to the kernel by the boot loader. Once this is mounted (let's call the mount point "syslinux"), a system call (pivot_root) is invoked to exchange the data structure representing the temporary root with the data structure representing the real root on the mountpoint syslinux. Once this is completed, it is possible to unmount the temporary root (now connected to the /syslinux mount point), which deallocates the memory given to the temporary root.

init then finishes any specific process startup, starting other processes (via fork/exec), and for desktops finishes up by starting the desktop GUI service daemon (there are a good number of these - gdm, lightdm, ...).

And the system is ready for use.

The key is that all processes are started by another process - the first one being the idle process (pid 0), which is built into the kernel, which uses the fork system call to create pid 1 and uses the exec system call to replace the "program" running with one loaded from the virtual root filesystem.

The same sequence occurs for those systems booting without an initrd; the difference is that instead of decompressing a nonexistent initrd, the kernel mounts the real root. The init loaded from that root will not make use of the "pivot_root" system call (as it isn't needed), but continues the rest of the sequence pretty much the same way - treating the real root the same way the virtual root was, but omitting that unneeded pivot_root.
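You can check the "every process has a parent" chain on a running Linux system. The sketch below walks from the current process up through its ancestors by reading the parent pid from /proc/<pid>/stat. The helper names are made up for this example, and it assumes a Linux-style /proc; inside a container the chain may stop early, because the parent outside the container shows up as pid 0.

```python
import os


def parent_of(pid):
    """Return the parent pid recorded in /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name sits in parentheses and may contain spaces, so split
    # after the last ')'. The fields that follow are: state, ppid, ...
    fields = stat.rsplit(")", 1)[1].split()
    return int(fields[1])


def ancestors(pid=None):
    """Return [pid, parent, grandparent, ...] as far as /proc lets us see."""
    chain = [os.getpid() if pid is None else pid]
    while True:
        try:
            ppid = parent_of(chain[-1])
        except OSError:
            break                      # no /proc, or the entry is hidden from us
        if ppid == 0 or ppid in chain:
            break                      # reached the top of the visible tree
        chain.append(ppid)
    return chain


if __name__ == "__main__":
    print(" <- ".join(str(p) for p in ancestors()))
```

On an ordinary (non-container) system the last pid printed is normally 1, the init process.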

Now system calls are just another way to call a function - though one that includes a little extra information. A system call is an exception, and uses the interrupt vectors to identify where to transfer execution, and in what mode execution is to continue. It is that "and" that counts for security control. The user process cannot control where execution is going (that is controlled by the interrupt vector), just that execution is going to be transferred.

How parameters are passed/interpreted is up to the cooperation between the user mode process using the system call, and the kernel. This is one of the reasons that system calls are usually handled by the "libc" library. It knows the registers to be used, what registers get modified, values returned... The kernel on the other hand has to verify the parameters passed as being valid, and permitted - and report any errors to the user process (and sometimes report the errors/activity to other processes).
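The "kernel verifies the parameters and reports errors" part is easy to observe from user space. In this Python sketch (the helper `errno_of` is made up for the example), the kernel rejects a read on a closed file descriptor and an open of a missing path, and the error code comes back to the process as an errno value, which Python surfaces as an `OSError`.

```python
import errno
import os


def errno_of(func, *args):
    """Call func(*args) and return the errno the kernel reported, or None."""
    try:
        func(*args)
        return None
    except OSError as exc:
        return exc.errno


def closed_descriptor():
    """Return a descriptor number that was valid a moment ago but is now closed."""
    r, w = os.pipe()
    os.close(r)
    os.close(w)
    return r


if __name__ == "__main__":
    bad_fd = closed_descriptor()
    print("read on closed fd:", errno.errorcode[errno_of(os.read, bad_fd, 1)])
    missing = "/nonexistent/dir/file"
    print("open missing path:", errno.errorcode[errno_of(os.open, missing, os.O_RDONLY)])
```

Neither call can do any damage: the kernel checks the arguments before acting on them and simply returns an error to the caller.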

Hope this helps...

sundialsvcs · 10-30-2016, 01:56 PM

Quote:

Originally Posted by linux4evr5581

Is my understanding of system calls correct? It's that programs interact with the kernel indirectly by having their opcodes interpreted by the CPU and then transferred to the system library. These opcodes (which are stored in the program's executable file) contain the instructions on what resources it needs to be able to run, and the kernel provides those resources through the system library's system calls...

Every program consists of opcodes ("machine instructions") which are executed by the CPU.

These instructions do not directly "specify what resources it needs to be able to run." This information might be contained in additional, descriptive information within the program-file, which is used by the so-called "loader" that actually launches your program.

This loader (may ...) link your program to the "libraries" that it requires, so that your program can call the subroutines that the library contains.

From time to time, your program (either directly, or within a library that it has called) must ask the Linux kernel to do something on its behalf. This is done by means of "a system call," and the exact manner in which this is done varies by CPU-type. Nonetheless, your program switches from "user mode" to "kernel mode," and begins executing code within the kernel. Eventually, the kernel code exits from the system-call and your program continues, once-again in user mode.
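To show that "varies by CPU-type" is meant literally, here is a hedged Python sketch that asks the kernel for the process id through libc's generic `syscall()` entry point instead of the usual `getpid()` wrapper. The system call numbers in the table are the Linux values for 64-bit x86 and ARM only; on anything else the sketch falls back to the ordinary wrapper. `raw_getpid` is a made-up name for this example.

```python
import ctypes
import os
import platform
import sys

# The number identifying getpid differs per architecture (Linux, 64-bit only).
SYS_GETPID = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Return our pid, using a raw system call number where we know it."""
    libc = ctypes.CDLL(None, use_errno=True)      # symbols of the already loaded libc
    number = SYS_GETPID.get(platform.machine())
    is_64bit_linux = (sys.platform.startswith("linux")
                      and ctypes.sizeof(ctypes.c_void_p) == 8)
    if number is None or not is_64bit_linux:
        return libc.getpid()                      # fall back to the normal wrapper
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(number))    # libc loads the registers and traps


if __name__ == "__main__":
    print("raw system call:", raw_getpid(), "os.getpid():", os.getpid())
```

Even here the program never issues the trap instruction itself; libc's `syscall()` still does the register setup, which is the point made earlier in the thread about why the library sits in between.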

System-calls are a one-way door, since they involve a transition into kernel mode. You can't execute arbitrary code within the kernel: you can only execute the system calls that the kernel provides. Your program also can't see just how those system calls are implemented. (I guess it's a little bit like dying: when the time comes, all you can do is pass through the portal, with no idea what's actually on the other side of it, and hope for the best.)

jlliagre · 10-30-2016, 02:50 PM

Quote:

Originally Posted by sundialsvcs

(I guess it's a little bit like dying: when the time comes, all you can do is pass through the portal, with no idea what's actually on the other side of it, and hope for the best.)

There is however a slight difference: unlike dying people, most if not all system calls eventually return…

linux4evr5581 · 10-30-2016, 03:52 PM

@jlliagre- Thanks for clearing that up, I guess an example would be how the command ls works by having its opcodes stored in /bin/ls, and having the behavior of this program optionally modified through configuration files, which I guess modify the operands..

@jpollard- Awesome in-depth explanation, I truly appreciate it, as I kind of had a detailed understanding of the boot process but not to this extent. I will definitely PDF this thread and refer back to it, as I want to make my own EFI specifications for a custom secure boot implementation with MBR and LILO. I read somewhere, idk if it's true, that UEFI's uniform design supposedly makes it the same across all platforms, thus making it easier to target.. But anyhow, good explanation. I understand now why libraries are so important, as they make things a lot easier/more efficient, and I take it the kernel communicates the errors to other processes via signals, to report the state of the system/hardware. Cool...

@sundialsvcs-

Quote:

Originally Posted by sundialsvcs

From time to time, your program (either directly, or within a library that it has called) must ask the Linux kernel to do something on its behalf. This is done by means of "a system call," and the exact manner in which this is done varies by CPU-type.

So this explains my intense confusion, thanks for letting me know. I guess that's why it's hard to give a precise answer on exactly what happens. Kind of like how it's hard to come up with a standard format label for comparing CPU speeds...

jlliagre · 10-30-2016, 04:48 PM

Quote:

Originally Posted by linux4evr5581

@jlliagre- Thanks for clearing that up, I guess an example would be how the command ls works by having its opcodes stored in /bin/ls

Yes, but beware of oversimplifying.

/bin/ls contains much more than opcodes, which are just a part of the binary code. /bin/ls contains sections; at least one of these sections contains binary code, which is composed of opcodes plus their operands, a.k.a. machine language. Not all of the code used by /bin/ls is contained in these sections. This extra code is located in similar sections of other files, which are shared libraries. Most or all of the libraries required by /bin/ls are listed in one of these sections. Other sections contain information used by debuggers, static data (e.g. strings used by the program), and other pieces of information. Part of the extra code is also contained in the kernel binary. This is the code executed when a system call is in progress.
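If you want to see those sections for yourself, the following Python sketch lists the section names of an ELF binary by reading the file header and the section header table. It is only a sketch under stated assumptions: it handles 64-bit little-endian ELF files only, `section_names` is a made-up helper, and real tools such as readelf do far more checking.

```python
import struct

ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")    # ELF64 file header, 64 bytes
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")      # ELF64 section header, 64 bytes


def section_names(data):
    """Return the section names of a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    fields = ELF_HEADER.unpack_from(data, 0)
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]
    if shnum == 0:
        return []                                  # no section table in this file
    headers = [SECTION_HEADER.unpack_from(data, shoff + i * shentsize)
               for i in range(shnum)]
    # Section names are stored as NUL-terminated strings in one special section.
    str_off, str_size = headers[shstrndx][4], headers[shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    names = []
    for header in headers:
        start = header[0]                          # offset of this name in strtab
        end = strtab.index(b"\0", start)
        names.append(strtab[start:end].decode("ascii", "replace"))
    return names


if __name__ == "__main__":
    try:
        with open("/bin/ls", "rb") as f:
            print(section_names(f.read()))
    except (OSError, ValueError) as exc:
        print("could not read sections:", exc)
```

On a typical Linux system the list should include names such as .text (the machine code), .rodata (constant data like strings) and .dynamic (which is where the required shared libraries are recorded).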

Quote:

and having the behavior of this program optionally modified through configuration files, which I guess modify the operands..

The behavior of /bin/ls is essentially modified by the user running it. For example the current directory where you "cd" before launching "ls" will usually matter, and the arguments you might have passed to "ls" through the shell, e.g. "ls *.txt" will also matter.
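A quick way to see both effects is to run the same /bin/ls binary with different working directories and arguments. The Python sketch below assumes an `ls` command is on the PATH; `run_ls` and `expand` are made-up helpers, and `expand` only imitates the pattern expansion that the shell would normally do before `ls` ever starts.

```python
import glob
import os
import subprocess
import tempfile


def run_ls(cwd, *args):
    """Run ls in directory cwd with the given arguments; return the names printed."""
    result = subprocess.run(["ls", *args], cwd=cwd,
                            capture_output=True, text=True, check=True)
    return sorted(result.stdout.split())


def expand(cwd, pattern):
    """Imitate the shell: turn a pattern like *.txt into matching names in cwd."""
    matches = glob.glob(os.path.join(glob.escape(cwd), pattern))
    return sorted(os.path.basename(path) for path in matches)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.txt", "b.log"):
            open(os.path.join(d, name), "w").close()
        print("ls       ->", run_ls(d))
        print("ls *.txt ->", run_ls(d, *expand(d, "*.txt")))
```

The opcodes in /bin/ls are identical in both runs; only the working directory and the argument list handed over at exec time differ.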

linux4evr5581 · 10-30-2016, 06:32 PM

@jlliagre Got it, sweet, ok thanks once again for explaining that, I'm sure that will save me some headache when I begin working with all this. You've been a great help man, all of you, thank you!

jlliagre · 10-31-2016, 03:00 AM

Thanks, glad to know it helped.

Let me restate a point about system calls that is often misunderstood.

Despite popular belief, when a process performs a system call, it doesn't pass control to a "super process" named "the kernel" which handles all system calls, and then wait for the reply, the way, say, a browser sends a request to a web server which handles all the requests from other browsers.

No, when a process performs a system call, it just temporarily changes its mode and gets extra privileges for this period of time, but it is still running in the very same thread of execution.
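One visible consequence is that CPU time spent executing kernel code during a system call is charged to the calling process as "system" time. This Python sketch reads a lot of data from /dev/zero (which makes the kernel fill our buffers) and compares the process's own system time before and after. The helper name is made up, /dev/zero is assumed to exist, and the exact figures depend on the machine and on the kernel's accounting granularity.

```python
import resource


def system_time_for_reads(chunks=64, size=1 << 20):
    """Read chunks*size bytes from /dev/zero; return (bytes read, system CPU seconds)."""
    before = resource.getrusage(resource.RUSAGE_SELF).ru_stime
    total = 0
    with open("/dev/zero", "rb", buffering=0) as f:   # unbuffered: each read() is a system call
        for _ in range(chunks):
            total += len(f.read(size))
    after = resource.getrusage(resource.RUSAGE_SELF).ru_stime
    return total, after - before


if __name__ == "__main__":
    nbytes, seconds = system_time_for_reads()
    print(f"read {nbytes} bytes; system time charged to this process: {seconds:.3f}s")
```

No other process shows up doing this work: the time the kernel code took is accounted to the process that made the read calls.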

jpollard · 10-31-2016, 07:03 AM

Quote:

Originally Posted by jlliagre

Thanks, glad to know it helped.

Let me restate a point about system calls that is often misunderstood.

Despite popular belief, when a process performs a system call, it doesn't pass control to a "super process" named "the kernel" which handles all system calls, and then wait for the reply, the way, say, a browser sends a request to a web server which handles all the requests from other browsers.

No, when a process performs a system call, it just temporarily changes its mode and gets extra privileges for this period of time, but it is still running in the very same thread of execution.

That is true for Linux, but not for all operating systems. And even in Linux it isn't 100% true. In the remaining cases, the transfer to the kernel causes the process to be suspended, and a different thread takes over.

In Linux, the places where it is true are the information retrieval functions - retrieving the time, process states... But in other cases the process is suspended while kernel tasks are carried out in a different thread, though possibly using the memory context of the user process. Network activity is one example. The thread of execution places requests for action in a queue, and is then suspended until that action is complete. The kernel catches an interrupt, the independent thread is resumed to handle it - and it then processes the queued request.

Most physical I/O is handled this way: loading buffers, writing buffers... Then, when the requested data is in memory, the requesting thread is resumed, completes the request, and returns to the user mode program.

Determining what is a "thread" of execution gets vague...

You can see these other "processes" that handle such work, though it isn't easy to identify what they are doing... They show up in the output of "ps -elf" as the processes within brackets - usually named "[kworker/nn:nn]" with various strings for nn, or "[kthreadd]" (and they are not limited to these names either).
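For those who prefer reading /proc directly over ps, here is a hedged Python sketch that picks out kernel threads by the PF_KTHREAD bit in the flags field of /proc/<pid>/stat. The helper names are made up, a Linux-style /proc is assumed, and inside a container you will usually see no kernel threads at all.

```python
import os

PF_KTHREAD = 0x00200000    # flag bit the kernel sets on its own threads


def parse_stat(line):
    """Return (command name, flags) from one /proc/<pid>/stat line."""
    head, tail = line.rsplit(")", 1)       # the name may itself contain spaces
    comm = head.split("(", 1)[1]
    fields = tail.split()                  # state, ppid, pgrp, session, tty, tpgid, flags
    return comm, int(fields[6])


def is_kernel_thread(line):
    return bool(parse_stat(line)[1] & PF_KTHREAD)


def kernel_threads():
    """Return bracketed names of the kernel threads visible in /proc."""
    names = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                line = f.read()
        except OSError:
            continue                       # the process went away or is hidden
        if is_kernel_thread(line):
            names.append(f"[{parse_stat(line)[0]}]")
    return names


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        print(kernel_threads()[:10])
```

The brackets are added here only to match how ps displays these entries; the kernel itself records the plain name.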

linux4evr5581 · 10-31-2016, 10:43 AM

Quote:

Originally Posted by jpollard

That is true for Linux, but not for all operating systems. And even in Linux it isn't 100% true. In the remaining cases, the transfer to the kernel causes the process to be suspended, and a different thread takes over.

In Linux, the places where it is true are the information retrieval functions - retrieving the time, process states... But in other cases the process is suspended while kernel tasks are carried out in a different thread, though possibly using the memory context of the user process. Network activity is one example. The thread of execution places requests for action in a queue, and is then suspended until that action is complete. The kernel catches an interrupt, the independent thread is resumed to handle it - and it then processes the queued request.

Most physical I/O is handled this way: loading buffers, writing buffers... Then, when the requested data is in memory, the requesting thread is resumed, completes the request, and returns to the user mode program.

Determining what is a "thread" of execution gets vague...

You can see these other "processes" that handle such work, though it isn't easy to identify what they are doing... They show up in the output of "ps -elf" as the processes within brackets - usually named "[kworker/nn:nn]" with various strings for nn, or "[kthreadd]" (and they are not limited to these names either).

Interesting, the latter sounds like a state-present synchronous sequential system, where it's using the states of memory elements (the buffers, or the memory context of the user) so that the system can remember what happened before values change/changed (and produce an output based on that state or on inputs/state). And where the outputs change only through some sort of signal, such as, in this case, the kernel catching the interrupt, I think.. (not sure though, I need to finish my online circuitry class)

jpollard · 10-31-2016, 12:21 PM

Well, the queued entries identify the process, and that can be used for some actions.

But usually the queued values are for data exchange (read/write/update...), in which case the only significant thing is to transfer the data buffers to/from external devices (or systems). This is still being done for the user's process, but it is various kernel threads handling the operation. They don't directly interact with the user unless there is some signal to be passed (from errors), but at that point the kernel thread simply registers the problem and the user process is re-entered in the run queue (some other processor may pick it up), and the kernel thread goes on to process the next queue entry.

The signal only gets delivered to the user process after the process execution thread is run on some CPU.