Why does it say that a userspace address can point to kernel memory?

dspjm · 11-09-2012, 11:52 AM

LDD3 says that the access_ok(int type, const void *addr, unsigned long size) function checks whether addr points to kernel memory.
I thought that each process, including the kernel, has a distinct memory address table, which means the same address value means different things to the kernel and to a userspace process. So why do we bother to check it?
Can anybody correct my understanding?

syg00 · 11-09-2012, 06:33 PM

The kernel memory addresses are common to all address spaces. They cannot be referenced in user context.
Non-kernel (virtual) addresses are translated by address-space-specific page tables to real addresses when referenced. Code running in a non-user context (kernel or interrupt) must not reference user-space directly.

dspjm · 11-09-2012, 11:39 PM

Quote:

Originally Posted by syg00

The kernel memory addresses are common to all address spaces. They cannot be referenced in user context.
Non-kernel (virtual) addresses are translated by address-space-specific page tables to real addresses when referenced. Code running in a non-user context (kernel or interrupt) must not reference user-space directly.

I see. I wonder whether there is a region of virtual address values that can only be used by the kernel, e.g. 0x00-0x10, while the remaining region can be used by anyone.

NevemTeve · 11-10-2012, 11:22 AM

I think on x86 the kernel address space begins at 0xc0000000.
PS: I did some searching and found something to read: http://www.makelinux.net/ldd3/chp-15-sect-1

johnsfine · 11-10-2012, 12:07 PM

Quote:

Originally Posted by dspjm

I thought that each process, including the kernel, has a distinct memory address table

The Linux kernel is not at all like a process (unlike some more obscure OSes in which the kernel is very much like a process).

There was a kernel build time option in Linux (RHEL 4) that caused the kernel to be much more like a process, including having its own address table. That option was an ugly and inefficient kludge, but was necessary for very large 32-bit servers. I doubt that it is still available even as a build time option (don't know for sure). The purpose for that option is completely obsolete. There is no reason to try to run that large a server without using a 64-bit CPU and kernel.

Quote:

Originally Posted by dspjm

I see. I wonder whether there is a region of virtual address values that can only be used by the kernel, e.g. 0x00-0x10, while the remaining region can be used by anyone.

In 32-bit X86 Linux, the kernel region of the virtual address space defaults (at kernel build time) to being the last 1GB of the 4GB address space. It can be set to a different size if you build a custom kernel.
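This also answers the original question about access_ok. Because the kernel region is mapped in every process, a user process can pass any number as a pointer, including one that falls in the last 1GB. The kernel runs with that region accessible, so it has to reject such pointers before using them. A minimal sketch of that kind of range check, assuming the default 3GB/1GB split; `USER_LIMIT` and `user_range_ok` are hypothetical names for illustration, and the real access_ok differs between architectures and kernel versions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default 3GB/1GB split on 32-bit x86:
 * user space is [0, 0xC0000000), the kernel region starts at 0xC0000000. */
#define USER_LIMIT 0xC0000000u

/* Hypothetical stand-in for access_ok(): returns true only if the range
 * [addr, addr + size) lies entirely below the kernel region.
 * It does not check whether the pages are actually mapped. */
static bool user_range_ok(uint32_t addr, uint32_t size)
{
    /* Written so that addr + size is never computed and cannot wrap. */
    return size <= USER_LIMIT && addr <= USER_LIMIT - size;
}
```

With this sketch, a 4-byte buffer at 0xC0000000 is rejected, and so is a buffer that starts in user space but runs past the boundary.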

In X86-64 architecture, the intent of the CPU designers was that the kernel of any OS would reserve the second half of the (very large) address space. That split is only encouraged, not strictly forced, by the CPU design, so it would be possible to customize Linux with a different split. But it would be messy and pointless. Half of the address space is plenty for any 64-bit process.
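The "halves" come from the CPU's rule that addresses must be canonical: the unused high bits have to be copies of the top implemented bit. A small sketch of that rule, assuming 48-bit virtual addresses (4-level paging); `VA_BITS`, `is_canonical` and `in_upper_half` are hypothetical names, not kernel APIs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual address bits (4-level paging). */
#define VA_BITS 48

/* An address is canonical when bits 63..47 are all equal,
 * i.e. all zero (lower half) or all one (upper half). */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> (VA_BITS - 1);   /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFFu;
}

/* By convention the upper canonical half is the one the kernel reserves. */
static bool in_upper_half(uint64_t addr)
{
    return is_canonical(addr) && (addr >> 63) != 0;
}
```

Under this assumption the lower half ends at 0x00007FFFFFFFFFFF and the upper half begins at 0xFFFF800000000000; everything in between is non-canonical and cannot be used by either side.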

dspjm · 11-10-2012, 11:41 PM

Quote:

Originally Posted by johnsfine

The Linux kernel is not at all like a process (unlike some more obscure OSes in which the kernel is very much like a process).

There was a kernel build time option in Linux (RHEL 4) that caused the kernel to be much more like a process, including having its own address table. That option was an ugly and inefficient kludge, but was necessary for very large 32-bit servers. I doubt that it is still available even as a build time option (don't know for sure). The purpose for that option is completely obsolete. There is no reason to try to run that large a server without using a 64-bit CPU and kernel.

In 32-bit X86 Linux, the kernel region of the virtual address space defaults (at kernel build time) to being the last 1GB of the 4GB address space. It can be set to a different size if you build a custom kernel.

In X86-64 architecture, the intent of the CPU designers was that the kernel of any OS would reserve the second half of the (very large) address space. That split is only encouraged, not strictly forced, by the CPU design, so it would be possible to customize Linux with a different split. But it would be messy and pointless. Half of the address space is plenty for any 64-bit process.

Thanks for answering.
So can we see it this way: the last 1GB of every process's virtual address space refers to the kernel, the kernel cannot have more than 1GB of memory, and it does not have a virtual address table of its own?
I still have some doubts. I thought that vmalloc allocates virtually contiguous memory for the kernel, so I think that the kernel has its own address table and can have more than 1GB of memory.
And what is the use of making the last 1GB of a process's virtual address space belong to the kernel if the process itself cannot use that memory?
Thanks

johnsfine · 11-11-2012, 09:31 AM

Quote:

Originally Posted by dspjm

what is the use of setting the last 1GB virtual address of a process to belong to kernel if the process itself cannot use that memory?

When a hardware interrupt occurs, or a user mode process calls for a kernel operation, the CPU changes its hardware protection level so that the kernel mode mappings become accessible. But the CPU does not switch to a different mapping table (which would be a much more time-consuming switch).

So the kernel code that executes those interrupt and request service routines must occupy virtual address space in each process.

An OS could be designed (like that build time option in RHEL 4) to have only a tiny part of the kernel mapped that way. Before servicing any non trivial request from a user mode process, it would need to switch the mapping to the kernel process containing the code and kernel data needed for processing that request. Then when accessing the user mode parameters, buffers and other data associated with that request, it would need to create temporary mappings in the kernel process to access parts of the address space of the user process (at a different virtual address than the original location of that data, because the kernel might not be able to spare the virtual address corresponding to the user data).

All that is possible, but complicated. It is much simpler for the kernel virtual addresses to be disjoint from the user virtual addresses, so kernel code servicing a user mode request needs only to have the overall mapping of that process active in order to access any user mode data at the same virtual address where the user mode process had that data.

BTW, the kernel also has occasional need to access user mode buffers when the correct process is not currently selected for the overall mapping. Early X86-32 Linux and current X86-64 Linux made (make) the assumption that total physical memory is significantly smaller than kernel virtual memory. That allows all of physical memory to be mapped into kernel virtual memory all the time. A virtual address in any process can be translated to a physical address, then the fixed offset added to translate to a kernel virtual address. But X86-32 Linux with more than roughly 896MB of physical RAM (with the default split) can't use that method for all of memory and needs a slower and more complicated solution for those instances.
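The "fixed offset" step is simple arithmetic. A sketch for the 32-bit case, assuming the kernel region starts at 0xC0000000; `phys_to_kvirt` and `kvirt_to_phys` are hypothetical names that play a role similar to the kernel's __va and __pa macros:

```c
#include <stdint.h>

/* Assumed start of the kernel region (default 3GB/1GB split). */
#define PAGE_OFFSET 0xC0000000u

/* Physical address -> kernel virtual address in the direct mapping.
 * Only meaningful for physical addresses that are actually covered
 * by the direct mapping. */
static uint32_t phys_to_kvirt(uint32_t phys)
{
    return phys + PAGE_OFFSET;
}

/* Kernel virtual address in the direct mapping -> physical address. */
static uint32_t kvirt_to_phys(uint32_t virt)
{
    return virt - PAGE_OFFSET;
}
```

For example, physical address 0x00100000 would appear to the kernel at 0xC0100000. Memory that does not fit in the direct mapping has no such fixed kernel address, which is why the slower path is needed there.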

Quote:

Originally Posted by dspjm

I thought that vmalloc was to allocate a virtually contiguous memory for kernel, so I think that kernel has its own address table and can have more than 1GB memory.

The address table is hierarchical, so duplicating the kernel mappings into every process address space does not require duplicating all the detail of those mappings. With PAE, I think only one pointer needs to be duplicated for a quarter of the address space. Without PAE, I think 256 pointers have to be duplicated for a quarter of the address space.
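Those counts can be checked from the index arithmetic. A sketch, assuming 32-bit x86 paging where each top-level entry covers 4MB without PAE (1024 entries) and 1GB with PAE (4 entries); the function names are made up for illustration, and `start` is assumed to be aligned to an entry boundary:

```c
#include <stdint.h>

/* Non-PAE: the top-level index is bits 31..22 (1024 entries of 4MB each). */
static unsigned top_index_nopae(uint32_t addr)
{
    return addr >> 22;
}

/* PAE: the top-level index is bits 31..30 (4 entries of 1GB each). */
static unsigned top_index_pae(uint32_t addr)
{
    return addr >> 30;
}

/* Number of top-level entries that cover [start, 4GB). */
static unsigned kernel_entries_nopae(uint32_t start)
{
    return 1024u - top_index_nopae(start);
}

static unsigned kernel_entries_pae(uint32_t start)
{
    return 4u - top_index_pae(start);
}
```

With the kernel region starting at 0xC0000000, the non-PAE case covers entries 768 through 1023, which is 256 pointers, and the PAE case covers only entry 3, which is one pointer.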

All the detail of the kernel mappings is in a portion of the address table hierarchy that is shared by all processes. So it is possible to do one vmalloc in the kernel to create a kernel mapping that will be present in every process's address space.

But in the common X86-32 Linux setup, you cannot do anything really big with vmalloc, because the whole kernel portion of the address space is only 1GB.