Originally Posted by dspjm
What is the use of setting the last 1GB of a process's virtual address space to belong to the kernel if the process itself cannot use that memory?
When a hardware interrupt occurs or user mode code requests a kernel operation, the CPU changes its hardware protection level so that the kernel mode mappings become accessible. But the CPU does not switch to a different mapping table (which would be a much more time consuming switch).
So the kernel code that executes those interrupt and request service routines must occupy virtual address space in each process.
An OS could be designed (like that build-time option in RHEL 4) to have only a tiny part of the kernel mapped that way. Before servicing any nontrivial request from a user mode process, it would need to switch the mapping to the kernel process containing the code and kernel data needed for processing that request. Then, when accessing the user mode parameters, buffers and other data associated with that request, it would need to create temporary mappings in the kernel process to access parts of the address space of the user process (at a different virtual address than the original location of that data, because the kernel might not be able to spare the virtual address corresponding to the user data).
All that is possible, but complicated. It is much simpler for the kernel virtual addresses to be disjoint from the user virtual addresses, so kernel code servicing a user mode request needs only to have the overall mapping of that process active in order to access any user mode data at the same virtual address where the user mode process had that data.
BTW, the kernel also occasionally needs to access user mode buffers when the process that owns them is not the one currently selected for the overall mapping. Early X86-32 Linux assumed, and current X86-64 Linux still assumes, that total physical memory is significantly smaller than kernel virtual memory. That allows all of physical memory to be mapped into kernel virtual memory all the time. A virtual address in any process can be translated to a physical address, and then a fixed offset is added to get the corresponding kernel virtual address. But X86-32 Linux with more than roughly 896MB of physical RAM (the usual size of that direct mapping when the kernel portion is 1GB) can't use that method for all of memory and needs a slower and more complicated solution for those instances.
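The fixed-offset translation can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the kernel's actual code: it assumes the common 3GB/1GB split, so the kernel portion starts at 0xC0000000, and the names PAGE_OFFSET, KERNEL_WINDOW, phys_to_kvirt and kvirt_to_phys are chosen here for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 3GB/1GB split, kernel virtual addresses start at 3GB. */
#define PAGE_OFFSET    UINT32_C(0xC0000000)
/* Assumption: the kernel portion of the address space is 1GB in total. */
#define KERNEL_WINDOW  UINT32_C(0x40000000)

/* Physical address -> kernel virtual address: add the fixed offset. */
static uint32_t phys_to_kvirt(uint32_t phys)
{
    /* Only physical addresses that fit inside the kernel portion can be
     * reached this way; anything higher needs a temporary mapping. */
    assert(phys < KERNEL_WINDOW);
    return phys + PAGE_OFFSET;
}

/* Kernel virtual address -> physical address: subtract the same offset. */
static uint32_t kvirt_to_phys(uint32_t kvirt)
{
    assert(kvirt >= PAGE_OFFSET);
    return kvirt - PAGE_OFFSET;
}
```

The assert in phys_to_kvirt is the whole problem in miniature: once physical memory no longer fits inside the kernel portion, the simple addition stops being enough.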
Originally Posted by dspjm
I thought that vmalloc was to allocate virtually contiguous memory for the kernel, so I think that the kernel has its own address table and can have more than 1GB of memory.
The address table is hierarchical, so duplicating the kernel mappings into every process address space does not require duplicating all the detail of those mappings. With PAE, I think only one pointer needs to be duplicated for a quarter of the address space. Without PAE, I think 256 pointers have to be duplicated for a quarter of the address space.
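Those pointer counts follow from how much address space each top-level entry covers. A rough check in C, assuming the standard x86 layouts (without PAE, a 1024-entry page directory where each entry covers 4MB; with PAE, a 4-entry top level where each entry covers 1GB). The names below are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_SPAN   (UINT64_C(1) << 30)  /* 1GB, a quarter of a 4GB space */
#define NONPAE_ENTRY  (UINT64_C(4) << 20)  /* assumed: 4MB per top-level entry */
#define PAE_ENTRY     (UINT64_C(1) << 30)  /* assumed: 1GB per top-level entry */

/* Number of top-level entries needed to cover span_bytes when each
 * entry covers entry_bytes. */
static unsigned top_level_entries(uint64_t span_bytes, uint64_t entry_bytes)
{
    assert(entry_bytes != 0 && span_bytes % entry_bytes == 0);
    return (unsigned)(span_bytes / entry_bytes);
}
```

With these assumptions, top_level_entries(KERNEL_SPAN, NONPAE_ENTRY) gives 256 and top_level_entries(KERNEL_SPAN, PAE_ENTRY) gives 1, matching the figures above.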
All the detail of the kernel mappings is in a portion of the address table hierarchy that is shared by all processes. So it is possible to do one vmalloc in the kernel to create a kernel mapping that will be present in every process's address space.
But in the common X86-32 Linux setup, you cannot do anything really big with vmalloc, because the whole kernel portion of the address space is only 1GB.