Number of Virtual Memory Layers
I heard that Linux can be configured to use multi-layer virtual memory with anywhere from 1 to 4 layers. Does anyone know a particular distribution where 4-layer virtual memory is used? Or is it common to use all four layers?
I only know that one specific mobile device uses 3-layer virtual memory in its Android kernel. |
Depends on the processor. Linux normally uses 3 layers... with a fourth value being the offset into the selected page.
https://www.kernel.org/doc/gorman/ht...rstand006.html |
Might the word here be 'level' (vs. 'layer')? (I'm a very-not-sure newbie here!)
I found references to PML4, like this one: http://lwn.net/Articles/106177 (a 2004 change in 2.6 for 64-bit). I wonder if your "3 ... Android" refers to older 32-bit/PAE (a long story)? |
Perhaps this would be a better reference: http://rayseyfarth.com/asm/pdf/ch04-memory-mapping.pdf
The fourth layer reference there is to the process page table (an index). |
In all virtual-memory architectures, the virtual address is split into several fields, each indexing one level of a "tree" of related data structures that the kernel maintains. Any entry in these structures may be marked "missing," triggering a page fault (or a memory protection exception). This exception stops the process from executing and transfers control to the operating system. When the operating system has resolved the issue and transfers control back to user-land, the interrupted process retries (or resumes) the instruction.
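A minimal sketch of the field split and tree walk described above. It assumes an x86-64-style layout (4 KiB pages, four 9-bit indices, a 12-bit offset); other processors split the address differently. Nested dicts stand in for the page tables, and every name here (`split_address`, `translate`, `map_page`, `PageFault`) is made up for illustration, not a kernel API.

```python
# Sketch only: assumes an x86-64-style layout (4 KiB pages, four 9-bit
# indices, 12-bit offset). All names are hypothetical, not kernel APIs.
PAGE_SHIFT = 12   # bits of offset within a page
INDEX_BITS = 9    # bits of index per table level
LEVELS = 4        # number of table levels


class PageFault(Exception):
    """Raised when an entry on the walk is marked missing."""


def split_address(vaddr):
    """Return ([top-level index, ..., last-level index], page offset)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in reversed(range(LEVELS)):
        shift = PAGE_SHIFT + level * INDEX_BITS
        indices.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


def map_page(root, vaddr, frame):
    """Install a mapping, creating intermediate tables as needed."""
    indices, _ = split_address(vaddr)
    node = root
    for index in indices[:-1]:
        node = node.setdefault(index, {})
    node[indices[-1]] = frame


def translate(root, vaddr):
    """Walk the tree; a missing entry at any level is a page fault."""
    indices, offset = split_address(vaddr)
    node = root
    for level, index in enumerate(indices):
        if index not in node:
            raise PageFault(f"missing entry at level {level} for {vaddr:#x}")
        node = node[index]
    # After the last level, node holds the physical frame number.
    return (node << PAGE_SHIFT) | offset
```

After `map_page(root, 0x7f1234567000, 0x42)`, any address inside that page translates to frame 0x42 with its offset preserved, while an unmapped address raises `PageFault`. In a real system the kernel would handle that fault and the instruction would be retried.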
To save time, processors also use some kind of "translation lookaside buffer (TLB)" (the old IBM mainframe term, which stuck ...) to instantly resolve recently-used virtual addresses without looking them up in the virtual-memory tables. Privileged instructions are used to invalidate all or part of the TLB entries when the underlying tables are changed. A TLB is a parallel memory caching circuit, built into the CPU chip: every "bucket" in the cache is literally checked at the same instant. |
Think of the page tables as a radix tree search. The TLB is a cache of recent entries containing only unique entries.
The page tables and the TLB are searched in parallel - and whichever one identifies the entry first is used. A TLB hit terminates the page table search (which is MUCH slower). But that means that the TLB isn't necessarily consistent with the page tables when the page tables change - thus the TLB has to be "invalidated" by the OS to force the correct values to be identified from the page tables. At that point, the TLB gets the new entry. Since the TLB has a limited number of entries, a new one will replace an invalidated entry, or the least recently used entry. |
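A rough software model of the behaviour described above, as a sketch under stated assumptions. Real TLB hardware compares all entries at once; this model checks the cache first and falls back to the slow walk on a miss. `TinyTLB` and its `walk` callback are hypothetical names, and least-recently-used replacement is the policy named in the post, not necessarily what any particular CPU implements.

```python
from collections import OrderedDict


class TinyTLB:
    """Toy TLB: a small cache of page-number -> frame-number entries."""

    def __init__(self, capacity, walk):
        self.capacity = capacity
        self.walk = walk              # slow path: the page table walk
        self.entries = OrderedDict()  # ordered oldest -> most recently used
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            # Hit: no page table walk is needed.
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        # Miss: do the walk, then cache the result.
        self.misses += 1
        frame = self.walk(vpn)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = frame
        return frame

    def invalidate(self, vpn=None):
        """Drop one entry, or all entries when vpn is None."""
        if vpn is None:
            self.entries.clear()
        else:
            self.entries.pop(vpn, None)
```

If the page table changes and `invalidate` is not called, `lookup` keeps returning the stale frame. That is the inconsistency the OS has to prevent by invalidating entries.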
Thank you for providing useful resources and comments.
To Jjanel: yes, you are right. What I meant was level. And the document in the mm.txt file is nice -- I didn't know that x86_64 itself has a 4-level structure (I tweaked ARM before). To jpollard: thanks! So three levels are common. Those slides explain it clearly, thanks. And to sundialsvcs: wow, TLB was a term coined by IBM? I learned about the TLB in my OS class (though I wasn't aware that the TLB itself can be multi-level, as in the slides jpollard posted). Thank you all. |
I think that "TLB = Translation Lookaside Buffer" is an original-IBM term. (I still find it in my original "POP = Principles of Operation" manual which was ... koff koff ... printed on a line printer.)
... if you ... ("bah! humbug! these kids today!" ... ;)) are even aware of what "a line printer" is ... ("koff, koff™ ...") "But, I Digress.™" :D TLBs are not "multi-level"! By design, they are parallel! Why? "Because they specifically exist to allow the CPU to avoid(!) a multi-level search!" :eek: Here's the TLB's objective: "Here's the Question. Do you have (in one clock-cycle) the Answer?" In all hardware architectures (however big or small ...), you will find this very-key component. It consists of a certain number of elements of so-called associative memory. In one clock-cycle, the virtual address of interest is presented to all of the memory elements at once.
|
Most TLB misses are handled by the hardware, which continues the page table walk. It isn't an OS issue, as that would be uselessly slow. (I have used one machine that did that, the PDP-10: for all intents and purposes the associative memory had to be big enough to map the entire memory, or your speed was about that of a PDP-11/34.)
ref: https://en.wikipedia.org/wiki/Transl...okaside_buffer |
Ahh, yes ... the PDP. :) |
Not just four memory reads... it could involve another page fault where the page tables themselves may be paged out. Doesn't happen often, but for large memory machines it could.
|