Protection of Process Address Space

entz · 06-02-2011, 08:02 PM

hello,

i'm doing a lot of reading on operating system design, computer architecture, and the Linux kernel (i'm reading "Understanding the Linux Kernel" among other books).

anyway, i'm contemplating different OS designs, both segmentation-based and page-based.

i can't really get my head around the issue of Memory/Process Protection.

i gathered that there are 4 privilege levels in the x86 arch, which means that, for example, code in ring 3 segments (user processes) can't jump directly into kernel code in ring 0 segments, etc.

but what about protecting tasks in ring 3 from each other ?
what stops malicious code from doing a far jump or call into the code segment of another ring 3 process, or changing its SS or DS to access another process's data?

i think that the TR register (which points to the current Task State Segment) and the LDTR (Local Descriptor Table Register) are somehow involved in mapping and enforcing the address spaces, but i'm not really sure..

i'd welcome some more insights...

NOTE:
this is a general thought about Address Protection concerning any OS running on the X86 arch and is NOT limited to Linux in particular

kind regards

sundialsvcs · 06-03-2011, 08:01 AM

Virtual memory is a velvet-lined comfy container: a very fine place to live, but enclosed within an inescapable box.

Only operating-system code, executing in ring-0, can issue the so-called "privileged" instructions which are necessary to alter the behavior of the virtual memory subsystem. User-level code can neither alter the virtual memory settings, nor turn it off. Each process has its own set of page and segment tables which determine both what it is allowed to see and where in memory any particular piece of data appears to be. It can't touch those tables; it can't even see them. Thus, one process cannot "jump into the code of another" because it cannot see it. (However, "shared segments" can be mapped into the address space of more than one process at a time; that is exactly how shared memory works.)

The only way that a process can get to data is for that data to be mapped into its own address space ... which it cannot do directly.
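As a rough illustration of that point, here is a toy Python model (not how any real kernel stores its tables). The `Process` class, the page numbers, and the frame numbers are all made up for the example. Each process gets its own private mapping, so the same virtual address lands in different physical memory, and an address with no mapping simply cannot be reached.

```python
PAGE_SIZE = 4096

class Process:
    """Toy process: owns a private map of virtual page number -> physical frame number."""

    def __init__(self, name, page_map):
        self.name = name
        self.page_map = dict(page_map)

    def translate(self, vaddr):
        """Return the physical address for vaddr, or raise if the page is unmapped."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_map:
            raise MemoryError(f"{self.name}: page fault at {vaddr:#x}")
        return self.page_map[vpn] * PAGE_SIZE + offset

# Hypothetical mappings: both processes use virtual page 0x10,
# but the OS backed it with different physical frames.
a = Process("A", {0x10: 0x200})
b = Process("B", {0x10: 0x300})

print(hex(a.translate(0x10004)))  # physical address inside frame 0x200
print(hex(b.translate(0x10004)))  # same virtual address, frame 0x300
```

Nothing process A can put in a pointer will ever produce an address inside frame 0x300, because no entry in A's own map leads there.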

The only way that a process can get to code that is not presently mapped into its own address space (e.g. make an operating system request) is through a call gate that has been set up for it by the operating system (or, MS-DOS style, by issuing a software INTerrupt). It can neither control nor even see where that request will "go to." The CPU's privilege level (ring number) may change at that time, but the process has neither knowledge of nor influence over it.

entz · 06-04-2011, 07:06 PM

hmmm , thanx for the input

regards

entz · 06-05-2011, 07:23 AM

actually i've another question that just occurred to me.

if every process address is mapped into its own memory address space through segments, pages or both together, there remains one crucial final thing to consider: how to determine how many segments or pages the process can access.

i'll give an example: assuming that segmentation is used, the OS sets the GDTR and the LDTR prior to handing control over to the user level process. we know that the index field of each segment selector is 13 bits, which means 2^13 = 8192 selectable segments per table; if each segment descriptor is 8 bytes long, that would make the whole GDT or LDT 64 KiB in size.

but what if the OS just wants to assign 3 segments to the process? is there a register that sets the "limit" of the number of entries in either the GDT or LDT that the process is allowed to map?
it would appear rather ludicrous to reserve 64 KiB for each process, and i don't think that system designers would have tolerated that waste in the days of the 640 KB memory limit.

so the same question applies for paging: if the OS sets the cr3 register prior to the process context switch, how can the OS set the boundaries for each page directory table? (apart from the page frames, because their size is fixed a priori by the architecture to either 4 KiB or 2 MiB or whatever..)

cheers

sundialsvcs · 06-06-2011, 03:19 PM

I'll point you in the general direction of Intel's internal architecture references for the final answers on this one.

When an application issues a request to retrieve data at a particular address, the processor knows (a) what the address is, and (b) what segment-register (CS/DS/ES/etc.) it was using. The particular settings of the CPU's privileged "control registers," and the values found within the tables that they point to (said tables being accessible to, and adjustable by, only the operating system ...) will determine what happens next.

The operating system is, of course, required to "set things up properly," and it always does.

The CPU will use one portion of the address to retrieve the proper segment-table entry, and another portion to retrieve a page-table entry, and, if all goes perfectly well, it will proceed to retrieve the requested data.

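If anything is wrong, it will raise an exception: a General Protection Fault (#GP) for a segment-level violation, or a Page Fault (#PF) when the page is not present or the access is not permitted.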

When this occurs, your program is interrupted and control is passed to Linux's interrupt handler, which gets to decide what to do. It might "resolve" the interrupt and instruct the CPU to re-try the instruction, in which case your program will never realize that anything extraordinary happened. Or, it might decide to send a signal to your process, so that your process will either deal with the situation or (more likely) die.

johnsfine · 06-06-2011, 04:47 PM

Quote:

Originally Posted by entz

if every process address is mapped into its own memory address space through segments , pages or both together

That is done with pages. Far less of interest happens with segments than any of your questions so far assume.

You can have very close to a complete understanding of the Linux virtual memory system without even knowing segments exist.

Quote:

how to determine how many segments or pages the process can access.

The architecture allows an exact setting of the number of segments. You could look up the details if you like. But that is all an obscure side issue to the virtual memory system, so you shouldn't bother.

Pages are organized hierarchically, so there can be (and typically are) large gaps all over the address space. It is not one linear table mapping the beginning of the address space and not mapping the end.

Each page and each page table is 4096 bytes long. In 32 bit non PAE, a page table maps 1024 pages, and each page directory maps 1024 page tables.

So a process needs a whole page directory (4096 bytes) no matter how little address space it maps. That divides the address space into 1024 chunks of 4MiB each. Each chunk independently might be unmapped (no page table and no pages) or mapped to a page table or mapped to a single 4MiB big page.

Each page table similarly divides 4MiB of address space into 1024 chunks of 4096 bytes each. Each chunk independently might be mapped to a page or not.
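The 32 bit non PAE split described above can be sketched in a few lines of Python. The function name `split_32bit` and the sample address are invented for the example; the 10/10/12 bit widths follow from the 1024-entry tables and 4096-byte pages.

```python
def split_32bit(vaddr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    if not 0 <= vaddr < 2**32:
        raise ValueError("not a 32-bit address")
    dir_index = vaddr >> 22               # top 10 bits: which of the 1024 4MiB chunks
    table_index = (vaddr >> 12) & 0x3FF   # next 10 bits: which of the 1024 pages in the chunk
    offset = vaddr & 0xFFF                # low 12 bits: byte within the 4096-byte page
    return dir_index, table_index, offset

# Arbitrary sample address, chosen only for illustration.
print(split_32bit(0x08048123))  # (32, 72, 291)
```

So address 0x08048123 falls in chunk 32 of the page directory, page 72 of that chunk's page table, byte 0x123 of the page.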

32 bit PAE is similar, except each page table maps only 512 pages and only covers 2MiB of address space. Each page directory covers only 1GiB and maps only 512 2MiB chunks (independently each unmapped or page table or 2MiB big page). A higher level directory divides the 4GiB address space into four 1GiB chunks each of which is either mapped (needs a page directory) or unmapped.

x86_64 is just like PAE up to the last sentence above, the one describing the four-entry top-level table. But then instead of a four entry table, it has two more layers of 512 entry tables.
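To make the comparison concrete, here is a small Python sketch that splits an address for either layout. The names `PAE_LEVELS`, `X86_64_LEVELS` and `split` are made up for the example, and the x86_64 layout assumes the usual 48-bit virtual addresses (four levels of 9 bits plus a 12-bit offset).

```python
# Index widths in bits, top level first; the final 12 bits are the page offset.
PAE_LEVELS = (2, 9, 9)         # 4-entry top table, page directory, page table
X86_64_LEVELS = (9, 9, 9, 9)   # two more 512-entry levels replace the 4-entry table

def split(vaddr, levels, offset_bits=12):
    """Return ([index per level, top level first], offset) for a linear address."""
    offset = vaddr & ((1 << offset_bits) - 1)
    rest = vaddr >> offset_bits
    indices = []
    for width in reversed(levels):   # peel off the lowest level first
        indices.append(rest & ((1 << width) - 1))
        rest >>= width
    return indices[::-1], offset

# Sample addresses picked only for illustration.
print(split(0xC0101234, PAE_LEVELS))
print(split(0x00007FFFFFFFF000, X86_64_LEVELS))
```

Any bits above the top level are simply dropped here; on real x86_64 hardware they must be a sign extension of bit 47, which this sketch does not check.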

Quote:

the OS sets the GDTR and the LDTR prior to handing control over to the user level process. we know that the index field of each segment selector is 13 bits, which means 2^13 = 8192 selectable segments per table; if each segment descriptor is 8 bytes long, that would make the whole GDT or LDT 64 KiB in size.

No. Those tables have settable length. They are only as long as the OS makes them.
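A minimal sketch of how that settable length works, in Python. The helper names are invented for the example. It relies on two architectural facts: the limit stored with the table's base address is the byte offset of the table's last valid byte, and a 16-bit selector is laid out as a 13-bit index, one table-indicator bit, and a 2-bit requested privilege level.

```python
DESCRIPTOR_SIZE = 8  # bytes per GDT/LDT entry

def table_limit(num_descriptors):
    """Limit value for a table with this many entries (offset of its last byte)."""
    return num_descriptors * DESCRIPTOR_SIZE - 1

def decode_selector(selector):
    """Split a 16-bit selector into (index, uses_ldt, rpl)."""
    return selector >> 3, bool(selector & 0b100), selector & 0b11

def selector_in_table(selector, limit):
    """True if the whole descriptor named by the selector lies within the limit."""
    index, _, _ = decode_selector(selector)
    return index * DESCRIPTOR_SIZE + (DESCRIPTOR_SIZE - 1) <= limit

limit = table_limit(3)                  # a three-entry table occupies 24 bytes
print(limit)                            # 23
print(selector_in_table(0x10, limit))   # index 2: inside the table
print(selector_in_table(0x18, limit))   # index 3: past the end, the CPU would fault
```

So a table with three descriptors costs 24 bytes, not 64 KiB, and a selector that indexes past the limit is rejected when the program tries to load it.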

Quote:

so the same question applies for paging

But with a different answer. Because the structure is hierarchical and whole tables can be unmapped in any position in the hierarchy, the waste of not having settable lengths is minor.
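A quick back-of-the-envelope sketch in Python of why the waste is minor, assuming 32 bit non PAE paging with 4-byte entries. The function name and the three sample addresses are made up for the example.

```python
PAGE = 4096
CHUNK = 4 * 1024 * 1024   # address space covered by one page table (non-PAE)

def table_bytes(mapped_addresses):
    """Bytes of paging structures needed to map the pages holding these addresses."""
    chunks = {addr // CHUNK for addr in mapped_addresses}
    return PAGE + len(chunks) * PAGE   # one page directory + one page table per used chunk

# For comparison: one flat table with a 4-byte entry for every 4KiB page of a 4GiB space.
FLAT_TABLE_BYTES = (2**32 // PAGE) * 4

# Hypothetical tiny process: two code pages low in memory, one stack page near 3GiB.
print(table_bytes([0x08048000, 0x08049000, 0xBFFFF000]))  # 12288
print(FLAT_TABLE_BYTES)                                   # 4194304
```

The two code pages share one page table and the stack page needs another, so the process pays for three 4096-byte structures instead of a 4MiB flat table.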

Quote:

how can the OS set the boundaries for each page directory table?

It can't. Every table at every level of the hierarchy is 4096 bytes long, except for the top level of 32 bit PAE, which is 32 bytes long. None of those lengths are programmable.

In 32 bit, the OS could mix segmentation and paging in such a way that part of the top level page directory would never get used, so that memory would be free for some other use. But that is too ugly a kludge for a reasonable OS to use.

Quote:

(apart from the page frames, because their size is fixed a priori by the architecture to either 4 KiB or 2 MiB or whatever..)

Big pages are fixed system wide to either 2MiB or 4MiB. Small pages are 4KiB across the whole architecture. But big pages may be mixed with small pages in the same process address space. I think some CPU models also support 1GiB pages in 64 bit mode (freely mixed with 2MiB and 4KiB pages). But I don't know if Linux has any support for 1GiB pages.

Quote:

Originally Posted by sundialsvcs

the processor knows (a) what the address is, and (b) what segment-register (CS/DS/ES/etc.) it was using.
...
The CPU will use one portion of the address to retrieve the proper segment-table entry, and another portion to retrieve a page-table entry

The second part, about using one portion of the address to retrieve the segment-table entry, is especially misleading.

A copy of the segment-table entry is kept in a hidden portion of the segment register (cs, ds, es, etc.) and used based on which segment register is implicitly or explicitly used by the reference. The entry is retrieved from the actual segment table only when loading a value into a segment register. That is not generally described as a "portion of the address" (especially not in a "flat" address space OS such as Linux or recent versions of Windows).
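The caching behaviour can be mimicked with a toy Python class. Everything here is a simplification invented for the example: the descriptor is reduced to a `(base, limit)` pair, the table is a plain list, and the table-indicator and privilege bits of the selector are ignored.

```python
class SegmentRegister:
    """Toy segment register with a hidden cached copy of its descriptor."""

    def __init__(self):
        self.selector = None
        self._cached = None   # hidden part: (base, limit) copied at load time

    def load(self, selector, table):
        """Loading a selector is the only time the descriptor table is read."""
        self.selector = selector
        self._cached = table[selector >> 3]   # index = selector without its low 3 bits

    def linear(self, offset):
        """Translate an offset using only the cached descriptor."""
        base, limit = self._cached
        if offset > limit:
            raise MemoryError("general protection fault: offset beyond segment limit")
        return base + offset

gdt = [None, (0x1000, 0xFFF)]   # entry 0 unused, entry 1 is a made-up data segment
ds = SegmentRegister()
ds.load(0x08, gdt)
print(hex(ds.linear(0x10)))     # uses the cached base 0x1000

gdt[1] = (0x5000, 0xFFF)        # the table changes...
print(hex(ds.linear(0x10)))     # ...but the register still uses its old copy
ds.load(0x08, gdt)
print(hex(ds.linear(0x10)))     # only a reload picks up the new descriptor
```

No part of the address is consulted to find the descriptor during an ordinary memory reference; the register already holds it.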

A portion of the address is compared (simultaneously) against several TLB registers to find the one that can translate the logical address to a physical address. If that fails, the hierarchical page structure described above is used to load the correct entry into a TLB register to finish the translation.
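That lookup order can be sketched as a toy Python model, assuming 32 bit non PAE paging. The `MMU` class, its `walks` counter, and the frame numbers are invented for the example, and a dictionary stands in for the TLB (real hardware compares the entries in parallel, which a dictionary lookup only approximates).

```python
PAGE_SIZE = 4096

class MMU:
    """Toy translation: consult the TLB first, walk the page tables only on a miss."""

    def __init__(self, page_directory):
        self.page_directory = page_directory   # dir index -> {table index -> frame}
        self.tlb = {}                          # virtual page number -> frame
        self.walks = 0                         # how many times the tables were walked

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.tlb:                # TLB miss: walk the hierarchy
            self.walks += 1
            table = self.page_directory.get(vpn >> 10)
            if table is None or (vpn & 0x3FF) not in table:
                raise MemoryError(f"page fault at {vaddr:#x}")
            self.tlb[vpn] = table[vpn & 0x3FF]  # fill the TLB for next time
        return self.tlb[vpn] * PAGE_SIZE + offset

mmu = MMU({32: {72: 0x1234}})         # hypothetical: one mapped page
print(hex(mmu.translate(0x08048123)))  # miss, then walk
print(hex(mmu.translate(0x08048FFF)))  # same page: answered from the TLB
print(mmu.walks)                       # 1
```

The second access never touches the page tables, which is why the walk through the hierarchy is the slow path rather than the normal one.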