LinuxQuestions.org - [SOLVED] Absolute Addresses in compiled program

I have a question that's been bothering me for a while.

We have been taught at uni that after compilation (so even in a .o file) there are assembly instructions for a particular processor.
Also, the addresses of the instructions should be absolute? I am not talking about dynamic libraries or stubs, as those are inserted during program loading or during execution.
But how can the instruction addresses be absolute? What happens if there is different code at that address? I feel like all programs should be PIC :D

It would be of great help if some of you would answer my question :)

The addresses referred to are the location within the compiled code of functions, etc. In other words once the program is loaded into memory those specific instructions are at a fixed offset from the beginning of the code. How else do you think the program would be able to access its own instructions and know where they were located?
What do you think would happen if the code were to be relocated to a different address while the app was running?

The details depend on the CPU architecture. All architectures can specify absolute addresses. Most can also specify addresses relative to a base address (offsets may be limited to 8, 16, or 32 bits). The linker and loader can relocate absolute addresses as needed to make an executable run at a specific address.
Ed

As the above answers indicate, in user-space all absolute addresses are relative ... :p

Virtual-this, virtual-that ... enough to make your head spin.

Yes, every process has its own virtual address space, and an address (absolute or relative) is valid only inside that process.
https://www.kernel.org/doc/html/late.../mm/index.html
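
One way to see that an address only means something inside its own process is to fork and compare. The sketch below is only an illustration and assumes a Unix-like system with CPython; the function name `same_address_different_contents` is made up for this example. The child gets a copy of the parent's address space, so the variable sits at the same virtual address in both, yet a write in the child is not visible in the parent.

```python
import ctypes
import os

def same_address_different_contents():
    """Fork, let the child overwrite a C int, and report what each process sees."""
    cell = ctypes.c_int(1)                 # one int somewhere in this process's memory
    addr = ctypes.addressof(cell)          # its virtual address in the parent
    read_end, write_end = os.pipe()

    pid = os.fork()
    if pid == 0:                           # child: works on its own copy of the address space
        os.close(read_end)
        cell.value = 2                     # changes only the child's copy
        os.write(write_end, f"{ctypes.addressof(cell)} {cell.value}".encode())
        os._exit(0)

    os.close(write_end)
    data = os.read(read_end, 64)           # blocks until the child has written
    os.close(read_end)
    os.waitpid(pid, 0)
    child_addr, child_value = (int(x) for x in data.decode().split())
    return addr, cell.value, child_addr, child_value

if __name__ == "__main__":
    p_addr, p_val, c_addr, c_val = same_address_different_contents()
    print(f"parent: {p_addr:#x} holds {p_val}")
    print(f"child:  {c_addr:#x} holds {c_val}")
```

Both lines should print the same address with different values, because the two processes translate that address through separate page tables.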

First of all, thank you all for such quick answers!

OK, so the addresses are virtual, so they are actually relative.

So what's the difference between a PIC program and normally compiled code?
Because the way you guys have answered, it seems like every program is PIC.

The addresses created by the compiler are virtual. These are then translated on a page by page basis into physical memory addresses. There are translation tables inside the kernel that do the page mapping. But as far as the program code is concerned, the virtual addresses are perfectly real.
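
As a rough illustration of page-by-page translation, here is a toy model. It assumes 4 KiB pages and a flat dictionary as the page table; real hardware uses multi-level tables, and the mapping values below are invented.

```python
PAGE_SIZE = 4096  # assumed page size; 4 KiB is common on x86-64 Linux

def translate(vaddr, page_table):
    """Map a virtual address to a physical one using a {virtual page: frame} dict."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)       # which page, and where inside it
    if vpn not in page_table:
        raise LookupError(f"page fault at {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset  # the offset within the page is unchanged

page_table = {0x401: 0x9A}                       # hypothetical mapping: page 0x401 -> frame 0x9A
print(hex(translate(0x401234, page_table)))      # prints 0x9a234
```

Only the page number is looked up; the low 12 bits of the address pass through untouched.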

PIC, as used by libraries, is different in that these addresses are relative even by the program's own standards. It must be so because a library can be mapped at a different virtual address in each program that uses it, and its code has to run the same way wherever it ends up.

Aaaaaah, okay.

One more thing :D

So let's say one program's code virt. addr. is mapped to a phys. addr. and then to memory.

But if I try to load another program with a similar (or the same) virt. address as the first one, then the mapping tables should see that these virtual addresses are already mapped, right? So there might be some kind of collision. Or do the mapping tables have some workaround when 2 processes have the same virt memory?

Quote:

Originally Posted by smegmaLord (Post 6308871)

So let's say one program's code virt. addr. is mapped to a phys. addr. and then to memory.

But if I try to load another program with a similar (or the same) virt. address as the first one, then the mapping tables should see that these virtual addresses are already mapped, right?

Exactly! If a page is already mapped (say because it's a function in a common library and another program is running it concurrently), then the virtual page will be mapped directly to that physical page. So one physical page might correspond to several virtual pages in different programs.

Quote:

So there might be some kind of collision. Or do the mapping tables have some workaround when 2 processes have the same virt memory?

There is no collision, because each process has its own translation table. For every type of page, there are kernel functions that handle "page faults", such as not finding the virtual page in the translation table. If the page is code, the handler for such a fault checks whether the content of the page is already in memory as part of another program. If so, there's no need to copy it a second time. A translation table entry can be created for the virtual page pointing to the existing physical memory page.
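
The fault handling described here can be sketched as a toy model. This is heavily simplified and not how the Linux kernel is actually written; the class `ToyKernel`, its method names, and the file names are all made up for illustration. Each process gets its own table, and code pages are looked up by which file (and which page of that file) they came from.

```python
class ToyKernel:
    """Toy model: per-process page tables plus a cache of file pages already in RAM."""

    def __init__(self):
        self.next_frame = 0
        self.page_cache = {}    # (file, page index in file) -> physical frame
        self.page_tables = {}   # pid -> {virtual page: physical frame}

    def handle_code_fault(self, pid, vpage, file, file_page):
        table = self.page_tables.setdefault(pid, {})
        key = (file, file_page)
        if key not in self.page_cache:             # not in memory yet: "read it from disk"
            self.page_cache[key] = self.next_frame
            self.next_frame += 1
        table[vpage] = self.page_cache[key]        # point the virtual page at that frame
        return table[vpage]

kernel = ToyKernel()
# Two processes map the same library page at different virtual pages: one shared frame.
lib_1 = kernel.handle_code_fault(1, 0x7F00, "libc.so", 0)
lib_2 = kernel.handle_code_fault(2, 0x7E55, "libc.so", 0)
# Two different programs use the same virtual page: separate frames, no collision.
prog_1 = kernel.handle_code_fault(1, 0x400, "prog_a", 0)
prog_2 = kernel.handle_code_fault(2, 0x400, "prog_b", 0)
print(lib_1 == lib_2, prog_1 != prog_2)
```

Sharing is decided by what the page contains (here, its file and position), not by its virtual address.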

Ok!

But that means that the 2 pieces of code will have the same virtual memory if and only if the code is the same (or it is some library function), but how would the compiler know such information?

In general, shared objects get loaded at different virtual addresses because the loader does not control a process's virtual address map. The goal is for the kernel to map all instances of a shared object to the same physical pages so that only one copy resides in memory. To do this, the shared object should be compiled with -fPIC. The compiler emits extra instructions (if needed) to make the code run at any virtual address.
Ed
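
To make "run at any virtual address" concrete, here is a toy machine with two made-up instructions, `load_abs` and `load_rel`. It is not real machine code, only a sketch of the idea: an operand that is a full address breaks when the image moves, while an operand that is a distance from the instruction itself keeps working. Real -fPIC output relies on similar PC-relative addressing, plus extra tables for symbols outside the object.

```python
def run(image, base):
    """Load `image` at `base` in a toy memory and execute its first instruction."""
    memory = {base + i: word for i, word in enumerate(image)}
    op, operand = memory[base]
    if op == "load_abs":        # operand is a full address fixed at link time
        target = operand
    elif op == "load_rel":      # operand is a distance from this instruction
        target = base + operand
    else:
        raise ValueError(op)
    return memory.get(target)   # None: the address does not hold our data

LINK_BASE = 0x1000                                # address the absolute image was linked for
absolute = [("load_abs", LINK_BASE + 1), 42]
relative = [("load_rel", 1), 42]

print(run(absolute, 0x1000), run(absolute, 0x5000))   # prints 42 None
print(run(relative, 0x1000), run(relative, 0x5000))   # prints 42 42
```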

Ok. Thank you very much!

Unfortunately, this discussion wandered almost immediately into error, apparently based on the ambiguous term, "absolute address."

When the compiler generates an ".o" file, that file contains not only binary instructions, but relocation information that will be needed to adjust memory locations within the loaded program image in order to account for where in memory it was loaded. The file also contains a list of symbols defined by this file, and a list of symbols that are needed. The so-called "loader" is responsible for bringing all of the required materials into memory, thus deciding their address within the [virtual ...] memory space, and patching-up all the memory addresses everywhere.
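
The patching step can be sketched with a toy object format. The layout below (a list of words, relocation entries as index and symbol name, and an addend stored in the slot) is an assumption made for illustration and is much simpler than real ELF relocations; it also ignores symbols that come from other files.

```python
def load(code, relocations, symbols, base):
    """Return a copy of `code` with every relocation slot patched for `base`.

    code:        list of ints, as emitted by the "compiler"
    relocations: list of (index into code, symbol name)
    symbols:     {name: offset of the symbol within this unit}
    """
    patched = list(code)
    for index, name in relocations:
        addend = patched[index]                    # the slot holds an extra offset, often 0
        patched[index] = base + symbols[name] + addend
    return patched

code = [0xE8, 0, 0x90, 4]                          # slots 1 and 3 need addresses filled in
relocations = [(1, "helper"), (3, "counter")]
symbols = {"helper": 0x40, "counter": 0x80}

print([hex(w) for w in load(code, relocations, symbols, 0x400000)])
# prints ['0xe8', '0x400040', '0x90', '0x400084']
```

Loading the same unit at a different base only changes the patched slots; the other words are copied as they are.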

It is possible for a symbol to be defined as equal to a fixed address-value, as opposed to being relative to the program unit in which it is defined. This facility is very rarely used.

However, nothing allows you to know anything about "absolute" as in "within physical CPU RAM." All programs run in a virtual memory space and perceive "memory" to be that space ... which it is, for them. Meanwhile, the virtual memory subsystem maps a portion of this (the portion that is currently being used) to physical RAM addresses – which are subject to change at any time. Only the operating system kernel knows what this mapping is at any particular microsecond.

In fact, the virtual memory address that you next reference might not be in physical RAM at all. This will produce a "page fault" interrupt. The operating system will then retrieve the "page" from backing storage (disk) where it had previously saved it, and it is free to put it anywhere in physical memory – any so-called "page frame."

Ah cool.
Thank you!

And how does PIC play a role in this?

PIC means Position Independent Code; in other words, it is relocatable.
In general it is explained here: https://en.wikipedia.org/wiki/Position-independent_code
In short: non-relocatable code has a fixed address inside the memory of the process.
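
If you want to check which kind a given binary is, the ELF header records it in the `e_type` field. A minimal sketch, assuming the file is ELF; the helper name `elf_type` is made up here:

```python
import struct

ET_NAMES = {
    1: "ET_REL (relocatable object, e.g. a .o file)",
    2: "ET_EXEC (executable linked for a fixed address)",
    3: "ET_DYN (shared object or position-independent executable)",
    4: "ET_CORE (core dump)",
}

def elf_type(header: bytes) -> str:
    """Decode e_type from the first 18 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    fmt = "<H" if header[5] == 1 else ">H"      # EI_DATA: 1 = little-endian, 2 = big-endian
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return ET_NAMES.get(e_type, f"unknown ({e_type})")
```

For example, `elf_type(open(path, "rb").read(18))` on a program built as position-independent should report ET_DYN, while one linked for a fixed address should report ET_EXEC.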