basic question about dynamic linking

lenrekxunil · 02-08-2010, 12:51 AM

I have read a couple of articles on how dynamic linking works (the material on the GOT, the PLT, and lazy binding), and I am still not sure why dynamic linking needs to be done in such a complicated way.

Suppose your program uses a function in a shared library that needs to be linked dynamically at run time (such as printf). Why can't you statically decide the virtual address of the function at compile time? After all, if the library has already been loaded into a physical page frame, all you need to do is fill in the page table entry corresponding to the address of the function.

Thanks.
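
For readers who have not seen the articles mentioned above, here is a rough toy model of lazy binding. It is only a sketch under loud assumptions: the `LIBRARY` dict, `resolve`, `got`, and `plt_call` are hypothetical stand-ins invented for illustration, not how a real dynamic linker is implemented. The point it tries to show is that the symbol lookup happens once, on the first call, and every later call costs only one extra indirection through a table slot.

```python
# Toy model of lazy binding. All names here are hypothetical illustrations.

# A pretend "shared library": symbol name -> function.
LIBRARY = {"libfunc1": lambda x: x + 1}

resolve_count = 0  # how many times the pretend dynamic linker did a lookup


def resolve(name):
    """Stand-in for the dynamic linker's symbol lookup."""
    global resolve_count
    resolve_count += 1
    return LIBRARY[name]


# The pretend "GOT": one slot per imported symbol, empty until first use.
got = {}


def plt_call(name, *args):
    """Stand-in for a PLT stub: bind the GOT slot on first use, then jump through it."""
    if name not in got:           # first call: slot not bound yet
        got[name] = resolve(name)
    return got[name](*args)       # later calls: one indirect call, no lookup


first = plt_call("libfunc1", 41)
second = plt_call("libfunc1", 1)
```

After both calls, `resolve_count` is still 1, which is the sense in which the binding is "lazy".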

business_kid · 02-08-2010, 03:44 AM

Basically, because nobody knows _exactly_ what goes on in a PC any more. Static addresses play into the hands of hackers, so there is PIC (Position Independent Code) and PIE (Position Independent Executables). Position-independent code can get sliced up and parceled about. Even if you have a non-PIC program, the kernel may offset things if it is a PIE. If everything is set at a low memory address, well, that area gets very crowded (run ps -e if you don't believe me), and if it's high, why, the box may not even have that much memory.

Doing it your way, a hacker could insert a call to a known subroutine in a known library at a known address and play havoc with your system. It's the equivalent of putting more tools in the hacker's toolbox.

lenrekxunil · 02-08-2010, 10:14 AM

I am not sure if I was able to get my point across, so I will use a concrete example to explain my question. I just want to know the reason for the way it is done now, and whether there are any alternatives to this approach. I would appreciate it if you could correct my misunderstandings, if there are any.

Suppose I have a program that calls two library functions, libfunc1 and libfunc2, that are in libraries lib1 and lib2 respectively. libfunc1 and libfunc2 are located at offsets 500 and 1000 within their libraries. Then the compiler may generate an executable and a table like the ones I show below (VA stands for virtual address, PA for physical address):

### executable
VA 30000: main() starts here
...
call 10500 (VA of libfunc1, which is VA of lib1 + 500)
...
call 21000 (VA of libfunc2, which is VA of lib2 + 1000)

### table
lib1: 10000
lib2: 20000

At run time, the OS will fill in the page table entries that translate the starting VAs of the libraries to physical addresses.

### page table
VA PA
10000 15000
20000 500
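
The arithmetic behind the tables above can be sketched as follows. This is a minimal illustration, assuming each page table row is a plain base-to-base mapping, which is simpler than real paging. The numbers are copied from the post, while the dict and function names are hypothetical.

```python
# Numbers taken from the post; names are made up for this sketch.
LIB_BASE_VA = {"lib1": 10000, "lib2": 20000}          # the "table"
PAGE_TABLE = {10000: 15000, 20000: 500}               # VA base -> PA base
FUNC_OFFSET = {"libfunc1": ("lib1", 500), "libfunc2": ("lib2", 1000)}


def call_target_va(func):
    """VA hard-coded into the call instruction: library base VA + offset."""
    lib, offset = FUNC_OFFSET[func]
    return LIB_BASE_VA[lib] + offset


def translate(func):
    """PA reached at run time: mapped base PA + the same offset."""
    lib, offset = FUNC_OFFSET[func]
    return PAGE_TABLE[LIB_BASE_VA[lib]] + offset


va1, pa1 = call_target_va("libfunc1"), translate("libfunc1")
va2, pa2 = call_target_va("libfunc2"), translate("libfunc2")
```

Under these assumptions the call to 10500 lands at PA 15500 and the call to 21000 lands at PA 1500.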

My questions are:
1. Would this scheme work?
2. What are the possible merits or shortcomings of this approach? A few that I can come up with are:
(merits)
- Faster access, since there is no indirect calling or reference to library functions or variables.
(demerits)
- Possible fragmentation, since the libraries need to start at page boundaries. For example, the layout above would work if the page size were, say, 5000, but would fail if it were 7000.
- As explained in the previous post, this scheme may not be as secure as dynamic linking.
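
The page-boundary point (5000 versus 7000) comes down to a divisibility check, sketched roughly below. The helper names and the library size of 1200 are hypothetical, chosen only to show the wasted space at the end of a library's last page.

```python
def page_aligned(base_va, page_size):
    """A library can start at base_va only if it is a multiple of the page size."""
    return base_va % page_size == 0


def scheme_fits(bases, page_size):
    """True if every fixed library base falls on a page boundary."""
    return all(page_aligned(b, page_size) for b in bases)


def wasted_tail(lib_size, page_size):
    """Bytes left unused in the last page occupied by a library of lib_size bytes."""
    return (-lib_size) % page_size


bases = [10000, 20000]               # lib1 and lib2 from the post
fits_5000 = scheme_fits(bases, 5000)
fits_7000 = scheme_fits(bases, 7000)
waste = wasted_tail(1200, 5000)      # 1200 is an assumed library size
```

With a page size of 5000 both bases are aligned. With 7000, 10000 leaves a remainder of 3000, so the fixed layout no longer fits.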

business_kid · 02-09-2010, 06:34 AM

You're over my head at this point. I'm no programmer. The place to find out about that is the LKML archives. There are hundreds of long emails per day, but you will probably find this discussed by the guys who set it up.
The address is <linux-kernel@vger.kernel.org>, IIRC. I had to join once, and deleting the email about the 2038 date problem took most of the day :-/.