how linking in linux works

juanbobo · 06-08-2005, 04:06 PM

Could someone explain how linking works in Linux? I am not asking how to create or use shared libraries, I am wondering how/when exactly the libraries are loaded and called.

bigrigdriver · 06-08-2005, 08:42 PM

I'm not exactly certain I understand what you are asking for. If I understand correctly, you want to know when/how a library file is called?

Look through your system's files. Look for files with the .h extension (header files). Near the top of such files, you will see lines like #include <somefile.h>. That (at least in the C programming language) is the instruction to the compiler to "include" this file from some library when the program is compiled. In other words, that part is written into the compiled application, and only that part of the library. There is no call, per se, to the library. The included part is in the compiled binary.

btmiller · 06-08-2005, 09:23 PM

Not exactly -- the header file just specifies the interface to a particular library. The actual code of the library routines is in a library file (static, generally ending in .a, or dynamic, ending in .so followed optionally by a version number). Most executables are linked dynamically, i.e. the code of the library routines is not inserted directly into the executable, as this would make the executable file very big and inefficient in other respects. Instead, the dynamic linker in Linux will load the library code from the appropriate library file if needed and map it into the process address space at runtime (thus the system need only keep one copy of the library in memory in most cases).

Or do you mean links in the file system? If so, I wrote a long post about the internals of file system links about a month or two ago, which should be found in the search.

juanbobo · 06-08-2005, 09:48 PM

Thanks guys, I should have been more specific. I had to check an old book on protected mode to remind me about descriptor tables and whatnot. I guess to know the workings any deeper I'll have to look at the linker code.

foo_bar_foo · 06-09-2005, 05:37 PM

the linker puts only the name of the shared library in the executable code.
unless you use prelink, i think, which alters the executable to point to a precomputed table for the lib and its dependencies.
a shared library is actually one single position-independent object file, and when the executable comes across it, it maps in all of the code in the library and its dependencies, not just what it needs.
then the various functions in the shared library may be mapped to different virtual addresses for each program (process) using that function, and when and how all that actually gets into RAM is just kernel MM junk.

basically all ld does is look for and find the library and all of the library's dependencies, i think.

juanbobo · 06-09-2005, 10:08 PM

Thank you very much foo_bar_foo, you answered my question.

ravi · 02-16-2006, 04:20 AM

Hi people,
I am trying to play with shared libraries. I have written a shared library (libgcall) and I link it with another program I wrote. In one module of the shared library I am trying to get the stack address as below:

Code:

int get_the_stack_addr()
{
	int *ptr,var;
	ptr = &var;
	printf("%p",ptr);
	/* Both ptr and var will be on the stack, and ptr will contain the
	   address of var, i.e. an address on the stack. */
	return (1);
}

Now, if I understand it right, the shared library shares the same stack as the test program. But the results are different: I get a stack address of 0x4212ee20, while the test program's stack address is 0x8048918.

Now is the interesting part. If I declare any variable like I did below:

Code:

int get_the_stack_addr()
{
	int *ptr,var;
	int a;
	ptr = &var;
	printf("%p",ptr);
	/* Both ptr and var will be on the stack, and ptr will contain the
	   address of var, i.e. an address on the stack. */
	return (1);
}

I get the same address as the test program's stack (0x80489180).
There seems to be no logic or reason to it. Can anyone help?
Thanks in advance

XavierP · 02-16-2006, 12:54 PM

Moved: This thread is more suitable in Programming and has been moved accordingly to help your thread/question get the exposure it deserves.

aluser · 02-17-2006, 11:36 AM

A function in a shared library uses the same stack as the function which calls it.

Code:

/* gcall.c */
#include "gcall.h"
#include <stdio.h>

int get_the_stack_addr(void)
{
	int *ptr, var;
	ptr = &var;
	printf("%p\n", (void *)ptr);	/* %p expects a void * argument */
	return 1;
}

Code:

/* testlib.c */
#include "gcall.h"
#include <stdio.h>

int main()
{
	int a;
	printf("main sees stack at %p\n", (void *)&a);
	get_the_stack_addr();
	return 0;
}

Code:

12:10 aluser@alf:~/test/c/libstack$ make testlib
gcc -Wall  -c testlib.c
gcc -Wall  -fPIC -shared -c gcall.c
gcc -Wall  -shared -o libgcall.so gcall.o
gcc -Wall  -L. -Wl,-rpath,"`pwd`" -o testlib testlib.o -lgcall
12:10 aluser@alf:~/test/c/libstack$ ./testlib
main sees stack at 0xbfcce604
0xbfcce5cc
12:10 aluser@alf:~/test/c/libstack$ ./testlib
main sees stack at 0xbfdae2f4
0xbfdae2bc
12:10 aluser@alf:~/test/c/libstack$ ./testlib
main sees stack at 0xbfcf4ba4
0xbfcf4b6c

Each time the program runs, the stack starts at a slightly different place, but get_the_stack_addr() always sees a stack address just below that which main sees, as you'd expect for a downward growing stack.

The different stack starting points occur even for static executables; I suspect that my system is doing this in order to make certain exploits harder to run. Actually I'm waving my hands a bit here; it seems that it's actually the OS which sets up the stack pointer at the beginning, since I can drop the C runtime's startup code, supply my own _start, and still see this variation:

Code:

/* esp.s */
.data
someint:
        .int 4
.text
.global _start
_start:
        movl %esp, someint
        movl $4, %eax
        movl $1, %ebx
        movl $someint, %ecx
        movl $4, %edx
        int $0x80
        movl $1, %eax
        movl $0, %ebx
        int $0x80

This moves %esp into someint, then calls the write() syscall on &someint. You can get human readable output by piping through xxd (remember that it's little endian, so a05b96bf means bf965ba0):

Code:

12:32 aluser@alf:~/test/asm$ gcc -c esp.s
12:32 aluser@alf:~/test/asm$ ld -o esp esp.o
12:32 aluser@alf:~/test/asm$ ./esp|xxd -g4
0000000: a05b96bf                             .[..
12:32 aluser@alf:~/test/asm$ ./esp|xxd -g4
0000000: 406ba4bf                             @k..
12:32 aluser@alf:~/test/asm$ ./esp|xxd -g4
0000000: d02fd4bf                             ./..

So yeah... the conclusion is that Linux varies the starting position of the stack, but whether a function lives in a shared library or not doesn't change what stack it uses. (The only "normal" way I know for a program to run functions on a different stack is inside signal handlers.)

wow that was long.

aluser · 02-17-2006, 03:09 PM

To partially answer the OP, I too would like to have a decent source on linking in Linux. What I've been able to gather for shared libraries is that your program gets an executable .plt section which contains stubs for all the shared lib functions you call; each jumps to an address which is stored in a writable section called .got.plt (GOT stands for global offset table). It looks like there may be an additional indirection too, which has something to do with the fact that the offset table isn't necessarily loaded with all the shared lib addresses to begin with; sometimes it has to be loaded on demand (?)

The linker must do some of this at runtime, perhaps before _start() is even called (?)

I worked out some of this with objdump; see the -d, -s, and -j options. I googled for some of the symbols I came across with that. There must be some better way to get this information : )

johnMG · 02-17-2006, 11:14 PM

http://wiki.linuxquestions.org/wiki/...ands_and_Files