Libraries: what they are, how they work, why you need them
All serious software nowadays does most of its work using libraries. The top-level program code is mainly a caller for library functions written by other people, and it is these that do most of the heavy lifting. Without libraries, programming would be impractical for anyone but professionally trained coders.
A library is a collection of functions that do a particular set of interrelated jobs: reading and writing a particular media format, doing quad-precision maths, parsing XML, handling traffic through USB ports, encrypting and decrypting web traffic, and so on. Once a library has been written, it can (in theory) be used by any subsequent program that requires this particular job to be done.
This is very convenient for programmers. For example, if you want a program to be able to handle files written in a particular format, you do not need to understand the format; you just learn how to use the corresponding library. A graphical program can use libpng to read PNG graphics, libjpeg or libjpeg-turbo to read JPEG graphics, and so on. Similarly, any program that wants to handle encryption and decryption can use OpenSSL or GnuTLS. And any graphical program can use a widget library like gtk2 for its display functions.
Problems inevitably arise in the world of proprietary software, where libraries belong to particular companies, and their use by programs developed by a different company would be a breach of copyright. For this reason, proprietary software typically comes packaged with its own libraries (in Windows they carry the file suffix .dll for "dynamic link library"). Since new code tends to be buggy code, having to create a whole new library to get around a copyright restriction is not a recipe for stable software.
In Linux, the libraries, like the programs, are free software. This means that once a library has been written, any programmer is allowed to use it. Almost every Linux library comes with programmer documentation listing the functions it provides, their syntax, what they do, and how to use them correctly. This increases the stability of Linux, because the standard libraries have been around for long enough to have had most of the bugs knocked out of them.
The first libraries were simply archives of compiled functions. Such libraries still exist. When programs are built, the library calls are replaced by copying the corresponding block of binary code from the archive, a process known as "static linking". In Linux, static libraries are created by a program called ar, which is part of the binutils package. Another program called ranlib creates an index for the archive.
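As a rough sketch (the file names foo.c and main.c are hypothetical), building and linking a static library looks something like this:

gcc -c foo.c -o foo.o            # compile a source file into an object file
ar rcs libfoo.a foo.o            # archive it; the "s" flag also writes the index
ranlib libfoo.a                  # (re)builds the index; redundant after "ar ... s"
gcc main.c -L. -lfoo -o myprog   # copies the needed code from libfoo.a into myprog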
Static linking has some advantages: it makes programs more self-contained, so that they can run on any system whether the libraries themselves are present or not, and they are less likely to be broken by interrupted software updates. However, it has one glaring disadvantage: when a library is updated, the programs built against it are stuck with the old versions of the functions unless they are recompiled. And to recompile them, you first need to identify them. Since the library functions are seamlessly built into the program, there is no way to tell which programs have been built against a particular library unless a separate record is kept.
Nowadays static linking is mainly used for libraries that are part of a program package and will not be installed separately. Such a package might contain several individual programs that all use the same functions; putting the common code in a library saves duplication and any discrepancies resulting from it. The functions are statically linked in during the build, but only the programs themselves are actually installed.
For libraries that are to be used more generally, dynamic linking is usually better. It makes for smaller programs and easier updating. In this type of linking, library calls are replaced at build time not by actual code but by a pointer to the relevant code in the library. This code is then "pulled through" at run time. Since the code is read out rather than copied, any addresses that it contains need to be independent of its position in the actual program. Fortunately, the gcc compiler can create such "position-independent code" when given the -fPIC flag.
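A minimal sketch of the same build done dynamically (again with hypothetical file names):

gcc -fPIC -c foo.c -o foo.o      # compile as position-independent code
gcc -shared -o libfoo.so foo.o   # produce the dynamic library
gcc main.c -L. -lfoo -o myprog   # records a reference to libfoo.so instead of copying code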
Dynamic linking is carried out at build time by another binutils program called ld (actually ld will do static linking too if asked, but dynamic linking is the default behaviour). The actual loading of the linked functions at run time is done by the dynamic loader, a part of libc called ld-linux.so (for 32-bit code) or ld-linux-x86-64.so (for 64-bit code).
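You can see which run-time linker a binary asks for by reading its program headers with readelf (another binutils tool); on a typical 64-bit system:

readelf -l /bin/ls | grep interpreter   # e.g. [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]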
Dynamic linking to libraries allows updating on the fly. If the library is replaced by a newer version which is still compatible with the old programming interface, running programs can use the new version just as readily. All affected programs which are launched after the update will automatically link to the new version. But what happens to programs that are running at the time of the update?
In Windows, you need to reboot your computer after a software update, precisely because libraries that are in use cannot be replaced by new versions until the programs using them have shut down. However in Linux, files are deleted in a two-stage process:
1) the filename is removed from the directory and the link counter in the file's inode is decremented
2) once the link counter reaches zero and no process still has the file open, the file's inode and data blocks are marked for recycling.
If the file is a library that is mapped into a running program, the second condition is not met, so the file continues to exist for as long as it is in use. Only when it is no longer held open by any program is it finally deleted. However, it no longer has an entry in any directory, so newly launched programs cannot access it; they will link instead to the new version.
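You can watch the two stages happen with an ordinary file (the path is hypothetical):

echo hello > /tmp/demo.txt
tail -f /tmp/demo.txt &    # a running process now holds the file open
rm /tmp/demo.txt           # stage 1: the name disappears from the directory
ls -l /proc/$!/fd          # the open file still shows up, marked "(deleted)"
kill $!                    # stage 2: the last user exits and the inode is recycled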
To make all this seamless, Linux uses a series of symbolic links. Each library is represented by three files: libfoo.so.x.y.z (where "x.y.z" is the version number), libfoo.so.x, and libfoo.so. Two of these are actually symbolic links:
libfoo.so -> libfoo.so.x -> libfoo.so.x.y.z
Note the .so suffix, the equivalent of Windows .dll. It stands for "shared object" and always indicates a dynamic library. Libraries for static linking carry the .a suffix.
libfoo.so.x.y.z is the file that actually contains the library code. This is the file that is replaced at each update. The link libfoo.so.x matches the library's soname, the name embedded in the library itself and looked up at run time: it changes only if there is a major version change (defined as a change to the programming interface that requires dependent programs to be recompiled). libfoo.so is the file that ld links the program to at build time. This means that the actual library file can be replaced without disturbing the linkage.
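As a sketch, here is how such a versioned library might be built and linked by hand; in practice the package's build system and ldconfig manage these links:

gcc -shared -Wl,-soname,libfoo.so.1 -o libfoo.so.1.0.0 foo.o
ln -s libfoo.so.1.0.0 libfoo.so.1          # the soname link, used at run time
ln -s libfoo.so.1 libfoo.so                # the development link, used by ld at build time
readelf -d libfoo.so.1.0.0 | grep SONAME   # confirm the soname embedded in the library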
Traditionally the most essential libraries (including the whole of libc) are stored in /lib, and the less essential, more application-oriented ones in /usr/lib. However, the situation has been greatly complicated by the switch from 32-bit to 64-bit architecture. Many distros now have library directories with names like /lib64 or /usr/lib32, and some of these may be symbolic links to other directories. In CRUX, for example, the two lib64 directories are links to the corresponding lib ones, and there are separate lib32 directories for the 32-bit libraries that some software requires. But most distros still keep libraries that must be available at all times out of the /usr tree.
Libraries can be kept in other places too, for example in /usr/local/lib. Just as there is a standard command path where the shell looks for programs, so there is a standard library path where ld and the dynamic loader look for libraries. The file /etc/ld.so.conf can be used to register permanent extra library directories (run ldconfig after editing it to rebuild the loader's cache). Or the environment variable LD_LIBRARY_PATH can be used to add a temporary location.
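For example (the directory name is hypothetical, and the first two commands need root privileges):

echo /opt/mylibs/lib >> /etc/ld.so.conf   # permanent: many distros prefer a file in /etc/ld.so.conf.d/
ldconfig                                  # rebuild the run-time linker's cache
export LD_LIBRARY_PATH=/opt/mylibs/lib    # temporary: applies only to the current shell session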
The "dark side" of dynamic linking is dependency. Because dynamically linking libraries are required at runtime, they must be permanently installed on your system in order for programs that require them to run at all. But modern package managers can usually deal with such dependencies seamlessly, installing automatically those libraries that an application needs along with the application itself. The one essential requirement is that each library be compatible with all the programs that are going to use it. Ensuring that this is so is mainly the responsibility of the distro developers, but users can do their bit by not installing alternative versions of libraries from non-standard repositories.