LinuxQuestions.org > Forums > Linux Forums > Linux - General
Linux - General: This Linux forum is for general Linux questions and discussion. If it is Linux related and doesn't seem to fit in any other forum, this is the place.
Old 06-15-2006, 11:02 AM   #1
BryceCovert
LQ Newbie
 
Registered: Jun 2006
Posts: 8

Rep: Reputation: 0
Linux Internals Questions


Hello everyone.

My name is Bryce. I am a software developer, and as time goes on I find myself enjoying GNU/Linux more and more and moving toward it. As a developer I'm naturally intrigued by the internals of GNU/Linux, and I have a few questions.

1) Is ld.so.2 the dynamic linker? If I understand correctly, this works similarly to how Windows automatically detects exported methods from DLLs in the same directory and system32? This makes it possible for executables not to be statically linked to glibc, correct? Is there any material or reading on how the dynamic linker works?

2) I'm a little confused about how the kernel handles memory allocation. Is the allocation code done in the kernel or glibc? malloc is located in glibc, but memory is handled from the kernel. How do these two interact?

3) How does the kernel know which init script to run? If I understand correctly, there is more than one init system available (sysvinit, initng, etc.). Is this set when compiling the kernel, or after? I've heard that "Linux is only a kernel" several times, so it seems odd that the kernel would end up reading some config file to know which process to start. Does my question make sense? If Linux is only a kernel, it seems like it would somehow be independent of the filesystem. It would seem logical that GRUB or LILO would first initialize the kernel and then run the init script, as opposed to the kernel running the script itself. Does that make sense?

4) What's the point in separating out /usr/bin, /usr/local/bin, and /usr/share/bin? The only address I've seen to this is in a UNIX book, and it mentioned that the division between these is somewhat blurry.

5) I understand that the different distros have developers and customize the GNU software to work with their version of GNU/Linux. Let me use a simple example to try to ask my question:
Let's take a simple package: tar. Now let's say that Red Hat never liked the format of tar's output, and wrote some code to make the verbose output different. Then the GNU developers release a new version of tar. When Red Hat decides the new version should go into the next release of RHEL, do they modify the new version's source to change the verbose output again? It seems like the developers would constantly be copying the source from the GNU mirrors, patching it to work the way the distro prefers, and then including it in their distro. This could get very repetitive and/or challenging, especially when a major version change totally reworks the software's internals and the developers have to find a new way to implement their features. Are the distro developers reinventing the wheel with each new version of the original developer's package?

It's early in the morning, and I'm sure my questions aren't written the best way, but I'm just dying for my curiosity to be fulfilled.

Thanks,
Bryce
 
Old 06-15-2006, 12:39 PM   #2
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 78
Quote:
Originally Posted by BryceCovert
Hello everyone.

My name is Bryce. I am a software developer, and as time goes on I find myself enjoying GNU/Linux more and more and moving toward it. As a developer I'm naturally intrigued by the internals of GNU/Linux, and I have a few questions.
Hi. Welcome to LQ!
Quote:
Originally Posted by BryceCovert
1) Is ld.so.2 the dynamic linker? If I understand correctly, this works similarly to how Windows automatically detects exported methods from DLLs in the same directory and system32? This makes it possible for executables not to be statically linked to glibc, correct? Is there any material or reading on how the dynamic linker works?
I am not sure how much detail you want. A fairly brief explanation is given by `man ld.so', `man ld.so.conf', `man ldconfig', and `man ldd' (note that man lets you read the manual page associated with a command). A very thorough explanation would involve looking at the ELF standard.
Quote:
Originally Posted by BryceCovert
2) I'm a little confused about how the kernel handles memory allocation. Is the allocation code done in the kernel or glibc? malloc is located in glibc, but memory is handled from the kernel. How do these two interact?
It's not quite like you think it is. Linux is designed to provide a POSIX-compatible programming environment, so you almost never have to call kernel facilities directly in normal programs. You will understand more once you start doing POSIX programming.
Quote:
Originally Posted by BryceCovert
3) How does the kernel know which init script to run? If I understand correctly, there is more than one init system available (sysvinit, initng, etc.). Is this set when compiling the kernel, or after? I've heard that "Linux is only a kernel" several times, so it seems odd that the kernel would end up reading some config file to know which process to start. Does my question make sense? If Linux is only a kernel, it seems like it would somehow be independent of the filesystem. It would seem logical that GRUB or LILO would first initialize the kernel and then run the init script, as opposed to the kernel running the script itself. Does that make sense?
The kernel, by default, runs the executable /sbin/init from the root partition. This can be a symlink to a different program if you want (usually it's just plain sysvinit). The kernel, like normal programs, can accept command-line parameters at boot time (since 1.3.73). You can read up on how to pass them with GRUB or LILO or whatever you use. Basically, you can say something like this (GRUB syntax):
Code:
kernel /boot/vmlinuz init=/sbin/initng
to make the kernel run a different program at startup. There is also a way to modify the kernel image itself and change the default init. Also, the kernel doesn't run any scripts. It runs the init program which, in turn, runs the scripts.
Quote:
Originally Posted by BryceCovert
4) What's the point in separating out /usr/bin, /usr/local/bin, and /usr/share/bin? The only address I've seen to this is in a UNIX book, and it mentioned that the division between these is somewhat blurry.
It is blurry (especially if you are heavily customizing your distro). By convention, /usr/bin contains the programs that are part of the distribution (including .deb or .rpm or equivalent packages from the distro's website or repository). The directory /usr/local/bin contains programs that came from a third party (usually compiled from source). AFAIK, there is no /usr/share/bin. The tradition goes beyond just bin: the third-party counterpart of /usr/lib is /usr/local/lib, the counterpart of /usr/share is /usr/local/share, and so on.
Quote:
Originally Posted by BryceCovert
5) I understand that the different distros have developers and customize the GNU software to work with their version of GNU/Linux. Let me use a simple example to try to ask my question:
Let's take a simple package: tar. Now let's say that Red Hat never liked the format of tar's output, and wrote some code to make the verbose output different. Then the GNU developers release a new version of tar. When Red Hat decides the new version should go into the next release of RHEL, do they modify the new version's source to change the verbose output again? It seems like the developers would constantly be copying the source from the GNU mirrors, patching it to work the way the distro prefers, and then including it in their distro. This could get very repetitive and/or challenging, especially when a major version change totally reworks the software's internals and the developers have to find a new way to implement their features. Are the distro developers reinventing the wheel with each new version of the original developer's package?
First off, your example is not a great one, since `tar' is a standard (although there are GNU extensions). The other thing is that this is not how a distribution works. If someone doesn't like how a project is going, they can fork it and start their own; forked projects often have the problem you describe (where code is still sort of shared but goes in separate directions). The only patches a distribution makes to a piece of source code are either to make it interoperate better with their setup or to customize the user interface slightly. An example of the former is changing a hardcoded directory "/etc/sysconfig/network" to one that fits their way of doing things, say "/etc/sysconfig/networking". An example of the latter is when you type `programname --help' and at the bottom it says "Please send bug reports to bugs@debian.org" instead of "Please send bug reports to bugs@programname.org". As you can see, there is very little chance of a program's help text undergoing a `major code cleanup'. For most distros, you can see the patches they make; you will notice they are small like this.
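A distro patch of that kind might look something like this (a fabricated example for illustration only; the file, line numbers, and address are made up):

```diff
--- tar-1.15.1.orig/src/tar.c
+++ tar-1.15.1/src/tar.c
@@ -120,2 +120,2 @@
   fputs (_("\
-Report bugs to <bug-tar@gnu.org>.\n"), stream);
+Report bugs to <bugs@exampledistro.org>.\n"), stream);
```

Such a patch usually applies cleanly across upstream releases, since help text rarely changes shape.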
Quote:
Originally Posted by BryceCovert
It's early in the morning, and I'm sure my questions aren't written the best way, but I'm just dying for my curiosity to be fulfilled.

Thanks,
Bryce
We are here to answer questions.
 
Old 06-15-2006, 02:37 PM   #3
BryceCovert
LQ Newbie
 
Registered: Jun 2006
Posts: 8

Original Poster
Rep: Reputation: 0
Thanks for the helpful info! I guess I will follow some of your advice and ask more questions as they come up. So most of the time when a distro wants to change something entirely, such as a program's UI, they fork the project? And that can result in issues like the ones I mentioned? Sorry it wasn't a good example; I was just trying to simplify my explanation.

Thanks for the help.

Bryce
 
Old 06-15-2006, 02:53 PM   #4
BryceCovert
LQ Newbie
 
Registered: Jun 2006
Posts: 8

Original Poster
Rep: Reputation: 0
One more question: How are the kernel and glibc related then? I understand that user programs do not call kernel methods, but does glibc? If glibc doesn't, what does? Naturally, I'm not familiar with POSIX programming, so this is confusing to me.

Bryce
 
Old 06-15-2006, 06:13 PM   #5
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 78
First off, distributions don't fork projects; it's not their job. If any person (or group of people) decides they don't like the direction a project's maintainer is taking, they fork it and create a new project. The job of a distribution is not to create open-source projects. Their goal is to package open-source projects into user-friendly installable packages. The programming done by distributions mainly consists of
  1. writing efficient shell scripts that help to `glue' the system together in a coherent way.
  2. writing software that indexes dependencies of a large number of packages
  3. writing package managers
  4. writing efficient install scripts that give users a good first impression (this applies mostly to those distros that want to be "user-friendly")
  5. integrating extra or customized ``fluff'' into certain packages (for example, the bug-report email in gcc or the default background in X, etc.).
The first and the last of these are probably what you were referring to. The thing is that the customization is (almost) never at a deep level, and patches can often survive even major version changes without much effort.

Now a person who works at/on a distribution certainly may fork a project s/he doesn't like, but that does not mean the distribution does that. Also, a distribution may favor or even sponsor one project over another. But what they don't do (at least most of them I know) is stop you from choosing whatever you want to use. It's not like, "Just because our distro vice-president John Smith forked project X into project Y, we aren't going to package project X anymore (those who need it are going to have to compile from source)." This doesn't mean that a distribution will have packages for every (OSL'ed) project out there (but the absence of a package is not (usually) for political reasons as alluded to in the previous sentence).

This may not apply to all distros and all packages (just wait for all the LQers to point out every instance).

Glibc uses system calls to do stuff. It also uses itself (i.e., one function calls another --- the C library is complex). Most of the calls are accessed by #include-ing header files such as <asm/page.h>, <asm/ioctl.h>, <asm/fcntl.h>, etc., which are architecture-specific in implementation but uniform in interface (what I mean is that even though asm/page.h is implemented differently on an x86 vs. a MIPS machine, you can be pretty sure that any function you were going to call would be defined on both systems).

So the kernel makes many low-level (and I mean low) things accessible to userspace programs. It's just that (for portability and ease of maintenance) the only ones that should use them are either system programs (lspci?) or things like glibc (libraries that are integral to the functioning of the system).

Another thing to think about: glibc is `just a C library' (this is not a trivial thing, but bear with me). What I mean is that although the GNU C library is very compatible with Linux, and although (almost) every distro out there ships with glibc as the C library, glibc != linux. They are designed to be independent, yet optimized to work together. GNU's C library is supposed to be at least somewhat independent of machine platform and underlying kernel. That's why there are those references to asm/*.h instead of linux/*.h (it's not that the latter don't exist, it's that the library should still compile if they aren't detected). There are other C libraries that can be used with Linux. Klibc is probably not used very often. Tinylibc is supposed to be small (in size). uClibc was originally designed to be the C library for uClinux (a fork, or rather patchset, of the Linux kernel for platforms missing an MMU), but is now almost fully compliant (math support sucks), very small, and can be used as a `drop-in replacement' for glibc (though it is Linux-only, unlike glibc).

Something else you might look into is LFS (Linux From Scratch). It is intended as an academic exercise (but can be used as a functioning distribution). Basically, it involves compiling every program from scratch. Things like startup scripts are provided, but they are minimal and customizable. It is a wonderful learning experience and will teach you how a distribution is made (after which it becomes much easier to understand things like Linux internals). I must warn you, though: this requires a lot of time and motivation. It may or may not work flawlessly the first time (and may take anywhere from a couple of days to a couple of weeks). I strongly suggest that you follow the latest stable version of the book (at this time, I think it's 6.1.1) TO THE LETTER (don't grab tarballs with higher version numbers, don't try to be too smart, etc.) in order to have a good first experience. Experimentation is best left for the second or third (or fourth) time around.

Also, have a look at the Single UNIX Specification, the modern counterpart of the POSIX standard. In one section, it gives you the manpages of all standard commands and functions. You can use it to tell what is actually part of the standard and what is an extension or part of a different standard. It also gives you an overview of how a unix should behave (use it more as a reference than a guide).
 
Old 06-15-2006, 09:09 PM   #6
BryceCovert
LQ Newbie
 
Registered: Jun 2006
Posts: 8

Original Poster
Rep: Reputation: 0
I really appreciate the detailed information. I've looked at Linux From Scratch before but have yet to take the time to do it. Thanks for the help.
 
  

