Everything's a file

Posted 10-09-2020 at 11:31 AM by hazel
Updated 03-28-2021 at 02:57 AM by hazel

In most operating systems, a file is simply and solely a named block of data stored in a particular area or areas of a disk drive, which can be retrieved and optionally modified at will. In Unix systems like Linux, quite a few other things besides stored data masquerade as files. Hence the old joke that in Unix, everything's a file. It's not quite true but there is some truth in it. Things that aren't files but behave as if they were make Unix systems much simpler internally than many other OS's. Programs can interact with all these entities in the same way, and some very useful features of inter-program communication like pipes emerge quite naturally from the way that Unix kernels handle files.

Let's start with real files on disk. In Linux, a disk file consists of an inode and one or more blocks. The blocks contain the data, the information in the file. The inode contains the metadata, information about the file: who owns it, its access permissions, when it was last modified, and so on. The inode also contains pointers, direct or indirect, to all the data blocks, so once the kernel has the inode, it effectively has the whole file.
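You can look at some of this metadata from an ordinary program. The following is only a small Python sketch: the function name describe_inode and the choice of fields are mine, and it shows just what the stat() call reports, not the on-disk layout of the inode.

```python
import os
import stat


def describe_inode(path):
    """Return a few of the inode fields the kernel reports for `path`."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,                 # inode number within the filesystem
        "owner_uid": st.st_uid,             # who owns the file
        "mode": stat.filemode(st.st_mode),  # file type and permissions as text
        "size_bytes": st.st_size,           # length of the data
        "blocks_512": st.st_blocks,         # 512-byte units allocated for the data
        "modified": st.st_mtime,            # time of last data modification
    }
```

Calling describe_inode on any path you can reach, for instance your home directory, returns a dictionary of these values. Note that the filename is not among them: names live in directories, not in inodes.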

The kernel stores a copy of the inode in its current inode pool and reads at least some of the blocks into a buffer for convenience. Any subsequent requests for data by a program can then be satisfied at electronic speeds. Similarly, data written out by a program can be stored in a buffer and written to disk when the system is idle.

For non-Unix filesystems, a filesystem driver generates a synthetic inode for each file that needs to be accessed, using information stored on the filesystem or made up from the mount parameters. That allows the main read/write system in the kernel to be independent of filesystem type.

This can be pictured as:
disk -> kernel input buffer -> program -> kernel output buffer -> disk.

Pipes
Now suppose two programs want to pass data between them. It could be done by one program writing the data into a temporary file, which the other program then reads. But this would lead to all kinds of race conditions. In Linux, there is a much simpler solution:
program 1 -> kernel buffer -> program 2.

Programs normally output to and input from kernel buffers, so it is a fairly simple matter to short-circuit the process. This kind of arrangement is called a pipe.

Making a pipe from within a program is easy. You just use the pipe() function. It's a system call, meaning that the kernel actually does the work. It creates a C-shaped pipe or buffer linking the two lowest available file descriptors. If the process then forks, you get an X-shaped pipe with four ends. Finally one process closes its read end and the other its write end to give a simple one-way communication channel between the two.
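Here is a minimal sketch of that sequence in Python, whose os.pipe() and os.fork() are thin wrappers around the system calls. The function name send_through_pipe is invented for the illustration, and I have made the child the writer and the parent the reader, though it could equally be the other way round.

```python
import os


def send_through_pipe(message: bytes) -> bytes:
    """Fork a child that writes `message` into a pipe; the parent reads it back."""
    read_fd, write_fd = os.pipe()      # one pipe, two ends, both in this process
    pid = os.fork()                    # now parent and child each hold both ends

    if pid == 0:
        # Child: it only writes, so it closes its read end.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
        finally:
            os._exit(0)                # leave without running the parent's cleanup

    # Parent: it only reads, so it closes its write end.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                  # empty read: every write end is closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)                 # reap the child
    return b"".join(chunks)
```

The parent's close of its write end matters. If it kept that end open, its own read would never see end-of-file, because the kernel only reports that once every write end has been closed.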

A pipe can link sibling processes as well as parent and child, though this is a bit more complicated. Unix shells use the pipe symbol | to request a link between two children, as in ls | more. The shell creates a pipe, then forks off two children and closes both of its own pipe ends. The ls child attaches the write end to its standard output (fd 1) and closes its read end; the more child attaches the read end to its standard input (fd 0) and closes its write end. Et voilà: a simple pipe passing the standard output of ls to the standard input of the more pager.
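What the shell does can be approximated in a few lines of Python. This is a sketch under assumptions, not how any particular shell is written: run_pipeline is a made-up name, each command is given as a list of words, and real shells also deal with job control, signals and much else.

```python
import os


def run_pipeline(producer, consumer):
    """Run `producer | consumer` roughly as a shell would; return both exit codes."""
    read_fd, write_fd = os.pipe()

    producer_pid = os.fork()
    if producer_pid == 0:
        try:
            os.dup2(write_fd, 1)       # standard output (fd 1) now feeds the pipe
            os.close(read_fd)          # the producer never reads from the pipe
            os.close(write_fd)         # fd 1 is the only copy it needs
            os.execvp(producer[0], producer)
        finally:
            os._exit(127)              # only reached if the exec failed

    consumer_pid = os.fork()
    if consumer_pid == 0:
        try:
            os.dup2(read_fd, 0)        # standard input (fd 0) now drains the pipe
            os.close(read_fd)
            os.close(write_fd)         # otherwise the consumer never sees EOF
            os.execvp(consumer[0], consumer)
        finally:
            os._exit(127)

    # The shell itself uses neither end, so it closes both and waits.
    os.close(read_fd)
    os.close(write_fd)
    _, producer_status = os.waitpid(producer_pid, 0)
    _, consumer_status = os.waitpid(consumer_pid, 0)
    return os.WEXITSTATUS(producer_status), os.WEXITSTATUS(consumer_status)
```

Something like run_pipeline(["ls"], ["more"]) then behaves much like typing ls | more. Neither program knows it is talking to a pipe: ls just writes to fd 1 and more just reads from fd 0, as they always do.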

Named pipes
To link more distantly related programs, Linux provides named pipes or FIFO's (the name comes from First In First Out, which is the only way a pipe can be read). You create them with the mkfifo command. FIFO's have filenames and are indexed in directories just like any real file. They even have an inode just like a real file. But there are no blocks on disk because a named pipe contains no permanent data. Instead, anything written into the FIFO goes into a kernel buffer from which the program listening at the other end can read it, just as with any other pipe.
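A program can create a FIFO too. In the Python sketch below, os.mkfifo() does the same job as the mkfifo command. The function name fifo_round_trip and the filename demo.fifo are invented for the example, and a forked child stands in for the unrelated program at the other end.

```python
import os
import stat
import tempfile


def fifo_round_trip(message: bytes) -> dict:
    """Create a FIFO, pass `message` through it, and report what stat() saw."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.fifo")   # hypothetical name
        os.mkfifo(path, 0o600)                     # same job as the mkfifo command
        st = os.stat(path)

        pid = os.fork()
        if pid == 0:
            # Writer: open() blocks until a reader has opened the other end.
            try:
                with open(path, "wb") as fifo:
                    fifo.write(message)
            finally:
                os._exit(0)

        # Reader: read() returns once the writer has closed its end.
        with open(path, "rb") as fifo:
            received = fifo.read()
        os.waitpid(pid, 0)

        return {
            "is_fifo": stat.S_ISFIFO(st.st_mode),  # the inode says "FIFO"
            "inode": st.st_ino,                    # it has a real inode number
            "blocks_512": st.st_blocks,            # usually 0: nothing on disk
            "received": received,
        }
```

The two ends find each other purely through the filename, which is why a FIFO can link programs that have no common ancestor holding the pipe open for them.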

If your Linux distro uses sysvinit, you will find in the /dev directory a FIFO called initctl (newer sysvinit releases keep it in /run instead). The init program listens at the other end of this pipe. If you do a long listing, you will see that the FIFO belongs to root and only the owner has read or write access to it. If you call shutdown as root, it will write into this pipe and init will respond by shutting down or rebooting the system.

Sockets
A pipe can link only two programs and it only transmits data in one direction. For the conversation between a server and its multiple clients, you need a different mechanism called a socket. This is like the wall socket that you plug your landline telephone into. If I also have a telephone plugged into my wall socket, I can ring your number and we can have a conversation. The telephone number serves as an address. In the same way, the socket on which a server listens has an address where clients can contact it.

Two different systems have grown up for specifying socket addresses. Unix sockets look like FIFO's. Their address is a filename stored inside a directory. Like a FIFO, a Unix socket has an inode but no blocks because it is linked only to a buffer inside the kernel. This type of socket can service only local clients. An Internet socket has an address consisting of an IP address plus a port number and it can service both local and remote clients. In all cases, the actual transfer is via a kernel buffer, just like with a real file.
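As a rough illustration, the Python sketch below sets up a Unix socket, checks that its address really is a directory entry, and then holds a short two-way exchange over it. The names unix_socket_conversation and demo.sock are made up, and for brevity the server and client live in the same process, which a real server would not do.

```python
import os
import socket
import stat
import tempfile


def unix_socket_conversation() -> dict:
    """Hold a two-way exchange over a Unix socket and inspect its directory entry."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.sock")    # hypothetical address

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)                           # creates the filename
        server.listen(1)
        is_socket = stat.S_ISSOCK(os.stat(path).st_mode)

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)                        # "dial" the server's address
        conn, _ = server.accept()

        client.sendall(b"ping")                     # client -> server
        request = conn.recv(1024)
        conn.sendall(b"pong")                       # server -> client, same socket
        reply = client.recv(1024)

        for sock in (conn, client, server):
            sock.close()
        return {"is_socket": is_socket, "request": request, "reply": reply}
```

An Internet socket is used in just the same way. Only the address changes: socket.AF_INET instead of socket.AF_UNIX, and a (host, port) pair instead of a filename.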

Devices
The kernel wears two hats. It manages all the hardware and all the processes. We've already seen how it links a process running a program to data files on disk and, by extension, can link processes to one another. It can also link processes to any other kind of hardware that they need to access, for example a USB port that is hosting a mouse, or a sound card element. This is done via the device directory, /dev.

In a modern Linux system, the device files in this directory are created by the udev program as the kernel detects the actual devices at startup. Like other "special files" they have an inode but no blocks. Instead, where the block addresses would be, the inode contains a major and a minor device number. The major device number tells the kernel which driver to pass the request to, and the minor device number tells the driver which of its devices to access if there is more than one.
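The two numbers are easy to read back from a device file. In this Python sketch the function name device_numbers is my own; os.major() and os.minor() simply unpack the st_rdev field that stat() returns for a device.

```python
import os
import stat


def device_numbers(path):
    """Return (kind, major, minor) for a device file."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "character"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block"
    else:
        raise ValueError(f"{path} is not a device file")
    # st_rdev packs both numbers; major picks the driver, minor the device.
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)
```

On Linux, device_numbers("/dev/null") should report a character device with major 1 and minor 3, the same pair that a long listing of /dev shows where a file size would normally appear.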

The kernel maintains for each device a block of function pointers passed to it by the driver for carrying out functions equivalent to the various possible file operations: open, close, read, write, seek and so on. Not all of these will be appropriate. Seeking, for example, makes sense for a block device such as a disk partition but not for a mouse. Where there is no corresponding function, a NULL pointer is used.

What this means in practice is that a program can access a device file exactly as it would any other kind of file and the access will trigger an appropriate driver operation (or an error message saying that the operation isn't possible for this device). The program doesn't need to know that it is writing to or reading from the kernel, just as it doesn't know this in the case of a pipe or a socket, or indeed the read/write buffer that services a disk. Everything's a file.
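To see this from the program's side, here is a short Python sketch using /dev/zero and /dev/null, which exist on any Linux system. The three function names are invented for the illustration. The last one uses a pipe rather than a device to show what an unsupported operation looks like, since seeking on a pipe is one that always fails.

```python
import errno
import os


def read_zeros(count):
    """Read `count` bytes from /dev/zero with ordinary file calls."""
    with open("/dev/zero", "rb") as dev:
        return dev.read(count)


def discard(data):
    """Write `data` to /dev/null; the driver accepts it and throws it away."""
    with open("/dev/null", "wb") as dev:
        return dev.write(data)


def seek_on_pipe_error():
    """Try to seek on a pipe and return the name of the resulting errno."""
    read_fd, write_fd = os.pipe()
    try:
        os.lseek(read_fd, 0, os.SEEK_SET)
        return None                    # not expected: pipes cannot seek
    except OSError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

The first two functions use exactly the same open, read and write calls that they would use on a disk file. The third gets back ESPIPE ("illegal seek"), which is the kernel's way of saying that this particular kind of file has no such operation.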

Posted in Linux kernel

Comments

Hi Hazel,

I enjoyed reading this.

Thanks!

Posted 03-27-2021 at 03:06 PM by firefli