Quote:
Originally Posted by Joy Stick
1) Raw devices are bypassing OS layer
|
From your usage, yes.
Quote:
2) Why cooked file system IO is slower than raw device ?
-
- Description - What do you mean by description data
|
A UNIX filesystem (like that of most operating systems) is shared by multiple users, and not every user is allowed access to all of the data. Enforcing that requires the users to go through a "cooked" filesystem.
The UNIX filesystem is divided into parts:
1. A superblock. This block identifies the filesystem type, locates the file identification list (in UNIX and UNIX-like systems this is the inode table) and records how large it is. It also identifies the blocks that are available for use (the free list).
2. The inode table. An inode contains information about who owns the file and which data blocks belong to the file (or, if the file is really small, the file data itself).
3. The data blocks, which hold file contents as well as the pointer blocks described next.
The list of data blocks used in a file can be organized in different ways. The classic method is that the inode contains the first 10 (or so) block numbers, which point to the beginning of the file. The next entry locates a block that contains an additional list of data block numbers. This block is the "first level indirect". The entry after that is for yet another block. This block is the "second level indirect", because it contains a list of block numbers for blocks that are themselves lists of data block numbers... And it goes three levels deep.
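To make the direct/indirect layout concrete, here is a minimal sketch of how a logical block number within a file could be turned into the chain of pointers that has to be followed. The constants NDIRECT and PTRS_PER_BLOCK and the function name locate_block are my own assumptions for illustration; real filesystems pick their own values.

```python
NDIRECT = 10          # direct block numbers held in the inode (assumed value)
PTRS_PER_BLOCK = 256  # block numbers per pointer block (assumed value)


def locate_block(n):
    """Return (level, path) for logical block n of a file.

    level 0 means a direct block; 1, 2, 3 mean first, second and third
    level indirect. path lists the index to follow at each step: first
    the slot in the inode, then one index per pointer block.
    """
    if n < 0:
        raise ValueError("block number must be non-negative")
    if n < NDIRECT:
        return 0, [n]
    n -= NDIRECT
    span = PTRS_PER_BLOCK  # data blocks reachable through this level
    for level in (1, 2, 3):
        if n < span:
            path = [NDIRECT + level - 1]  # inode slot for this level
            for _ in range(level):
                span //= PTRS_PER_BLOCK
                path.append(n // span)
                n %= span
            return level, path
        n -= span
        span *= PTRS_PER_BLOCK
    raise ValueError("file too large for three levels of indirection")
```

With these assumed sizes, block 9 is still direct, block 10 is the first one that needs a pointer block, and block 266 is the first one that needs two. Each extra level is one more block that has to be read before the data itself.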
This tree has to be managed: every block of data written has to be accounted for, and when the file is deleted its data blocks, and any/all indirect blocks, have to be tracked down and returned to the free list.
To allocate a block for either data or pointer blocks requires additional management. At its simplest, the list of free blocks is a bitmap: one bit for every block. If the block is allocated the bit is set to 1; if it is available it is set to 0.
Creating the filesystem requires initializing the inode list, and marking the bits in the free list for the blocks that are used by the inode table. This prevents an inode block from being handed out for data (which would destroy the inodes that used to be there).
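The bitmap idea can be sketched in a few lines. This is only a toy model under my own assumptions (the class name FreeBitmap and its methods are made up, and a real filesystem would keep the bitmap on disk and search it more cleverly), but it shows the 1 = allocated, 0 = available convention and the step of reserving the inode table's blocks when the filesystem is created.

```python
class FreeBitmap:
    """Toy free list: one bit per block, 1 = allocated, 0 = available."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)  # all zero: everything free

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def mark(self, block):
        """Set the bit for a block, e.g. for inode table blocks at creation."""
        self.bits[block // 8] |= 1 << (block % 8)

    def release(self, block):
        """Clear the bit when a file's block is freed."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate(self):
        """Hand out the lowest-numbered available block."""
        for block in range(self.nblocks):
            if not self.is_allocated(block):
                self.mark(block)
                return block
        raise OSError("no free blocks")
```

For example, if the first four blocks hold the inode table, marking blocks 0 to 3 at creation time means the first allocate() call returns block 4 and the inode table can never be handed out as data.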
The inode list is numbered from 0 to however many files can be created (again, this varies, as different filesystems have different ways to specify the inode list). In this simplified description inode 0 is identified as a directory (real filesystems usually reserve the lowest inode numbers; in ext2 and the classic BSD filesystem the root directory is inode 2). The data within the directory is a list of inode number, file name pairs. The inode 0 directory has a first entry "0,." where the "." is the file name. This represents the root of the filesystem. To maintain a tree, the second entry in the inode 0 directory file is "0,..", where ".." is the name of the parent directory. Because this is the first directory in the filesystem (the "root" directory), both names identify the same inode.
Any additional files created within this directory will be assigned an inode, and the name will be added to the directory. Thus if a file "data" is created, the entry would be "1,data", because the data file gets a free inode (and the next free one is inode 1).
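Here is a small sketch of that directory bookkeeping, using the same simplified numbering as above (root at inode 0). The TinyFS class and its method names are hypothetical and exist only for illustration; a directory is modelled as nothing more than a list of (inode number, name) pairs.

```python
class TinyFS:
    """Toy model: a directory is a list of (inode number, name) pairs."""

    def __init__(self, max_inodes=64):
        self.max_inodes = max_inodes
        self.used = {0}                         # inode 0 is the root directory
        self.dirs = {0: [(0, "."), (0, "..")]}  # root is its own parent

    def _free_inode(self):
        # Lowest-numbered inode that is not in use yet.
        for ino in range(self.max_inodes):
            if ino not in self.used:
                return ino
        raise OSError("inode table is full")

    def create(self, parent, name, is_dir=False):
        """Assign a free inode and add its name to the parent directory."""
        if any(existing == name for _, existing in self.dirs[parent]):
            raise FileExistsError(name)
        ino = self._free_inode()
        self.used.add(ino)
        self.dirs[parent].append((ino, name))
        if is_dir:
            # A new directory starts with "." (itself) and ".." (its parent).
            self.dirs[ino] = [(ino, "."), (parent, "..")]
        return ino
```

Creating "data" in the root of a fresh TinyFS returns inode 1 and appends (1, "data") to the root's entries, which matches the "1,data" entry described above.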
This is only a brief (and necessarily incomplete) introduction.
For a user to access the data, first the filesystem has to be "mounted". Mounting requires the filesystem superblock to be read, and the information in it gets associated with a mountpoint. During the boot process, access to the system tools requires the filesystem holding those tools to be mounted. The kernel achieves this by initializing a memory block that contains the necessary pointers to the inode 0 block (it actually copies it into memory so that it doesn't have to keep re-reading the block every time it gets referenced).
Now when a command is given, the kernel takes the path specified by the system call and matches it against the list of mounted filesystems. Assuming the command is "/bin/ls", the first "/" character causes the kernel to search the root directory of the first mounted filesystem for the name "bin". That requires the filesystem to read the first data block of the directory. If the name isn't found there, it must read more blocks (these data blocks are specified by the block numbers in the inode, or by the first, second or third level indirect blocks). Thus a read for one block of data gets translated into reading two or more blocks... When the name "bin" is found, the kernel has the inode number associated with the directory bin. The process then repeats with that inode until it finds the file "ls".
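The lookup loop described above can be sketched like this. The DIRS table is a made-up stand-in for directory data that a real kernel would read from disk block by block, and the inode numbers in it are arbitrary; only the walking logic is the point.

```python
# Hypothetical directory contents: inode number -> (inode number, name) pairs.
DIRS = {
    0: [(0, "."), (0, ".."), (1, "bin"), (2, "etc")],
    1: [(1, "."), (0, ".."), (3, "ls"), (4, "cat")],
    2: [(2, "."), (0, "..")],
}


def lookup(path, root=0):
    """Resolve an absolute path to an inode number, one component at a time."""
    ino = root
    for name in path.split("/"):
        if not name:
            continue  # skip the empty pieces produced by "/" separators
        if ino not in DIRS:
            raise NotADirectoryError(path)
        # Search the current directory for the next name in the path.
        for entry_ino, entry_name in DIRS[ino]:
            if entry_name == name:
                ino = entry_ino
                break
        else:
            raise FileNotFoundError(path)
    return ino
```

Here lookup("/bin/ls") searches directory 0 for "bin", gets inode 1, then searches directory 1 for "ls" and returns inode 3. Every directory searched along the way costs at least one block read on a real filesystem.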
Now that the inode number of "ls" is found, another part of the kernel determines if it is executable (specified by flags in the inode header), and will (either directly or indirectly) start reading the data blocks... To read data blocks into memory requires the kernel to allocate memory for them. Once all the data blocks associated with the "ls" file are in memory, the kernel can then create a process (allocating CPU time...) that will give the CPU access to the memory (and specify which memory blocks) to execute the program.
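From user space you can peek at the same inode flags through the stat interface. This is just a rough illustration of the "is it executable" check, not what the kernel does internally; the function name is my own, and it only looks at the owner's execute bit.

```python
import os
import stat


def is_executable_by_owner(path):
    """True if path is a regular file with the owner execute bit set."""
    mode = os.stat(path).st_mode  # mode flags come from the file's inode
    return stat.S_ISREG(mode) and bool(mode & stat.S_IXUSR)
```

The same os.stat() result also carries st_ino, the inode number that the directory lookup produced.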
There are entire textbooks on how all the parts of the kernel interact to accomplish shared access to resources.
This has only been a VERY SHALLOW exposure of only a few of the procedures involved.