The Ext3 file system was designed to provide higher availability without impacting the robustness (at least the simplicity and reliability) of Ext2. Ext3 is a minimal extension to Ext2 that adds support for journaling. Ext3 uses the same disk layout and data structures as Ext2, and it's forward- and backward-compatible with Ext2. Migration from Ext2 to Ext3 (and vice versa) is quite easy, and can even be done in place in the same partition. The other three journaling file systems require the partition to be formatted with their mkfs utility.
If you want to adopt a journaling file system, but don't have free partitions on your system, Ext3 could be the journaling file system to use. See "Switching to Ext3" for information on how to switch to Ext3 on your Linux machine.
Switching to Ext3
If you want to switch to Ext3, it's a good idea to make a backup of your file systems. Once you've done that, run the tune2fs program with the -j option to add a journal file to an existing Ext2 file system. You can run tune2fs on a mounted or unmounted Ext2 file system. For instance, if /dev/hdb3 is an Ext2 file system, the command
# tune2fs -j /dev/hdb3
creates the log. If the file system is mounted, a journal file named .journal will be placed in the root directory of the file system. If the file system is not mounted, the journal file will be hidden. (When you mount an Ext3 file system, the .journal file will appear. The .journal file is just an indicator to show that the file system is indeed Ext3.)
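Another way to confirm that the journal was added is to look at the superblock. tune2fs -l lists the superblock contents, and a journaled file system shows has_journal on its "Filesystem features:" line. The helper below is a minimal sketch: the name has_journal for the function is ours, and it only inspects text passed to it on standard input.

```shell
# has_journal
# Read `tune2fs -l` output on standard input and succeed if the
# "Filesystem features:" line lists has_journal. Hypothetical helper.
has_journal() {
    grep '^Filesystem features:' | grep -qw 'has_journal'
}

# Example (as root):
#   tune2fs -l /dev/hdb3 | has_journal && echo "journal present"
```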
Next, the entry for /dev/hdb3 in /etc/fstab needs to be changed from ext2 to ext3. The final step is to reboot and verify that the /dev/hdb3 partition has type ext3. Type mount. The output should include an entry like this one:
/dev/hdb3 on /test type ext3 (rw)
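If you'd rather not edit /etc/fstab by hand, the type change can be scripted. The following is a minimal sketch, not part of the procedure above: the function name fstab_set_type is made up for illustration, and it only prints the edited table so that you can review it before replacing the real file.

```shell
# fstab_set_type FILE DEVICE NEWTYPE
# Print FILE with the file system type (third field) of DEVICE's entry
# replaced by NEWTYPE. Hypothetical helper; it never modifies FILE.
# Note: the rewritten entry comes out with single spaces between fields.
fstab_set_type() {
    awk -v dev="$2" -v newtype="$3" '
        # Leave blank lines and comments untouched.
        NF == 0 || $1 ~ /^#/ { print; next }
        # Rewrite only the entry whose first field is the device.
        $1 == dev { $3 = newtype }
        { print }
    ' "$1"
}

# Example (review the output, then install it by hand):
#   fstab_set_type /etc/fstab /dev/hdb3 ext3 > /tmp/fstab.new
```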
Ext3 provides three data journaling modes that can be set at mount time: data=journal, data=writeback, and data=ordered. The data=journal mode journals both meta-data and file data. The data=writeback mode journals only meta-data. The data=ordered mode, which is the default, also journals only meta-data, but it writes file data to disk before the related meta-data is committed, which gives better integrity than data=writeback. With three modes, a system administrator can make a trade-off between performance and file data consistency.
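The mode is passed to mount with the -o option. As a small sketch, the helper below (its name, ext3_mount_cmd, is ours) prints the mount command for a given mode without running it, and rejects anything that isn't one of the three modes named above.

```shell
# ext3_mount_cmd MODE DEVICE DIR
# Print (do not run) the mount command for one of the three Ext3 data modes.
# Hypothetical helper; returns 1 for anything that is not a known mode.
ext3_mount_cmd() {
    case "$1" in
        journal|writeback|ordered)
            echo "mount -t ext3 -o data=$1 $2 $3" ;;
        *)
            echo "unknown data mode: $1" >&2
            return 1 ;;
    esac
}

# Example:
#   ext3_mount_cmd journal /dev/hdb3 /test
```

To make the choice permanent, the same data= setting can go in the options field of the file system's /etc/fstab entry.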
If for some reason you'd like to change the Ext3 partition back to Ext2, the process is very simple: umount the file system, and re-mount it using Ext2.
# mount -t ext2 /dev/hdb3 /test
If you want the file system to mount as Ext2 at boot time, you'll also have to change its entry in /etc/fstab.
The downside of Ext3? It's an add-on to Ext2, so it still has the same limitations that Ext2 has. The fixed internal structures of Ext2 are simply too small (too few bits) to capture large file sizes, extremely large partition sizes, and enormous numbers of files in a single directory. Moreover, the bookkeeping techniques of Ext2, such as its linked-list directory implementation, do not scale well to large file systems: there is an upper limit of 32,768 subdirectories in a single directory, and a "soft" upper limit of 10,000-15,000 files in a single directory. To make radical improvements to Ext2, you'd have to make radical changes. Radical change was not the intent of Ext3.
However, newer file systems do not have to be backward-compatible with Ext2. ReiserFS, XFS, and JFS offer scalability, high performance, very large file systems, and of course, journaling. "Why Four Journaling File Systems is a Good Thing" presents an overview of the capabilities of the four journaling file systems.
Why Four Journaling File Systems is a Good Thing
One of the great things about open source is that choice is looked upon favorably. Linux is the only operating system with four journaling file systems in production: ReiserFS, Ext3, JFS, and XFS.
All four file systems are licensed under the GPL, and source code is available at http://www.kernel.org or on each project's home page. Each of the journaling file system teams follows a community model and welcomes users and contributors. In fact, the teams share their best ideas, and competitive benchmarking encourages constant improvement of all of the systems.
The table below summarizes the features and limits of the four Linux journaling file systems. The first section provides some history of when the journaling file systems were accepted into the kernel.org source trees. The next section lists some of the features of the file systems. The final section lists some of the distributions that are currently shipping the journaling file systems. If the distribution is shipping the file system that you want to use, you can use that file system right "out-of-the-box."
For complete feature lists of each journaling file system, see the respective project Web pages.
A comparison of journaling file systems

Kernel support                        Ext3       ReiserFS   XFS          JFS
Kernel prerequisites                  No         No         Yes          No
In kernel.org source tree 2.4.x       2.4.15     2.4.1      -            -
In kernel.org source tree 2.5.x       2.5.0      2.5.0      -            2.5.6
License                               GPL        GPL        GPL          GPL
Largest block size supported on ia32  4 KB       4 KB       4 KB         4 KB
File system size maximum              16384 GB   17592 GB   18,000 PB+   32 PB
File size maximum                     2048 GB    1 EB*      9,000 PB     4 PB
Growing the file system size          Patch      Yes        Yes          Yes
Access Control Lists                  Patch      No         Yes          WIP
Dynamic disk inode allocation         No         Yes        Yes          Yes
Data logging                          Yes        No         No           No
Place log on an external device       Yes        Yes        Yes          Yes

Distros with journaling file systems  Ext3       ReiserFS   XFS          JFS
Red Hat 7.3                           Yes        Yes        No           Yes
SuSE 8.0                              Yes        Yes        Yes          Yes
Mandrake Linux 8.2                    Yes        Yes        Yes          Yes
Slackware Linux 8.1                   Yes        Yes        Yes          Yes

+ PB is petabyte, or 10^15 bytes
* EB is exabyte, or 10^18 bytes
By the way, the 2.4 kernel has a limit of 2048 GB for a single block device, so no file system larger than that can be created at this time (without patching the standard kernel). This restriction could be removed in the 2.5.x development kernel, and there are patches available to remove this limit, but as of 2.5.29, the patches haven't been officially included yet.
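Round numbers like 2048 GB tend to fall out of fixed-width counters. The arithmetic below is only a sketch: the assumption that these limits come from 32-bit counts of 512-byte sectors and 4 KB blocks is ours, not something stated in the table, and the function name is made up for illustration. It needs a shell with 64-bit arithmetic.

```shell
# size_of_2_32_blocks BLOCK UNIT
# Size of 2^32 blocks of BLOCK bytes each, expressed in units of UNIT bytes.
# Hypothetical helper; 4294967296 is 2^32.
size_of_2_32_blocks() {
    echo $(( 4294967296 * $1 / $2 ))
}

size_of_2_32_blocks 512  1073741824   # 512-byte sectors, units of 2^30 bytes: prints 2048
size_of_2_32_blocks 4096 1073741824   # 4 KB blocks, units of 2^30 bytes: prints 16384
size_of_2_32_blocks 4096 1000000000   # 4 KB blocks, units of 10^9 bytes: prints 17592
```

The first result matches the 2048 GB block device limit. The other two match the 16384 GB and 17592 GB entries in the table, which suggests that those two figures may describe the same 2^44-byte size in binary and decimal gigabytes.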
ReiserFS is designed and developed by Hans Reiser and his team of developers at Namesys. Like the other journaling file systems, it's open source, is available in most Linux distributions, and supports meta-data journaling.
One of the unique advantages of ReiserFS is support for small files -- lots and lots of small files. Reiser's philosophy is simple: small files encourage coding simplicity. Rather than use a database or create your own file caching scheme, use the filesystem to handle lots of small pieces of information.
ReiserFS is about eight to fifteen times faster than Ext2 at handling files smaller than 1K.
Even more impressive, ReiserFS (when properly configured) can actually store about 6% more data than Ext2 on the same physical file system. Rather than allocate space in fixed 4K blocks, ReiserFS can allocate the exact space that's needed. A B* tree manages all file system meta-data, and stores and compresses tails, the portions of files smaller than a block.
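To see why packing tails helps with small files, consider how much of its last block a file leaves unused when space is handed out in fixed units. This is plain arithmetic, not a model of how either file system lays out data; the helper name is ours, and the 4096-byte default reflects the 4K blocks mentioned above.

```shell
# slack_bytes SIZE [BLOCK]
# Bytes left unused in a file's last block when space is handed out in
# fixed BLOCK-byte units (default 4096). Hypothetical helper.
slack_bytes() {
    size=$1
    block=${2:-4096}
    echo $(( ($block - $size % $block) % $block ))
}

slack_bytes 100     # a 100-byte file: prints 3996
slack_bytes 4096    # a file that exactly fills one block: prints 0
slack_bytes 5000    # one full block plus a 904-byte tail: prints 3192
```

With thousands of files well under 1K, nearly every block is mostly slack, which is the space a tail-packing scheme can win back.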
Of course, ReiserFS also has excellent performance for large files, but it's especially adept at managing small files.
For a more in-depth discussion of ReiserFS and instructions on how to install it, see "Journaling File Systems" in the August 2000 issue, available online at http://www.linuxmagazine.com/2000-08...aling_01.html.
JFS for Linux is based on IBM's successful JFS file system for OS/2 Warp. Donated to open source in early 2000 and ported to Linux soon after, JFS is well-suited to enterprise environments. JFS uses many advanced techniques to boost performance, provide for very large file systems, and of course, journal changes to the file system. SGI's XFS (described next) has many similar features. Some of the features of JFS include:
· Extent-based addressing structures. JFS uses extent-based addressing structures, along with aggressive block allocation policies to produce compact, efficient, and scalable structures for mapping logical offsets within files to physical addresses on disk. This feature yields excellent performance.
· Dynamic inode allocation. JFS dynamically allocates space for disk inodes as required, freeing the space when it is no longer required. This is a radical improvement over Ext2, which reserves a fixed amount of space for disk inodes at file system creation time. With dynamic inode allocation, users do not have to estimate the maximum number of files and directories that a file system will contain. Additionally, this feature decouples disk inodes from fixed disk locations.
· Directory organization. Two different directory organizations are provided: one is used for small directories and the other for large directories. The contents of a small directory (up to 8 entries, excluding the self (. or "dot") and parent (.. or "dot dot") entries) are stored within the directory's inode. This eliminates the need for separate directory block I/O and the need to allocate separate storage. The contents of larger directories are organized in a B+ tree keyed on name. B+ trees provide faster directory lookup, insertion, and deletion capabilities when compared to traditional unsorted directory organizations.
· 64-bit. JFS is a full 64-bit file system. All of the appropriate file system structure fields are 64 bits in size. This allows JFS to support large files and partitions.
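On a file system with a fixed inode table, such as Ext2, it can be worth watching how much of that table is in use. df -i reports inode counts per file system. The sketch below reduces such a report to mount point and percentage; the function name is ours, and it assumes the six-column layout printed by GNU df and mount points without spaces.

```shell
# inodes_in_use
# Read `df -i` style output on standard input and print each mount point
# followed by the percentage of its inodes in use. Hypothetical helper;
# the first line is treated as the column header and skipped.
inodes_in_use() {
    awk 'NR > 1 && NF >= 6 { print $6, $5 }'
}

# Example:
#   df -iP | inodes_in_use
```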
There are other advanced features in JFS, such as allocation groups (which speed file access by maximizing locality) and various block sizes ranging from 512 bytes to 4096 bytes (which can be tuned to avoid internal and external fragmentation). You can read about all of them at the JFS Web site at http://www-124.ibm.com/developerworks/oss/jfs.
A little more than a year ago, SGI released a version of its high-end XFS file system for Linux. Based on SGI's Irix XFS file system technology, XFS supports meta-data journaling and extremely large disk farms. How large? A single XFS file system can be 18,000 petabytes (a petabyte is 10^15 bytes) and a single file can be 9,000 petabytes. XFS is also capable of delivering excellent I/O performance.
In addition to truly amazing scale and speed, XFS uses many of the same techniques found in JFS.
For the rest of the article, let's look at how to install and use IBM's JFS system. If you have the latest release of Turbolinux, Mandrake, SuSE, Red Hat, or Slackware, you can probably skip ahead to the section "Creating a JFS Partition." If you want to include the latest JFS source code drop into your kernel, the next few sections show you what to do.
THE LATEST AND GREATEST
JFS has been incorporated into the 2.5.6 Linux kernel, and is also included in Alan Cox's 2.4.X-ac kernels beginning with 2.4.18-pre9-ac4, which was released on February 14, 2002. Alan's patches for the 2.4.x series are available from http://www.kernel.org.
You can also download a 2.4 kernel source tree and add the JFS patches to this tree. JFS comes as a patch for several of the 2.4.x kernels, so first of all, get the latest kernel from http://www.kernel.org.
At the time of writing, the latest kernel was 2.4.18 and the latest release of JFS was 1.0.20. We'll be using those in the instructions below. The JFS patch is available from the JFS Web site. You need three files: the utilities (jfsutils-1.0.20.tar.gz), the kernel patch (jfs-2.4.18-patch), and the file system source (jfs-2.4-1.0.20.tar.gz).
If you're using any of the latest distros, you probably won't have to patch the kernel for the JFS code. Instead, you'll only need to compile the kernel to update to the latest release of JFS (you can build JFS either as built-in or as a module). (To determine what version of JFS was shipped in the distribution you're running, you can edit the JFS file super.c and look for a printk() that has the JFS development version number string.)
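Rather than paging through super.c in an editor, you can let grep find candidate lines. This is a minimal sketch: the function name is ours, the exact wording of the version string isn't given here, so the search simply looks for printk() lines that mention "version", and the path in the example assumes the kernel tree lives in /usr/src/linux.

```shell
# jfs_version_lines FILE
# Print the numbered lines of FILE that contain both a printk() call and
# the word "version" (any case). Hypothetical helper.
jfs_version_lines() {
    grep -n 'printk' "$1" | grep -i 'version'
}

# Example:
#   jfs_version_lines /usr/src/linux/fs/jfs/super.c
```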
PATCHING THE KERNEL TO SUPPORT JFS
In the example below, we'll use the 2.4.18 kernel source tree to show how to patch JFS into the kernel.
First, you need to download the Linux kernel: linux-2.4.18.tar.gz. If you have a linux subdirectory, move it to linux-org, so it won't be replaced by the linux-2.4.18 source tree. Save the kernel archive under /usr/src and expand the kernel source tree by using:
% mv linux linux-org
% tar zxvf linux-2.4.18.tar.gz
This operation will create a directory named /usr/src/linux.
The next step is to get the JFS utilities and the appropriate patch for kernel 2.4.18. Before you do that, you need to create a directory for JFS source, /usr/src/jfs1020, and download (to that directory) the JFS kernel patch and the JFS file system source files. Once you have those files, you have everything you need to patch the kernel.
Next, change to the directory of the kernel 2.4.18 source tree and apply the JFS kernel patch:
% cd /usr/src/linux
% patch -p1 < /usr/src/jfs1020/jfs-2.4.18-patch
% cp /usr/src/jfs1020/jfs-2.4-1.0.20.tar.gz .
% tar zxvf jfs-2.4-1.0.20.tar.gz
Now, you need to configure the kernel and enable JFS by going to the File systems section of the configuration menu and enabling JFS file system support (CONFIG_JFS_FS=y). You also have the option to configure JFS as a module, in which case you only need to recompile and reinstall the kernel modules by typing:
% make modules && make modules_install
Otherwise, if you configured the JFS option as a kernel built-in, you need to:
1. Recompile the kernel (in /usr/src/linux). Run the command
% make dep && make clean && make bzImage
2. Recompile and install modules (only if you added other options as modules)
% make modules && make modules_install
3. Install the kernel.
# cp arch/i386/boot/bzImage /boot/jfs-bzImage
# cp System.map /boot/jfs-System.map
# ln -s /boot/jfs-System.map /boot/System.map
Next, update /etc/lilo.conf with the new kernel. Add an entry like the one that follows, and a jfs1020 entry should appear at the lilo boot prompt:
image=/boot/jfs-bzImage
label=jfs1020
root=/dev/hda5 # Change to your partition
Be sure to specify the correct root partition. Then run
# lilo
to make the system aware of the new kernel. Reboot and select the jfs1020 kernel to boot from the new image.
After you compile and install the kernel, you should compile and install the JFS utilities. Save the jfsutils-1.0.20.tar.gz file into the /usr/src/jfs1020 directory, expand it, run configure, and then install the utilities.
% tar zxvf jfsutils-1.0.20.tar.gz
% cd jfsutils-1.0.20
% ./configure
% make && make install
Creating a JFS partition
Having built and installed the JFS utilities, the next step is to create a JFS partition. In this example, we'll demonstrate the process using a spare partition.
(If there's unpartitioned space on your disk, you can create a partition using fdisk. After you create the partition, reboot the system to make sure that the new partition is available to create a JFS file system on it. In our test system, we had /dev/hdb3 as a spare partition.)
To create the JFS file system with the log inside the JFS partition, run the following command:
# mkfs.jfs /dev/hdb3
After the file system has been created, you need to mount it, and for that you need a mount point. Create a new empty directory such as /jfs, and then mount the file system with the following command:
# mount -t jfs /dev/hdb3 /jfs
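To check that the mount took effect with the type you expect, you can look the mount point up in the kernel's mount table. The helper below is a sketch: the name mounted_type is ours, and it reads any file in /proc/mounts format (device, mount point, type, options), defaulting to /proc/mounts itself.

```shell
# mounted_type DIR [TABLE]
# Print the file system type mounted on DIR according to TABLE, a file in
# /proc/mounts format (the default). Prints nothing and returns 1 if DIR
# is not listed. Hypothetical helper.
mounted_type() {
    awk -v dir="$1" '
        $2 == dir { type = $3 }       # the last matching entry wins
        END { if (type == "") exit 1; print type }
    ' "${2:-/proc/mounts}"
}

# Example:
#   mounted_type /jfs
```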
After the file system is mounted, you are ready to try out JFS. To unmount the JFS file system, you simply use the umount command with the same mount point as the argument:
# umount /jfs
A Performance Tweak for All File Systems
Linux records an atime, or access time, whenever a file is read. However, access time isn't very useful, and can be quite costly to track.
To get a quick performance boost on any kind of Linux file system, simply disable access time updates with the mount option noatime. For example, to disable access times on a JFS partition, do something like this in /etc/fstab:
/dev/hda6 /jfs jfs noatime 1 2
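If an entry already carries other options, noatime is simply appended to the comma-separated list. The sketch below does that for one mount point and prints the result for review; the function name is made up for illustration, and it never touches the file it reads.

```shell
# fstab_add_noatime FILE DIR
# Print FILE with "noatime" appended to the options (fourth field) of the
# entry mounted on DIR, unless it is already there. Hypothetical helper;
# it never modifies FILE.
fstab_add_noatime() {
    awk -v dir="$2" '
        # Leave blank lines and comments untouched.
        NF == 0 || $1 ~ /^#/ { print; next }
        # Wrap the option list in commas so whole options are matched.
        $2 == dir && ("," $4 ",") !~ /,noatime,/ { $4 = $4 ",noatime" }
        { print }
    ' "$1"
}

# Example (review the output, then install it by hand):
#   fstab_add_noatime /etc/fstab /jfs > /tmp/fstab.new
```

To try the option on a file system that is already mounted, without rebooting, you can remount it: mount -o remount,noatime /jfs.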