13

The Root Filesystem

Before the system can do anything useful, it needs to mount its root filesystem.

The kernel has initialized its subsystems, loaded its built-in drivers, and set up memory management and scheduling. But it still cannot do most of what an operating system needs to do. It cannot run programs from disk. It cannot read configuration files. It cannot load additional driver modules. All of these require a filesystem -- a structured way to organize, name, and access files stored on a device.

The first filesystem the kernel mounts is called the root filesystem. It is mounted at / -- the single forward slash that is the top of the entire directory tree. Every file path on a Linux system starts from this point. Before the root filesystem exists, the system has no /bin, no /etc, no /home. It has nothing.

This article explains how the kernel gets from "no filesystem at all" to a fully mounted root, and the intermediate step that makes it possible.

The Chicken-and-Egg Problem

To mount the root filesystem, the kernel needs a storage driver -- the software that knows how to talk to the disk. But many storage drivers are compiled as loadable modules, stored as files on the root filesystem. You need the driver to read the disk. You need the disk to load the driver.

This circular dependency is one of the classic problems of operating system design. Linux solves it with a temporary filesystem that the bootloader loads into RAM alongside the kernel itself. This temporary filesystem is called the initramfs.

Key term: initramfs (initial RAM filesystem) -- A small, compressed filesystem archive that the bootloader loads into memory alongside the kernel. It contains the essential drivers, tools, and scripts needed to find and mount the real root filesystem. Once the real root is mounted, the initramfs is discarded.

Think of it like a toolbox you carry with you to a construction site. The site has all the heavy equipment, but the gate is locked. Your toolbox has the key. Once you open the gate and get the heavy equipment running, you do not need the toolbox anymore.

What is Inside the initramfs

The initramfs is a compressed cpio archive -- a simple format that packs files and directories into a single stream. When the kernel starts, it unpacks this archive into a temporary in-memory filesystem (a tmpfs) and uses it as the initial root.
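
To make the "single stream" idea concrete, here is a minimal sketch that packs and lists a cpio archive in the "newc" layout, the variant used for initramfs images. It is an illustration under simplifying assumptions: it handles regular files only, skips compression, and the function names (pack_newc, list_newc) are hypothetical rather than part of any real tool.

```python
import io

MAGIC = b"070701"   # magic string that opens every "newc" cpio header
HEADER_LEN = 110    # 6-byte magic + 13 fields of 8 hex digits each


def _pad4(n: int) -> int:
    """Number of padding bytes needed to reach the next 4-byte boundary."""
    return (4 - n % 4) % 4


def pack_newc(files: dict[str, bytes]) -> bytes:
    """Pack {name: data} into a newc cpio stream (regular files only)."""
    out = io.BytesIO()
    entries = list(files.items()) + [("TRAILER!!!", b"")]  # end-of-archive marker
    for ino, (name, data) in enumerate(entries, start=1):
        raw_name = name.encode() + b"\0"
        mode = 0 if name == "TRAILER!!!" else 0o100644
        # Field order: ino, mode, uid, gid, nlink, mtime, filesize,
        # devmajor, devminor, rdevmajor, rdevminor, namesize, check
        fields = [ino, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(raw_name), 0]
        header = MAGIC + b"".join(b"%08X" % f for f in fields)
        out.write(header + raw_name)
        out.write(b"\0" * _pad4(len(header) + len(raw_name)))
        out.write(data + b"\0" * _pad4(len(data)))
    return out.getvalue()


def list_newc(stream: bytes) -> list[tuple[str, int]]:
    """Return (name, size) for every entry that precedes the trailer."""
    entries, pos = [], 0
    while True:
        header = stream[pos:pos + HEADER_LEN]
        if header[:6] != MAGIC:
            raise ValueError(f"no newc header at offset {pos}")
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + HEADER_LEN
        name = stream[name_start:name_start + namesize - 1].decode()
        pos = name_start + namesize
        pos += _pad4(pos)      # header plus name is padded to 4 bytes
        if name == "TRAILER!!!":
            return entries
        entries.append((name, filesize))
        pos += filesize
        pos += _pad4(pos)      # file data is padded to 4 bytes


archive = pack_newc({"init": b"#!/bin/sh\n", "bin/busybox": b"\x7fELF"})
```

Calling list_newc(archive) walks the stream header by header, which is essentially what the kernel does when it unpacks the archive. A real image is compressed first and may consist of several archives concatenated together.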

A typical initramfs contains:

  • Kernel modules for the storage controller (NVMe, AHCI, SCSI, USB storage)
  • Filesystem modules (ext4, XFS, Btrfs) so the kernel can understand the root partition's format
  • A small init script or binary that orchestrates the mounting process
  • Essential tools like busybox (a single binary that provides dozens of standard Unix utilities)
  • Device manager components (a minimal udev or mdev) to create /dev entries for detected hardware
  • Cryptographic modules if the root partition is encrypted (LUKS/dm-crypt)

You can examine your system's initramfs:

$ lsinitramfs /boot/initrd.img-6.8.0-generic | head -10
.
bin
bin/busybox
bin/cat
bin/cp
conf
conf/initramfs.conf
etc
etc/modprobe.d
lib

The entire image is typically 20-60 MB compressed, expanding to perhaps 100-200 MB when unpacked. This is tiny compared to a full root filesystem, which may be tens of gigabytes. The initramfs carries only what is needed to get to the real root.

Fig. 13a -- initramfs boot sequence
[Figure: contents of RAM during boot. (1) The bootloader loads the kernel and the initramfs. (2) The kernel unpacks the initramfs cpio archive into a tmpfs at /. (3) The tmpfs at / holds /init, /bin/busybox, /lib/modules/, /etc/conf, and /dev. (4) /init loads drivers (modprobe nvme, ext4, ...). (5) /init mounts the real root (mount /dev/sda2 /root). (6) switch_root hands over to the real /.]
The bootloader places both the kernel and the initramfs in RAM. The kernel unpacks the archive into a tmpfs, runs the /init script inside it, loads drivers, finds the real root device, mounts it, then switches root.

The /init Script

When the kernel finishes its own initialization, it looks for a program to execute as PID 1 -- the first userspace process. If an initramfs is present, the kernel runs /init from within it. This is typically a shell script (on Debian/Ubuntu systems using initramfs-tools) or a compiled binary such as systemd (on many systems using dracut).

The init script performs a precise sequence:

  1. Mount the /proc and /sys pseudo-filesystems so it can communicate with the kernel.
  2. Start a minimal device manager (udev or mdev) to create /dev entries for detected hardware.
  3. Load the storage driver module for the root device.
  4. Load the filesystem module for the root partition's format.
  5. If the root is encrypted, prompt for the passphrase and set up the decryption layer.
  6. If the root is on LVM or RAID, assemble the logical volume or array.
  7. Mount the real root filesystem on a temporary mount point (like /root or /mnt).
  8. Call switch_root to replace the initramfs with the real root filesystem.

The initramfs /init script is a carefully ordered sequence. Each step enables the next. Missing a step -- failing to load the right storage driver, for example -- means the real root cannot be mounted, and the boot stops, typically at an emergency shell or with the dreaded "unable to mount root fs" kernel panic.
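
The ordering above can be sketched as a dry-run planner. This is only a model of the sequence, not a working /init: it returns the actions as strings instead of performing them, and RootConfig and plan_init_steps are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class RootConfig:
    """Hypothetical description of where the real root filesystem lives."""
    storage_module: str            # e.g. "nvme"
    fs_module: str                 # e.g. "ext4"
    encrypted: bool = False
    on_lvm_or_raid: bool = False


def plan_init_steps(cfg: RootConfig) -> list[str]:
    """Return the ordered actions an /init would take for this configuration."""
    steps = [
        "mount /proc and /sys",
        "start device manager",
        f"load storage module {cfg.storage_module}",
        f"load filesystem module {cfg.fs_module}",
    ]
    # Optional layers must be set up before the root can be mounted.
    if cfg.encrypted:
        steps.append("unlock encrypted root")
    if cfg.on_lvm_or_raid:
        steps.append("assemble LVM volume or RAID array")
    steps += ["mount real root on temporary mount point", "switch_root"]
    return steps


plan = plan_init_steps(RootConfig("nvme", "ext4", encrypted=True))
```

The optional steps always land between loading the modules and mounting the root, which mirrors why the sequence cannot be reordered: each layer exposes the device that the next one needs.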

switch_root: The Handoff

The switch_root command (or pivot_root, an older mechanism) performs the transition from the initramfs to the real root filesystem. It does three things:

  1. Deletes everything in the initramfs to free the memory it was using.
  2. Moves the mount of the real root filesystem from its temporary mount point to /.
  3. Executes the real init program (typically /sbin/init or systemd) from the new root.

After switch_root, the initramfs is gone. The system's / is now backed by the real disk partition. The real init program takes over as PID 1 and begins starting system services.

Key term: switch_root -- A command that transitions from the initramfs to the real root filesystem. It frees the initramfs memory, moves the real root mount to /, and executes the real init binary. This is a one-way operation -- there is no going back to the initramfs.
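
The three actions can be modelled on plain Python data. This is a toy sketch, not how the real utility is implemented: the state dictionary and the function signature are assumptions made for illustration, and the function returns the init path where the real command would execute it.

```python
def switch_root(state: dict, new_root: str, init: str = "/sbin/init") -> str:
    """Model the handoff on a state of the form
    {"rootfs_files": set of paths, "mounts": {mount_point: device}}.
    """
    if new_root not in state["mounts"]:
        raise ValueError(f"nothing is mounted at {new_root}")
    # 1. Delete the initramfs contents to free the memory they occupy.
    state["rootfs_files"].clear()
    # 2. Move the real root's mount from its temporary mount point to /.
    state["mounts"]["/"] = state["mounts"].pop(new_root)
    # 3. Hand control to the real init (the real command execs it here).
    return init


state = {
    "rootfs_files": {"/init", "/bin/busybox"},
    "mounts": {"/root": "/dev/sda2"},
}
next_program = switch_root(state, "/root")
```

After the call, the state holds no initramfs files, so there is nothing left to return to. That is the one-way property in miniature.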

The Old Way: initrd

Before initramfs, Linux used initrd (initial RAM disk). An initrd was a disk image -- a file containing a complete filesystem, formatted as ext2 or similar, loaded into a RAM-based block device. The kernel would mount this block device as the initial root.

initramfs replaced initrd because it is simpler and more flexible. An initramfs is just a cpio archive extracted into tmpfs. It does not need a block device driver, a filesystem driver, or a fixed size allocation. The kernel simply unpacks the archive into memory. Despite this, many systems still name the file initrd.img for historical reasons, even though the contents are actually an initramfs cpio archive.

Filesystem Types

The real root filesystem needs to be in a format the kernel understands. Linux supports dozens of filesystem types. The most common ones for root partitions:

ext4 -- The fourth extended filesystem. The default for many distributions. It is mature, well-tested, and handles most workloads well. It uses a traditional block allocation scheme with inodes, block groups, and a journal.

XFS -- Originally developed by SGI for large files and high throughput. Default on Red Hat Enterprise Linux and Fedora Server. Excels at parallel I/O on multi-core systems.

Btrfs -- A copy-on-write filesystem with built-in support for snapshots, checksums, compression, and multi-device spanning. Default on openSUSE and on Fedora Workstation (including Silverblue). More features than ext4 or XFS, but also more complexity.

Each filesystem type is implemented as a kernel module (or built-in code). The initramfs must include the module for whatever filesystem type the real root partition uses. If you format your root as ext4, the initramfs needs ext4.ko. Format it as Btrfs, and you need btrfs.ko.
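
One way to see which module a root partition needs is to look at the magic number in its superblock. The sketch below does this for the three filesystems discussed here; the offsets are the standard superblock magic locations, while the function name detect_fs_module is hypothetical. Note that ext2, ext3, and ext4 share the same magic, so the check identifies the ext family rather than ext4 specifically.

```python
import struct
from typing import Optional


def detect_fs_module(image: bytes) -> Optional[str]:
    """Guess the filesystem module name from the start of a partition image."""
    # XFS: the ASCII magic "XFSB" sits at offset 0.
    if image[:4] == b"XFSB":
        return "xfs"
    # ext2/3/4: superblock starts at byte 1024, with the 16-bit
    # little-endian magic 0xEF53 at offset 56 inside it (1080 overall).
    if len(image) >= 1082 and struct.unpack_from("<H", image, 1080)[0] == 0xEF53:
        return "ext4"
    # Btrfs: superblock starts at 64 KiB, with the magic "_BHRfS_M"
    # at offset 64 inside it (65600 overall).
    if image[65600:65608] == b"_BHRfS_M":
        return "btrfs"
    return None


# Build a fake ext-family image that carries only the magic number.
ext_image = bytearray(2048)
struct.pack_into("<H", ext_image, 1080, 0xEF53)
```

Running detect_fs_module(bytes(ext_image)) reports the ext family. On a real system the same bytes would be read from the block device, which needs root privileges.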

Fig. 13b -- The Virtual Filesystem Switch (VFS)
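[Figure: layered diagram. An application calls open("/etc/hostname"); the syscall enters the VFS, which offers a uniform API (open, read, write, close, stat, ...). Below the VFS sit ext4, XFS, Btrfs, and tmpfs/proc. The disk filesystems pass through the block layer (I/O scheduling, queues) to physical storage (SSD, HDD); tmpfs and proc are backed by RAM, with no disk.]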
The VFS provides a single API for all filesystem types. Applications call open/read/write without knowing whether the file is on ext4, XFS, Btrfs, or an in-memory filesystem like tmpfs. The VFS dispatches each call to the correct filesystem implementation.

The Virtual Filesystem Switch (VFS)

Applications do not call ext4 or XFS code directly. They call generic kernel functions like open(), read(), write(), and close(). The layer that translates these generic calls into filesystem-specific operations is the Virtual Filesystem Switch, or VFS.

The VFS is an abstraction layer. It defines a set of standard operations that every filesystem must implement. When you call open("/etc/hostname"), the VFS figures out which filesystem /etc/hostname lives on, looks up that filesystem's implementation of the open operation, and calls it.

This is what makes it possible to mix filesystem types on a single system. Your root might be ext4 at /, your home directory might be on a Btrfs partition mounted at /home, and /tmp might be a tmpfs in RAM. The VFS stitches them all into one seamless directory tree.
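
The dispatch step can be illustrated with a small model. It is a sketch under simplifying assumptions, not kernel code: the Filesystem and VFS classes are hypothetical, and mount selection is reduced to picking the longest mount point that is a prefix of the path.

```python
class Filesystem:
    """Stand-in for one filesystem implementation (ext4, Btrfs, tmpfs, ...)."""

    def __init__(self, name: str):
        self.name = name

    def open(self, path: str) -> str:
        # A real filesystem would return a file object; this reports who handled it.
        return f"{self.name} handled open({path})"


class VFS:
    """Routes each call to the filesystem that owns the path."""

    def __init__(self):
        self.mounts: dict[str, Filesystem] = {}

    def mount(self, mount_point: str, fs: Filesystem) -> None:
        self.mounts[mount_point] = fs

    def filesystem_for(self, path: str) -> Filesystem:
        # A mount point owns a path if it equals it or is a parent directory.
        candidates = [
            mp for mp in self.mounts
            if path == mp or path.startswith(mp.rstrip("/") + "/")
        ]
        if not candidates:
            raise FileNotFoundError("no root filesystem mounted")
        # The deepest (longest) matching mount point wins.
        return self.mounts[max(candidates, key=len)]

    def open(self, path: str) -> str:
        return self.filesystem_for(path).open(path)


vfs = VFS()
vfs.mount("/", Filesystem("ext4"))
vfs.mount("/home", Filesystem("btrfs"))
vfs.mount("/tmp", Filesystem("tmpfs"))
```

With this setup, vfs.open("/home/alice/notes") is routed to the Btrfs stand-in while vfs.open("/etc/hostname") falls through to ext4 at the root. The caller uses the same method in both cases.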

Key term: VFS (Virtual Filesystem Switch) -- The kernel layer that provides a uniform interface for all filesystem types. It defines standard operations (open, read, write, stat, etc.) and dispatches them to the appropriate filesystem implementation based on which mount point the file belongs to. Applications never interact with a specific filesystem directly.

The VFS maintains several key data structures:

  • Superblock -- represents a mounted filesystem. Contains metadata like block size, total size, and available space.
  • Inode -- represents a file or directory. Contains permissions, ownership, timestamps, and pointers to data blocks.
  • Dentry -- represents a directory entry, mapping a filename to an inode. The kernel caches dentries for fast path lookups.
  • File -- represents an open file, tied to a specific process. Contains the current read/write position and the operations table for that file's filesystem.
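
To show how these structures cooperate during a path lookup, here is a deliberately small model. The class and field names are hypothetical and far simpler than the kernel's real structures; the point is only that a lookup walks the path one component at a time and that cached dentries let a repeated lookup skip the directory read.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    number: int
    is_dir: bool = False
    # Directory contents as stored by the filesystem: name -> inode.
    entries: dict[str, "Inode"] = field(default_factory=dict)


@dataclass
class File:
    """An open file: which inode it refers to and the current position."""
    inode: Inode
    position: int = 0


class DentryCache:
    """Caches (parent inode number, name) -> inode mappings."""

    def __init__(self, root: Inode):
        self.root = root
        self.cache: dict[tuple[int, str], Inode] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, path: str) -> Inode:
        node = self.root
        for name in filter(None, path.split("/")):
            key = (node.number, name)
            if key in self.cache:
                self.hits += 1
            else:
                self.misses += 1
                # Cache miss: consult the directory itself (KeyError if absent).
                self.cache[key] = node.entries[name]
            node = self.cache[key]
        return node

    def open(self, path: str) -> File:
        return File(self.lookup(path))


hostname = Inode(11)
etc = Inode(10, is_dir=True, entries={"hostname": hostname})
root = Inode(2, is_dir=True, entries={"etc": etc})
dcache = DentryCache(root)
opened = dcache.open("/etc/hostname")
```

The first open of /etc/hostname misses the cache once per path component. A second lookup of the same path is served entirely from cached dentries.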

What "/" Actually Is

The root filesystem is not just the first filesystem mounted. It is the anchor for everything else. Every other filesystem mount in the system attaches to a directory within the root filesystem's tree.

When you mount a partition at /home, you are grafting that partition's directory tree onto the /home directory of the root filesystem. The original contents of /home (if any) on the root filesystem become hidden -- replaced by the contents of the mounted partition. Unmount the partition, and the original contents reappear.
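
The hide-and-reappear behaviour can be modelled with a stack per mount point. This is a toy sketch: the Namespace class and the file names are made up for illustration, and it tracks only top-level directory listings.

```python
class Namespace:
    """Toy model of how a mount hides a directory's original contents."""

    def __init__(self, root_dirs: dict[str, list[str]]):
        # Directory listings as stored on the root filesystem itself.
        self.root_dirs = root_dirs
        # mount point -> stack of listings from filesystems mounted there.
        self.mounted: dict[str, list[list[str]]] = {}

    def mount(self, mount_point: str, listing: list[str]) -> None:
        self.mounted.setdefault(mount_point, []).append(listing)

    def umount(self, mount_point: str) -> None:
        self.mounted[mount_point].pop()

    def listdir(self, directory: str) -> list[str]:
        stack = self.mounted.get(directory)
        if stack:
            # The most recent mount hides everything underneath it.
            return sorted(stack[-1])
        return sorted(self.root_dirs.get(directory, []))


ns = Namespace({"/home": ["old-notes.txt"]})   # hypothetical leftover file
before = ns.listdir("/home")
ns.mount("/home", ["alice", "bob"])
during = ns.listdir("/home")
ns.umount("/home")
after = ns.listdir("/home")
```

The original listing is never modified. It is simply unreachable while something is mounted on top of it, which is why it comes back intact after the unmount.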

Fig. 13c -- Mount points graft filesystems together
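[Figure: directory tree rooted at / (ext4) with /bin, /etc, /home, and /tmp. /home is a Btrfs partition containing /home/alice and /home/bob; /tmp is a tmpfs in RAM. Backing devices: /dev/nvme0n1p2 for root (ext4), /dev/nvme0n1p3 for home (btrfs), and no device for tmpfs (memory).]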
Three different filesystems -- ext4 on an NVMe partition, Btrfs on another partition, and tmpfs in RAM -- are grafted together into a single directory tree. Users and applications see one seamless hierarchy.

This is the mount model. The root filesystem at / is the trunk of the tree. Everything else is a branch grafted on. The kernel's mount table (visible in /proc/mounts) records every active mount -- which device, which filesystem type, which mount point, and which options.

$ cat /proc/mounts | head -5
/dev/nvme0n1p2 / ext4 rw,relatime 0 0
/dev/nvme0n1p3 /home btrfs rw,relatime,compress=zstd 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0

The root filesystem is not just one filesystem among many. It is the anchor point for the entire directory hierarchy. Every other mount attaches to a directory within it. Without a root filesystem, there is no place to attach anything, no way to find programs or configuration, and no operating system.
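
Each line of /proc/mounts follows the same layout: device, mount point, filesystem type, options, and two trailing numeric fields. A short parser makes the structure explicit. The names MountEntry, parse_mounts, and root_entry are hypothetical helpers, and the sample text is copied from the listing above.

```python
import re
from typing import NamedTuple


class MountEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: list[str]


def _unescape(text: str) -> str:
    # Spaces and similar characters appear as octal escapes such as \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)


def parse_mounts(text: str) -> list[MountEntry]:
    """Parse text in /proc/mounts format into MountEntry records."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = parts[:4]
        entries.append(MountEntry(_unescape(device), _unescape(mount_point),
                                  fs_type, options.split(",")))
    return entries


def root_entry(entries: list[MountEntry]) -> MountEntry:
    # A later mount on the same point hides earlier ones, so take the last.
    return [e for e in entries if e.mount_point == "/"][-1]


SAMPLE = """\
/dev/nvme0n1p2 / ext4 rw,relatime 0 0
/dev/nvme0n1p3 /home btrfs rw,relatime,compress=zstd 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""
root = root_entry(parse_mounts(SAMPLE))
```

On a live system, the same functions can be fed the result of open("/proc/mounts").read() to find out which device backs / and whether it is currently mounted ro or rw.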

Pseudo-Filesystems

Not everything mounted in the directory tree represents data on a disk. Linux uses pseudo-filesystems -- filesystems that exist only in memory and provide interfaces to kernel data structures.

proc (mounted at /proc) -- Provides information about running processes and kernel state. /proc/cpuinfo shows CPU details. /proc/meminfo shows memory usage. Each running process has a directory under /proc/<pid>/.

sysfs (mounted at /sys) -- Exposes the kernel's device model as a directory hierarchy. Every device, driver, and bus has a directory with attribute files you can read or write.

tmpfs (often mounted at /tmp and /run) -- A filesystem that lives in memory rather than on disk. Files written to tmpfs are fast to access but disappear on reboot. The unpacked initramfs itself typically lives in a tmpfs.

devtmpfs (mounted at /dev) -- A tmpfs where the kernel automatically creates device nodes for detected hardware. udev then applies rules to set permissions and create symlinks.

These pseudo-filesystems are just as "real" to programs as disk-backed filesystems. You open, read, and write them with the same system calls. The VFS makes them indistinguishable from files on disk.
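
As a small illustration, the sketch below reads /proc/meminfo with the same open and read calls used for any regular file. The helper names are hypothetical, and the numbers in the sample string are made up; only the "Name: value unit" line layout is taken from the real file.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Map each field name to its numeric value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        tokens = rest.split()
        if sep and tokens:
            info[key.strip()] = int(tokens[0])
    return info


def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    # An ordinary open/read: the VFS routes it to procfs, which generates
    # the text on demand instead of fetching it from a disk.
    with open(path) as f:
        return parse_meminfo(f.read())


# Illustrative values only, not taken from a real machine.
sample = "MemTotal:       16384 kB\nHugePages_Total:       0\n"
parsed = parse_meminfo(sample)
```

Nothing in read_meminfo is specific to procfs. Pointing it at a regular file with the same contents would work identically.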

Mounting the Root: Read-Only First

The kernel typically mounts the root filesystem read-only during boot. This is a safety measure. If the system crashed previously, the filesystem might have inconsistencies -- writes that were in progress when power was lost. Mounting read-only prevents further damage while the system runs a consistency check.

The init system (systemd, OpenRC, or similar) later remounts the root filesystem read-write after verifying its integrity, either through a full filesystem check (fsck) or by replaying the filesystem's journal. Journaling filesystems like ext4 and XFS maintain a log of in-progress operations. Replaying this journal recovers the filesystem to a consistent state without scanning every block. Btrfs is not a journaling filesystem in this sense; its copy-on-write design keeps the on-disk state consistent without a traditional journal.

[    3.456789] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Quota mode: none.
[    4.567890] EXT4-fs (nvme0n1p2): re-mounted. Quota mode: none.

The first message shows the read-only mount. The second shows the remount as read-write. Between these two messages, the init system ran its checks and determined the filesystem was clean.
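
The idea behind journal replay can be shown with a toy write-ahead log. This is a conceptual sketch only: the record format is invented for illustration and bears no relation to the on-disk journal layout of ext4 or XFS. Writes are applied only if their transaction reached a commit record before the crash.

```python
def replay_journal(blocks: dict[int, bytes], journal: list[tuple]) -> dict[int, bytes]:
    """Apply committed journal writes to a copy of the block map.

    Records (hypothetical format):
      ("begin", txid), ("write", txid, block_no, data), ("commit", txid)
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    recovered = dict(blocks)
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block_no, data = rec
            recovered[block_no] = data
        # Writes from transactions without a commit record are ignored.
    return recovered


disk = {1: b"old1", 2: b"old2"}
journal = [
    ("begin", 1), ("write", 1, 1, b"new1"), ("commit", 1),
    ("begin", 2), ("write", 2, 2, b"half"),   # crash before the commit
]
recovered = replay_journal(disk, journal)
```

Recovery touches only the blocks named in the journal, which is why it is so much faster than a full check that scans the whole filesystem.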

From Root to Running System

With the root filesystem mounted read-write, the system finally has persistent storage. Programs can be loaded from /bin and /usr/bin. Configuration can be read from /etc. Logs can be written to /var/log. The operating system has a place to stand.

The next step is for PID 1 -- the init process that switch_root launched -- to read its configuration and start bringing up system services: networking, logging, login prompts, and everything else that turns a booted kernel into a usable machine.
