11

Kernel Init

The kernel takes control of the hardware and begins building the operating system.

The bootloader has done its job. It found the kernel image on disk, loaded it into memory, and jumped to its entry point. On x86, a small stub at the front of the image then decompressed the kernel proper. The processor is now executing kernel code. But the kernel is not yet an operating system. It is a large program that has just started running on bare hardware with almost nothing set up.

What happens next is a precisely ordered sequence of initialization steps. Each step builds on the one before it. Skip one or get the order wrong, and the machine hangs or panics. This article follows the kernel from its first instruction to the moment it launches the first userspace process.

The Entry Point

On x86-64 Linux, the kernel's journey begins in architecture-specific assembly code. This code was written by hand, not generated by a compiler, because the machine is not yet in a state where compiled C code can run safely. There is no stack. The memory management unit may not be configured. Interrupts are disabled.

The assembly stub does the minimum work needed to get into C territory: it sets up a temporary stack, builds page tables for the kernel's own memory, makes sure address translation is running on those tables, and then hands control to C. On x86-64 a short C function, x86_64_start_kernel(), runs first and then calls start_kernel().

Key term: start_kernel() The C function where Linux kernel initialization truly begins. Located in init/main.c in the kernel source tree, it is the single entry point for all of the kernel's high-level setup. Nearly everything before it is architecture-specific startup code, most of it assembly. start_kernel() itself is portable C shared by every architecture, although it calls architecture-specific functions such as setup_arch().

Think of it like building a house. The assembly stub pours the foundation and frames the first wall. start_kernel() is the general contractor who arrives and begins coordinating every trade -- electricians, plumbers, roofers -- in the right order.

What start_kernel() Does

The function is a couple of hundred lines long, and most of those lines are sequential function calls. Each call initializes one kernel subsystem. The order matters because later subsystems depend on earlier ones. Here is a simplified view of the sequence:

Fig. 11a -- start_kernel() initialization sequence
  TIME (top to bottom)
  lock_kernel()   -- take the Big Kernel Lock (older kernels only; the BKL was removed in Linux 2.6.39)
  setup_arch()    -- CPU, memory map, platform
  trap_init()     -- exception handlers
  mm_init()       -- memory management
  sched_init()    -- process scheduler
  init_IRQ()      -- interrupt controllers
  time_init()     -- system clock
  console_init()  -- early printk output
  rest_init()     -- spawn PID 1, idle loop
A simplified view of the major calls inside start_kernel(). Each step depends on the ones above it. The kernel must complete them in order.
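
The ordering constraint can be made concrete with a small model. The sketch below is Python, not kernel code. The step names come from the figure, while the dependency edges and the run_in_order() helper are assumptions made purely for illustration; the real dependencies between subsystems are more numerous and more subtle.

```python
# Hypothetical model: each step lists the steps that must already have run.
# Step names mirror Fig. 11a; the dependency edges are illustrative assumptions.
INIT_STEPS = [
    ("setup_arch", []),
    ("trap_init", ["setup_arch"]),
    ("mm_init", ["setup_arch"]),
    ("sched_init", ["mm_init"]),
    ("init_IRQ", ["trap_init", "mm_init"]),
    ("time_init", ["init_IRQ"]),
    ("console_init", ["mm_init"]),
    ("rest_init", ["sched_init", "time_init", "console_init"]),
]


def run_in_order(steps):
    """Run steps sequentially; fail loudly if a prerequisite has not run yet."""
    done = []
    for name, needs in steps:
        missing = [dep for dep in needs if dep not in done]
        if missing:
            raise RuntimeError(f"{name} called before {', '.join(missing)}")
        done.append(name)
    return done


if __name__ == "__main__":
    print(run_in_order(INIT_STEPS))
```

Swapping two entries, for example placing sched_init ahead of mm_init, makes the model raise an error. The real kernel has no such check: a misordered call usually shows up as a hang or a panic.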

Let us walk through the most important ones.

Setting Up the Architecture

setup_arch() is where the kernel learns what kind of machine it is running on. On x86, this function reads the memory map that the BIOS or UEFI firmware left behind, identifies the CPU model and its capabilities, and configures architecture-specific features.

The memory map is critical. It tells the kernel which regions of physical memory are usable RAM, which are reserved by firmware, and which are mapped to hardware devices. Without this map, the kernel cannot allocate memory safely -- it might overwrite firmware data or try to use an address that corresponds to a device register instead of RAM.
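
To make the idea of a memory map concrete, here is a minimal Python sketch. The region list, the addresses, and the helper names usable_bytes() and is_usable() are all invented for the example; a real firmware map has more entry types and many more regions.

```python
# Illustrative firmware-style memory map: (start address, length, type).
# All addresses and sizes are made up for this example.
MEMORY_MAP = [
    (0x0000_0000, 0x0009_F000, "usable"),
    (0x0009_F000, 0x0006_1000, "reserved"),   # firmware data
    (0x0010_0000, 0x3FF0_0000, "usable"),
    (0xFEC0_0000, 0x0000_1000, "reserved"),   # device registers, not RAM
]


def usable_bytes(memory_map):
    """Total amount of RAM the kernel is allowed to allocate from."""
    return sum(length for _start, length, kind in memory_map if kind == "usable")


def is_usable(memory_map, addr):
    """True if addr falls inside a region marked as usable RAM."""
    return any(
        kind == "usable" and start <= addr < start + length
        for start, length, kind in memory_map
    )
```

An allocator that consults is_usable() before handing out an address cannot wander into firmware data or a device register, which is the safety property the paragraph above describes.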

setup_arch() also detects how many CPU cores are present and prepares data structures for each one. On a multi-core machine, only one core -- the bootstrap processor -- runs start_kernel(). The other cores sit idle until the kernel explicitly wakes them later.

Key term: Bootstrap Processor (BSP) The single CPU core that the hardware selects to run the boot sequence. All other cores are Application Processors (APs). The BSP performs all kernel initialization and then wakes the APs one by one.

Memory Management

mm_init() (renamed mm_core_init() in recent kernels) builds the kernel's memory management system. Before this call, the kernel uses a simple boot-time allocator, called memblock on most architectures, that hands out memory in chunks and never reclaims it. That works for early setup, but it is far too wasteful for a running system.
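
A boot-time allocator of this kind can be approximated by a "bump" allocator: a pointer that only moves forward. The Python class below is a simplified sketch under that assumption. The class name and its interface are hypothetical, and the kernel's actual boot allocator is considerably more capable.

```python
class BumpAllocator:
    """Hands out memory from one region by advancing a pointer; never frees."""

    def __init__(self, start, size):
        self.next_free = start
        self.end = start + size

    def alloc(self, size, align=8):
        # Round the current pointer up to the requested alignment.
        addr = (self.next_free + align - 1) // align * align
        if addr + size > self.end:
            raise MemoryError("boot-time region exhausted")
        self.next_free = addr + size
        return addr
```

There is no free() method at all. That keeps the allocator tiny and dependable during early boot, and it is also why it cannot serve a long-running system.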

The full memory manager divides physical RAM into fixed-size pages, typically 4,096 bytes each. It tracks which pages are free and which are in use. It sets up the page tables that let the CPU translate virtual addresses -- the addresses that programs see -- into physical addresses where data actually lives in the RAM chips.
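
The page arithmetic is simple enough to show directly. The following Python sketch assumes 4,096-byte pages and uses a plain set to track free page frames; the FrameTracker class is a hypothetical stand-in for the kernel's far more elaborate page allocator.

```python
PAGE_SIZE = 4096


def split_address(addr):
    """Return (page frame number, offset within the page)."""
    return addr // PAGE_SIZE, addr % PAGE_SIZE


class FrameTracker:
    """Toy bookkeeping of which physical page frames are free or in use."""

    def __init__(self, total_bytes):
        self.free = set(range(total_bytes // PAGE_SIZE))
        self.used = set()

    def alloc_page(self):
        frame = min(self.free)   # lowest free frame; good enough for a toy
        self.free.remove(frame)
        self.used.add(frame)
        return frame

    def free_page(self, frame):
        self.used.remove(frame)
        self.free.add(frame)
```

Unlike the boot-time allocator, this tracker can take pages back, so memory released by one user becomes available to the next.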

Fig. 11b -- Virtual to physical address translation

  VIRTUAL               PAGE TABLES     PHYSICAL RAM
  Process A: 0x4000     ----------->    0x12000  page A
  Process B: 0x4000     ----------->    0x34000  page B
  Kernel: 0xFFFF8000    ----------->    0x80000  kernel

  Other physical regions shown: 0x00000 firmware, 0x1F000 free

Both processes use address 0x4000, but page tables map them to different physical locations. Each process sees its own private address space.

Virtual memory gives each process the illusion of having its own private address space. The page tables translate virtual addresses to physical RAM locations. Two processes can use the same virtual address without conflicting.
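
The translation step can be sketched in a few lines of Python. The mappings follow the figure (virtual 0x4000 maps to 0x12000 for process A and to 0x34000 for process B). The single-level dictionary and the translate() function are simplifying assumptions, since real x86-64 page tables are multi-level structures walked by hardware.

```python
PAGE_SIZE = 4096

# Hypothetical per-process page tables: virtual page number -> physical frame number.
PAGE_TABLES = {
    "A": {0x4: 0x12},
    "B": {0x4: 0x34},
}


def translate(process, vaddr):
    """Translate a virtual address to a physical one for the given process."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    table = PAGE_TABLES[process]
    if vpn not in table:
        raise LookupError(f"page fault: {process} has no mapping for {vaddr:#x}")
    return table[vpn] * PAGE_SIZE + offset
```

The same virtual address yields a different physical address depending on which table is consulted, and an address with no mapping produces a fault instead of touching another process's memory.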

This translation mechanism is fundamental to everything the operating system does. It lets multiple programs run at the same time without stepping on each other's memory. It lets the kernel protect its own data from buggy or malicious programs. It enables features like swap, memory-mapped files, and copy-on-write.

The memory management subsystem is one of the first things the kernel sets up because almost everything else depends on it. Without the ability to allocate and track pages, the kernel cannot create processes, load drivers, or mount filesystems.

The Scheduler

sched_init() creates the process scheduler. A scheduler is the piece of the kernel that decides which program runs on which CPU core and for how long. At this point, no user programs exist, but the scheduler infrastructure must be in place before the kernel can create any processes -- including the very first one.

The scheduler maintains a run queue for each CPU core. A run queue is a list of processes that are ready to execute. When a core finishes its current time slice or a process blocks waiting for I/O, the scheduler picks the next process from the queue.

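For many years Linux used the Completely Fair Scheduler (CFS) by default. The name reflects its design goal: give every process a fair share of CPU time proportional to its priority. CFS tracks how much CPU time each process has consumed using a virtual runtime counter. The process with the lowest virtual runtime runs next. Since Linux 6.6 the default is EEVDF (Earliest Eligible Virtual Deadline First), which keeps the virtual-runtime accounting but chooses among eligible processes by virtual deadline.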
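
The lowest-virtual-runtime rule is easy to model. The Python sketch below uses a binary heap as the run queue, where CFS itself uses a red-black tree. The FairQueue class, the weight parameter, and the 10 ms slice are assumptions chosen for illustration.

```python
import heapq


class FairQueue:
    """Toy run queue: always runs the task with the lowest virtual runtime."""

    def __init__(self):
        self._heap = []   # entries are (vruntime, name, weight)

    def add(self, name, weight=1.0, vruntime=0.0):
        heapq.heappush(self._heap, (vruntime, name, weight))

    def run_next(self, slice_ms=10.0):
        vruntime, name, weight = heapq.heappop(self._heap)
        # A higher weight (higher priority) makes virtual time advance more slowly,
        # so the task is picked more often.
        heapq.heappush(self._heap, (vruntime + slice_ms / weight, name, weight))
        return name
```

With one task at weight 2.0 and another at weight 1.0, six calls to run_next() pick the heavier task four times and the lighter one twice, which is the proportional sharing the paragraph describes.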

Key term: Scheduler The kernel subsystem that decides which process runs on each CPU core. It multiplexes many processes onto a limited number of cores, switching between them fast enough to create the illusion of simultaneous execution.

Interrupts

init_IRQ() sets up the interrupt system. An interrupt is a signal from hardware -- or from software -- that tells the CPU to stop what it is doing and handle an event. When you press a key, the keyboard controller sends an interrupt. When a network packet arrives, the network card sends one. When a timer fires, the timer hardware sends one.

Each interrupt has a number. The kernel installs a handler function for each number -- a piece of code that knows how to respond to that specific event. When interrupt number 42 fires, the CPU looks up handler 42 and runs it.
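
Conceptually, this is a lookup table from interrupt number to function. The Python sketch below models it with a dictionary; register_handler() and dispatch() are hypothetical names, and on real hardware the lookup is performed by the CPU through a descriptor table, not by software like this.

```python
handlers = {}   # interrupt number -> handler function


def register_handler(irq, func):
    """Install func as the handler for interrupt number irq."""
    handlers[irq] = func


def dispatch(irq):
    """Look up the handler for irq and run it, as the CPU would on an interrupt."""
    handler = handlers.get(irq)
    if handler is None:
        return f"unhandled interrupt {irq}"
    return handler()


register_handler(42, lambda: "handler 42 ran")
```

Calling dispatch(42) runs the installed function, while dispatch(7) reports an unhandled interrupt because nothing was registered for that number.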

Before init_IRQ(), the interrupt controllers are not configured and any hardware interrupt would either be ignored or crash the machine. After this call, the kernel can respond to hardware events.

The Console

console_init() sets up early text output. Every line of boot text you see scrolling by -- the [ 0.000000] timestamped messages -- comes through this subsystem. The kernel needs a way to tell you what it is doing (and what went wrong) before the full display driver is loaded.

The early console typically talks directly to the VGA text buffer at a fixed memory address, or to a serial port. It is crude but reliable. Later in the boot process, a full framebuffer console or graphical display takes over.

rest_init() and the Birth of PID 1

After all subsystems are initialized, start_kernel() calls rest_init(). This function does three critical things:

  1. It creates the init process (PID 1), which will become the first userspace process. It is created first precisely so that it receives PID 1.
  2. It creates a kernel thread called kthreadd (PID 2), which will be the parent of all future kernel threads.
  3. It turns the bootstrap code path into the idle thread (PID 0) -- the thread that runs when there is nothing else to do.

Fig. 11c -- The first three processes

  PID 0  idle (swapper)
  |
  +-- PID 1  init (userspace)
  |     +-- services
  |     +-- login
  |     +-- shells
  |
  +-- PID 2  kthreadd (kernel)
        +-- kworker
        +-- softirq
        +-- migration

PID 0 creates both PID 1 and PID 2. All userspace from 1, kernel threads from 2.

The kernel's first three processes form the root of the entire process tree. PID 1 (init) is the ancestor of every userspace program. PID 2 (kthreadd) parents all kernel worker threads.
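
The shape of this tree can be explored with a small model. In the Python sketch below, only PIDs 0, 1, and 2 come from the text; the higher PIDs, their names, and the root_ancestor() helper are invented for illustration.

```python
# Hypothetical snapshot of a process table: pid -> (parent pid, name).
PROCESSES = {
    0: (None, "swapper"),
    1: (0, "init"),
    2: (0, "kthreadd"),
    37: (2, "kworker"),
    812: (1, "sshd"),
    1440: (812, "bash"),
}


def root_ancestor(pid):
    """Walk parent links until reaching a direct child of PID 0 (or PID 0 itself)."""
    while PROCESSES[pid][0] not in (0, None):
        pid = PROCESSES[pid][0]
    return pid
```

In this model the shell traces back to PID 1 and the worker thread traces back to PID 2, matching the split between userspace and kernel threads.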

PID 1 is special. If it dies, the kernel panics. It is the root of the entire userspace process tree. Every daemon, every shell, every application is ultimately a descendant of PID 1. We will explore PID 1 in detail in a later article.

The Kernel Command Line

Throughout initialization, the kernel reads parameters from its command line -- a string passed by the bootloader. You can see your system's kernel command line by reading /proc/cmdline. Typical entries include:

  • root=/dev/sda2 -- which partition holds the root filesystem
  • ro -- mount root as read-only initially
  • quiet -- suppress most boot messages
  • init=/sbin/init -- which program to run as PID 1

These parameters influence how nearly every subsystem initializes. The root= parameter, for example, determines which block device the kernel will try to mount as the root filesystem. The init= parameter lets you override the default PID 1 binary, which is useful for rescue situations: booting with init=/bin/sh, for example, drops straight into a shell instead of starting the normal init system.

The kernel command line is the bridge between the bootloader and the kernel. It carries configuration that the kernel cannot determine on its own, such as which disk partition holds the root filesystem or how verbose the boot messages should be.
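
Because the command line is just a space-separated string, it is easy to inspect from userspace. The Python sketch below splits it into a dictionary. The parse_cmdline() helper is hypothetical and deliberately simplified: it ignores quoted values that contain spaces, which the kernel's own parser does handle.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:   # available on Linux only
            print(parse_cmdline(f.read()))
    except OSError:
        # Fall back to the example parameters from the list above.
        print(parse_cmdline("root=/dev/sda2 ro quiet init=/sbin/init"))
```

Splitting at the first "=" only means that a value such as root=UUID=abc keeps its own "=" intact.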

The Kernel Log

As each subsystem initializes, it writes messages to the kernel log buffer -- a ring buffer in memory. These are the dmesg messages you can read after the system is running. Each message is timestamped relative to the start of boot.
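
A ring buffer has a fixed capacity and overwrites its oldest contents when full. The Python sketch below models that behavior with a bounded deque. The LogRing class is hypothetical, and it counts whole messages where the kernel's buffer is sized in bytes.

```python
from collections import deque


class LogRing:
    """Fixed-capacity log: once full, each new message evicts the oldest one."""

    def __init__(self, capacity):
        self._entries = deque(maxlen=capacity)

    def write(self, timestamp, message):
        self._entries.append((timestamp, message))

    def dump(self):
        # Format timestamps as seconds since boot, padded like dmesg output.
        return [f"[{ts:12.6f}] {msg}" for ts, msg in self._entries]
```

The practical consequence is that on a long-running or very chatty system, the earliest boot messages may already have been overwritten by the time you look.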

The early messages record what the kernel found and configured. The excerpt below is simplified for illustration; real messages are longer and worded differently:

[    0.000000] Linux version 6.8.0 ...
[    0.000000] Command line: root=/dev/sda2 ro quiet
[    0.004523] Memory: 16384MB available
[    0.012847] CPU: 8 cores detected
[    0.089234] Scheduler: CFS initialized
[    0.102456] Interrupt controller: APIC configured

Reading dmesg is one of the most useful debugging tools available. If a piece of hardware does not work, these messages are usually the first place to look, and often they contain the answer.
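
When hunting through a long log, it helps to separate the timestamp from the message text. The Python sketch below does this with a regular expression. The parse_line() and grep_log() helpers are hypothetical, and the pattern assumes the bracketed seconds-since-boot format shown above.

```python
import re

# Matches lines such as "[    0.004523] Memory: 16384MB available".
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s(.*)$")


def parse_line(line):
    """Return (seconds since boot, message), or None if the line does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    return float(m.group(1)), m.group(2)


def grep_log(lines, keyword):
    """Return parsed entries whose message mentions keyword, ignoring case."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p and keyword.lower() in p[1].lower()]
```

Filtering by a driver or device name this way narrows thousands of lines down to the handful that concern the hardware in question.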

From Kernel to Userspace

Once rest_init() spawns PID 1, the kernel's direct initialization work is largely done. PID 1 -- whether it is the traditional init, systemd, or another init system -- takes over the job of starting services, mounting filesystems, and bringing the system to a usable state.

But the kernel does not go away. It continues running underneath everything, handling interrupts, managing memory, scheduling processes, and mediating every interaction between software and hardware. The kernel is not a program that runs and exits. It is the permanent foundation on which everything else stands.

The transition from kernel initialization to userspace is the moment the computer stops being a machine running setup code and becomes an operating system. The next articles explore the subsystems that make this possible -- starting with the clock signal that times every operation.
