05

Memory Initialization

RAM does not work until the firmware trains it. Here is what that means.

When you press the power button, the processor wakes up and starts executing firmware from flash memory. But there is a problem: the system's main memory -- the DRAM installed in those slots on the motherboard -- does not work yet. It is not broken. It simply has not been configured.

This is one of the least understood parts of the boot process. The CPU needs memory to run software, but configuring memory requires running software. The firmware has to solve this chicken-and-egg problem before anything else can happen.

Why RAM Does Not Just Work

A stick of RAM is not a simple storage device. It is a dense grid of capacitors and transistors organized into rows and columns across multiple internal banks. Each capacitor holds one bit of data as an electrical charge. Reading or writing any bit requires activating the correct row, waiting for the signals to stabilize, then selecting the correct column.

The timing of every one of these operations matters. How long the row signal must be held before the data is valid. How many clock cycles must elapse between activating one row and activating another. How frequently each row must be refreshed before the capacitors leak their charge. These timings are measured in clock cycles and nanoseconds, and they vary between manufacturers, speeds, and even production batches.

Key term: DRAM (Dynamic Random-Access Memory) The main system memory in virtually all computers. "Dynamic" means each bit is stored as a charge on a tiny capacitor that must be periodically refreshed -- re-read and re-written -- or the data disappears. This is in contrast to SRAM (Static RAM), which holds its state without refreshing but uses more transistors per bit.

If the memory controller sends commands with the wrong timing, the data that comes back is garbage. Worse, incorrect timings can cause electrical stress that damages the memory chips over time. The firmware must discover the correct timings for the specific memory modules installed, then program the memory controller to use them.

SPD: The Memory's Identity Card

Every DDR memory module carries a small chip called an SPD EEPROM. SPD stands for Serial Presence Detect. This chip stores a data sheet about the module: its capacity, the number of ranks, the supported clock speeds, and a table of timing parameters specified by JEDEC -- the standards body that defines how DDR memory works.

Key term: SPD (Serial Presence Detect) A small read-only memory chip on each DRAM module that stores the module's specifications. The firmware reads SPD data over a simple two-wire bus (SMBus or I2C) to learn what memory is installed and what timings it supports.

The firmware reads SPD data at a very early stage, using a simple serial protocol called SMBus (System Management Bus). This bus works even before the main memory controller is configured because it runs on a separate, low-speed interface. Think of it as asking the memory stick to introduce itself before you try to have a conversation.

Fig. 05a -- SPD data read from a DDR module
[Diagram: a DDR memory module (DRAM chips plus SPD EEPROM, labeled "8 GB DDR4-3200") connected over SMBus (I2C) to the firmware's PEI phase. SPD data contents: module type DDR4 UDIMM; capacity 8 GB, 1 rank; speed 3200 MT/s; JEDEC timings tCL = 22 (CAS latency), tRCD = 22 (row-to-column), tRP = 22 (row precharge).]
The firmware reads identity and timing data from the SPD chip over a slow serial bus before the main memory interface is usable.

The SPD data includes JEDEC-standard timing parameters. You may have seen these on memory packaging as numbers like "22-22-22-52." Those four numbers represent CAS latency (tCL), row-to-column delay (tRCD), row precharge time (tRP), and row active time (tRAS), all measured in clock cycles. They tell the memory controller the minimum number of cycles it must wait between each type of operation.
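To make the unit conversion concrete, here is a sketch that turns SPD timing bytes into the familiar cycle counts. The byte offsets and the 125 ps "medium timebase" follow the JEDEC DDR4 SPD layout as commonly documented, but treat the exact offsets as illustrative assumptions; the tRAS value of 32 ns is a typical DDR4-3200 figure, not read from a real module, and real firmware applies a JEDEC rounding correction rather than a plain ceiling.

```python
import math

MTB_PS = 125  # DDR4 SPD "medium timebase": timing bytes count units of 125 ps

# A few bytes of a hypothetical DDR4-3200 SPD image (offsets per the JEDEC
# DDR4 SPD layout; treat them as illustrative, not authoritative).
spd = {
    18: 5,    # tCKAVGmin: 5 * 125 ps = 625 ps clock period (3200 MT/s)
    24: 110,  # tAAmin (CAS latency): 110 * 125 ps = 13.75 ns
    25: 110,  # tRCDmin: 13.75 ns
    26: 110,  # tRPmin: 13.75 ns
}

def to_cycles(t_ps, tck_ps):
    """Convert a timing in picoseconds to whole clock cycles, rounding up:
    the controller must wait at least this long, so partial cycles round up."""
    return math.ceil(t_ps / tck_ps)

tck_ps = spd[18] * MTB_PS                   # 625 ps per clock
tcl  = to_cycles(spd[24] * MTB_PS, tck_ps)  # 13.75 ns / 0.625 ns = 22
trcd = to_cycles(spd[25] * MTB_PS, tck_ps)  # 22
trp  = to_cycles(spd[26] * MTB_PS, tck_ps)  # 22
tras = to_cycles(32_000, tck_ps)            # 32 ns -> 51.2, rounds up to 52

print(f"{tcl}-{trcd}-{trp}-{tras}")         # -> 22-22-22-52
```

Note how the same module advertises different cycle counts at different clock speeds: the timings are fixed in nanoseconds, and the cycle numbers on the packaging are just those delays divided by the clock period.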

Memory Training: The Calibration Process

Reading the SPD data tells the firmware what the memory modules are. But it does not tell the firmware how the signals actually behave on this specific motherboard, with this specific CPU, at the current temperature, with these specific trace lengths on the circuit board.

Electrical signals take time to travel along the copper traces between the CPU and the memory slots. The length of each trace is slightly different. The impedance varies with temperature. At DDR4 and DDR5 data rates -- billions of transfers per second -- even a fraction of a nanosecond of misalignment between the clock signal and the data signals will corrupt data.

This is why the firmware must "train" the memory. Training is a calibration process where the firmware writes known patterns to memory, reads them back, and adjusts the timing parameters until the reads return correct data consistently. It is like tuning a radio dial: you sweep through the range, find the sweet spot where the signal is clearest, then lock it in.

Fig. 05b -- Memory training: finding the timing window

[Diagram: a simplified DQS/DQ eye diagram. Horizontal axis: timing offset (ps); vertical axis: voltage. FAIL regions flank a central PASS region (the timing eye); the firmware sweeps delay values and places the optimal sample point at the eye's center.]
The firmware sweeps through possible timing offsets, looking for the window where data reads back correctly. It then sets the sample point in the center of that window for maximum reliability.

Training involves multiple phases. Write leveling adjusts the timing of write commands to each DRAM chip individually. Read leveling does the same for reads. Gate training determines the exact moment to start capturing data from the bus. Each phase runs independently for each byte lane -- each group of eight data pins -- because the trace lengths differ.

The entire process can take hundreds of milliseconds. That does not sound like much, but remember: this happens before the system has usable memory. The firmware runs the training code from the CPU's internal cache, which is repurposed as temporary RAM. This technique is called Cache-as-RAM, or CAR.

Key term: Cache-as-RAM (CAR) A technique where the CPU's L1 or L2 cache is configured as a small block of read/write memory. This gives the firmware a few hundred kilobytes to work with -- enough to run the memory training algorithm -- before DRAM is available. Intel calls this mode "No-Eviction Mode."

The Memory Controller

The component responsible for sending correctly timed commands to the DRAM is the memory controller. In modern systems, the memory controller is built directly into the CPU die. Older systems had it in a separate chip called the northbridge.

The memory controller does several things. It translates memory addresses from the CPU into the bank, row, and column addresses that the DRAM chips understand. It schedules commands to maximize throughput -- reordering reads and writes to avoid unnecessary row activations. It manages the refresh cycle, periodically re-reading and re-writing every row so the capacitors do not lose their charge.
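The address translation step can be sketched as simple bit slicing. The layout below (low bits select the column, then the bank, then the row) is an assumption for illustration; real controllers use more elaborate, often XOR-hashed mappings to spread traffic across banks.

```python
# One possible (simplified) physical-address mapping.
COL_BITS, BANK_BITS, ROW_BITS = 10, 3, 16

def decode(addr):
    """Split a physical address into the (row, bank, column) triple
    that the DRAM command sequence actually uses."""
    col = addr & ((1 << COL_BITS) - 1)
    addr >>= COL_BITS
    bank = addr & ((1 << BANK_BITS) - 1)
    addr >>= BANK_BITS
    row = addr & ((1 << ROW_BITS) - 1)
    return row, bank, col

print(decode(0x12345678))  # -> (37282, 5, 632)
```

With this layout, consecutive addresses walk along one row before touching another bank, which is exactly why the controller reorders requests: staying within an open row avoids paying the activate/precharge penalties.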

Fig. 05c -- Memory controller and DRAM organization
[Diagram: the CPU's memory controller drives a DDR channel -- command/address (CMD/ADDR), data (DQ), and clock lines -- to a DRAM module (one DIMM) containing multiple banks, each a grid of rows and columns. Access sequence: 1. activate row (tRCD wait); 2. read column (tCL wait); 3. precharge (tRP wait). Refresh: every row is re-read every 64 ms; miss a refresh cycle and data silently corrupts.]
The memory controller on the CPU translates addresses into bank/row/column commands. Each access follows a strict sequence of timed operations. Refresh runs continuously in the background.
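A first-order latency calculation shows why the access sequence matters. Using the DDR4-3200 timings from earlier (tCK = 0.625 ns, tCL = tRCD = tRP = 22) and ignoring burst transfer time and command overlap, this sketch compares the three cases a read can hit:

```python
TCK_NS = 0.625            # DDR4-3200 clock period
TCL, TRCD, TRP = 22, 22, 22

def access_latency_cycles(row_state):
    """Command latency for a read, depending on what the bank is doing.
    'hit':      target row already open -> just the column read (tCL).
    'closed':   bank idle -> activate the row (tRCD), then read (tCL).
    'conflict': wrong row open -> precharge (tRP), activate, then read."""
    return {
        "hit": TCL,
        "closed": TRCD + TCL,
        "conflict": TRP + TRCD + TCL,
    }[row_state]

for state in ("hit", "closed", "conflict"):
    cyc = access_latency_cycles(state)
    print(f"{state}: {cyc} cycles = {cyc * TCK_NS:.2f} ns")
```

A row conflict costs three times as long as a row hit (66 cycles versus 22), which is why the controller's scheduler works so hard to group requests to the same open row.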

After training completes, the firmware programs all the calibrated timing values into the memory controller's configuration registers. From this point forward, the CPU can read and write main memory normally. The training results are often saved to flash storage so that the next boot can reuse them, making subsequent boots faster. This is why changing or rearranging your RAM sticks sometimes triggers a longer boot -- the firmware detects the change and retrains from scratch.
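The reuse-or-retrain decision can be sketched as a cache keyed by a fingerprint of the SPD contents. The function names and record layout here are hypothetical; real firmware stores an opaque training blob in flash, but the logic is the same: any change in module population or SPD data invalidates the cache.

```python
import hashlib

def training_key(spd_images):
    """Fingerprint the installed modules: any change in slot population
    or SPD content changes the hash, forcing a full retrain."""
    h = hashlib.sha256()
    for slot, spd in sorted(spd_images.items()):
        h.update(slot.encode())
        h.update(bytes(spd))
    return h.hexdigest()

def restore_or_retrain(saved, spd_images, retrain):
    """Return (training record, retrained?). Reuse the saved record only
    if its key matches the currently installed modules."""
    key = training_key(spd_images)
    if saved is not None and saved["key"] == key:
        return saved, False                          # fast boot: reuse
    return {"key": key, "timings": retrain()}, True  # slow boot: retrain

# Hypothetical flow: the second boot with unchanged modules skips training.
spd = {"DIMM_A1": [0x23, 0x11, 0x0C], "DIMM_B1": [0x23, 0x11, 0x0C]}
first, trained1 = restore_or_retrain(None, spd, retrain=lambda: {"tCL": 22})
second, trained2 = restore_or_retrain(first, spd, retrain=lambda: {"tCL": 22})
print(trained1, trained2)  # -> True False
```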

DDR Generations

The DDR standard has gone through several generations. Each generation roughly doubles the data transfer rate and reduces power consumption while increasing complexity.

DDR3 ran at roughly 400 to 1066 MHz base clock (800 to 2133 MT/s, since DDR transfers data on both clock edges), with a peak transfer rate around 17 GB/s per channel at the top end. DDR4 pushed that to 1200-1600 MHz base (2400-3200 MT/s), adding bank groups and finer timing granularity. DDR5, the current generation, doubled the channel count by splitting each module into two independent 32-bit channels, added on-die ECC (error correction within each chip), and moved the voltage regulator from the motherboard onto the module itself.

Each generation requires a different training algorithm. The firmware for a DDR5 system is substantially more complex than for DDR4 because DDR5 has more timing parameters, more per-bit adjustments, and the two-channel-per-DIMM architecture that requires independent calibration of each sub-channel.

RAM does not work at power-on. The firmware must read each module's SPD data, run a training algorithm that sweeps through timing parameters to find the reliable operating window, and program the memory controller with the results. Only then does the system have usable main memory. Everything before this point runs from flash or CPU cache.

Dual Channel and Interleaving

Most consumer systems support two memory channels -- two independent paths between the CPU and the DRAM. When you install matching modules in both channels, the memory controller can interleave accesses: it sends even-addressed cache lines to one channel and odd-addressed cache lines to the other, effectively doubling the available bandwidth.
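The even/odd cache-line mapping can be sketched directly. The assumption here is the simplest scheme: a 64-byte cache line, with address bit 6 selecting the channel.

```python
CACHE_LINE = 64  # bytes; address bit 6 picks the channel

def channel_for(addr):
    """Even-numbered cache lines go to channel 0, odd ones to channel 1."""
    return (addr // CACHE_LINE) % 2

# Consecutive cache lines alternate channels, so a streaming read
# draws bandwidth from both channels at once.
print([channel_for(line * CACHE_LINE) for line in range(6)])  # -> [0, 1, 0, 1, 0, 1]
```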

This is why memory kits are sold in pairs. Two 8 GB modules in dual channel will outperform a single 16 GB module in most workloads, even though the total capacity is the same. The firmware detects the channel configuration during training and enables interleaving automatically when the modules match.

Server systems take this further. A dual-socket server might have eight or twelve memory channels per processor, with multiple DIMMs per channel. The firmware must train every module on every channel, which is why enterprise servers often take noticeably longer to boot than desktops.

ECC: When Bits Flip

Cosmic rays, alpha particles from chip packaging, and electrical noise can flip bits in DRAM. In a desktop, a flipped bit might crash an application or corrupt a file. In a server, it could corrupt a database or crash a hypervisor managing hundreds of virtual machines.

ECC memory -- Error-Correcting Code memory -- adds extra check bits to each word. For every 64 bits of data, ECC adds 8 check bits. This allows the memory controller to detect and correct any single-bit error, and to detect (but not correct) any double-bit error.
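The same SEC-DED (single-error-correct, double-error-detect) idea can be demonstrated at toy scale. This sketch uses a Hamming(12,8) code plus an overall parity bit over 8 data bits; the controller's real code applies the same principle to 64-bit words with 8 check bits.

```python
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]  # non-power-of-two positions hold data

def hamming_encode(data):
    """Encode an 8-bit value: bits[1..12] hold Hamming(12,8),
    bits[0] is an overall parity bit for double-error detection."""
    bits = [0] * 13
    for i, p in enumerate(DATA_POS):
        bits[p] = (data >> i) & 1
    for p in (1, 2, 4, 8):               # parity bit p covers every position
        for j in range(1, 13):           # whose index has bit p set
            if j != p and (j & p):
                bits[p] ^= bits[j]
    bits[0] = sum(bits[1:]) & 1
    return bits

def hamming_decode(bits):
    """Return (data, status): 'ok', 'corrected', or 'uncorrectable'."""
    bits = bits[:]
    syndrome = 0
    for p in (1, 2, 4, 8):
        s = 0
        for j in range(1, 13):
            if j & p:
                s ^= bits[j]
        if s:
            syndrome += p                # syndrome = position of a single error
    parity_bad = (sum(bits[1:]) & 1) != bits[0]
    if syndrome and parity_bad:
        bits[syndrome] ^= 1              # one bit flipped: flip it back
        status = "corrected"
    elif syndrome:
        status = "uncorrectable"         # two bits flipped: detect only
    elif parity_bad:
        status = "corrected"             # the overall parity bit itself flipped
    else:
        status = "ok"
    data = 0
    for i, p in enumerate(DATA_POS):
        data |= bits[p] << i
    return data, status

word = hamming_encode(0xA5)
word[6] ^= 1                             # a cosmic ray flips one bit
print(hamming_decode(word))              # -> (165, 'corrected')
```

Flip any single bit of the 13 and the original byte comes back; flip two and the decoder reports an uncorrectable error instead of silently returning bad data.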

ECC requires support in the memory controller, the DRAM modules, and the firmware. The training process for ECC memory is slightly longer because the error-correction logic must also be calibrated. Consumer platforms often do not support ECC, while server and workstation platforms require it.

Memory initialization is one of the most time-consuming parts of the boot process and one of the most critical. Without trained, working DRAM, the system has nothing but a few hundred kilobytes of cache to work with. Everything that follows -- loading the bootloader, decompressing the kernel, building the page tables -- depends on the memory controller being correctly configured during this early phase.

What Happens Next

Once memory training completes, the firmware copies itself from the cramped cache-as-RAM environment into proper DRAM and continues execution with megabytes of memory available. The next task is discovering all the other hardware in the system: storage controllers, network adapters, USB hubs, and graphics cards. That discovery happens through the bus system.

Next: The Bus System