15

The Process Model

Every running program is a process. The kernel manages them all.

Every program running on your computer right now -- your browser, your text editor, the background service checking for software updates -- is a process. A process is the kernel's abstraction for "a running program." It is the unit of work that the operating system tracks, schedules, and controls.

The previous article described PID 1, the first process. But how does PID 1 start everything else? How does any process start another? The answer lies in three system calls that have defined Unix process management since the 1970s: fork(), exec(), and wait().

What a Process Is

Think of a process as a sealed envelope. Inside the envelope is everything needed to run a program: the program's code, its current state, the data it is working on, and a record of which files it has open. The kernel keeps a table of all these envelopes and decides which one gets to use the CPU at any given moment.

Each process has:

  • A PID -- a unique integer identifier
  • A parent PID (PPID) -- the PID of the process that created it
  • A memory space -- its own private virtual address space, which the kernel maps onto RAM
  • An execution state -- running, sleeping, stopped, or zombie
  • Open file descriptors -- references to files, sockets, and pipes
  • Credentials -- the user and group it runs as

Key term: Process -- An instance of a running program. The kernel assigns each process its own memory space, a unique PID, and a set of resources. Multiple processes can run the same program simultaneously -- each is an independent instance.

Fig. 15a -- Anatomy of a process

Process (PID 4827)
  • Program Code (text segment)
  • Data (heap + stack)
  • PID: 4827, PPID: 1203, State: Running, UID: 1000, GID: 1000, Nice: 0
  • Open File Descriptors -- fd 0: stdin (terminal), fd 1: stdout (terminal), fd 2: stderr (terminal), fd 3: /var/log/app.log
  • CPU Registers (saved on context switch)
  • Signal Handlers (SIGTERM, SIGINT, ...)

Each process is a self-contained unit with its own code, data, state, file descriptors, and saved CPU registers. The kernel manages all of these.

The Process Table

The kernel maintains a data structure called the process table (internally, a list of task_struct structures in Linux). Every process on the system has an entry. You can see a snapshot of it with the ps command:

ps aux

This shows every process, its owner, its PID, how much CPU and memory it uses, and the command that started it. (To see each process's parent PID as well, use ps -ef.) On a typical desktop system, you will see hundreds of entries. On a busy server, thousands.

Each entry costs memory. The kernel allocates a task_struct (several kilobytes on a 64-bit Linux system; the exact size depends on the kernel version and configuration) plus the process's page tables and kernel stack. This is why creating processes is not free -- though on modern systems, the overhead is small enough that you rarely need to worry about it.

fork(): Cloning a Process

The fundamental operation in Unix process creation is fork(). When a process calls fork(), the kernel creates a near-exact copy of the calling process. The new process -- called the child -- gets a new PID and a new parent PID, but almost everything else is a duplicate: the same code, the same data, the same open files, the same position in the program.

Think of it like photocopying a document. You now have two copies that are identical at the moment of copying, but changes to one do not affect the other.

After fork() returns, two processes are running the same code at the same point. The only difference is the return value: the parent receives the child's PID, and the child receives zero. This is how each process knows which one it is. If the kernel cannot create the child, fork() returns -1 in the parent and no new process exists, so a robust program checks for that case first.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");  // fork failed: no child was created
        return 1;
    } else if (pid == 0) {
        // This is the child process
        printf("I am the child, PID %d\n", (int)getpid());
    } else {
        // This is the parent process
        printf("I am the parent, child PID is %d\n", (int)pid);
    }
    return 0;
}

Key term: fork() -- A system call that creates a new process by duplicating the calling process. The new child process has its own PID and memory space, but starts as an exact copy of its parent. Modern Linux uses copy-on-write to make this efficient.

In practice, the kernel does not actually copy all the memory immediately. It uses a technique called copy-on-write (COW): both parent and child share the same physical memory pages, marked read-only. Only when one of them tries to write does the kernel copy just the affected page. This makes fork() fast even for processes using gigabytes of memory.

Fig. 15b -- fork() creates a child process

Before fork():
  PID 500 -- code + data, memory: 40 MB, 3 open files

After fork():
  Parent, PID 500 -- fork() returned 501, code + data (shared)
  Child, PID 501 -- fork() returned 0, code + data (shared)
  Both map the same shared pages (COW)

fork() splits one process into two. The parent and child share physical memory through copy-on-write until one of them modifies a page.

exec(): Replacing a Process

fork() alone just creates copies. To run a different program, a process calls one of the exec() family of functions (execve, execl, execvp, etc.). This replaces the process's code, data, and stack with a new program loaded from disk. The PID stays the same. The open file descriptors stay the same (unless marked close-on-exec). But the running program is entirely different.

This is the standard Unix pattern for starting a new program:

  1. The parent calls fork() to create a child.
  2. The child calls exec() to replace itself with the new program.
  3. The parent continues running.

This two-step approach seems roundabout, but it is powerful. Between the fork() and the exec(), the child can rearrange its file descriptors, change its working directory, drop privileges, or set up any other environment the new program needs -- all without affecting the parent.

The fork-then-exec pattern is the foundation of process creation in Unix. fork() creates a copy, exec() replaces it with a new program. This two-step design lets the child set up its environment before the new program starts.

wait(): Collecting the Dead

When a child process exits, it does not vanish immediately. The kernel retains its exit status -- a small integer indicating whether it succeeded (0) or failed (nonzero) -- in the process table. The parent must call wait() or waitpid() to read this status and release the entry.

This is the contract: the parent that created the child is responsible for collecting its exit status. Until it does, the dead child remains in the process table as a zombie.

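#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");  // fork failed: there is no child to wait for
        return 1;
    } else if (pid == 0) {
        // Child does its work, then exits
        exit(0);
    }

    // Parent waits for the child to finish
    int status;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        printf("Child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}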

Process States

A process is always in one of several states. The kernel tracks the current state and transitions the process between them:

Fig. 15c -- Process state transitions

[Diagram: the states Created, Ready, Running, Sleeping, Stopped, Zombie, and Removed, connected by transitions labeled fork(), scheduled, preempted, I/O wait, event, SIGSTOP, SIGCONT, exit(), and wait().]

A process moves through these states during its lifetime. The scheduler decides which ready process gets the CPU. A zombie exists only to hold an exit status until the parent reads it.

Running -- currently executing on a CPU core. On a single-core system, only one process can be running at a time. On a multi-core system, one per core.

Ready (Runnable) -- able to run, waiting for the scheduler to give it a CPU time slice.

Sleeping -- waiting for something external. This is the most common state. A process reading from a network socket sleeps until data arrives. A process waiting for disk I/O sleeps until the read completes. There are two variants: interruptible sleep (can be woken by signals) and uninterruptible sleep (cannot).

Stopped -- paused by a signal, typically SIGSTOP or SIGTSTP (what happens when you press Ctrl+Z in a terminal). The process remains in memory but does not execute. SIGCONT resumes it.

Zombie -- the process has exited, but its parent has not yet called wait(). The process table entry remains so the parent can read the exit status.

Zombies and Orphans

These two situations arise from the parent-child relationship.

Zombie Processes

A zombie is a process that has finished executing but still has an entry in the process table. Its memory and open files have already been released; what remains holds little more than the PID and exit status, but it is still a slot in the kernel's table. A few zombies are harmless. Enough of them can exhaust the PID space, whose upper limit on Linux is set by /proc/sys/kernel/pid_max.

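You can spot zombies in ps output by the state letter Z in the STAT column. Filtering on that column, rather than grepping for a capital Z anywhere on the line, avoids false matches:

ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'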

The fix is always the same: the parent must call wait(). Sending SIGKILL to the zombie itself does nothing, because it is already dead. If the parent is your code, fix it. If the parent is a third-party program that you cannot change, the usual remedy is to terminate or restart the parent. When the parent dies, the zombies are adopted by PID 1, which reaps them immediately.

Orphan Processes

An orphan is a process whose parent has exited. The kernel does not allow a process to have no parent, so it reassigns the orphan to PID 1 (the init process); on Linux, a process that has registered itself as a "child subreaper" can take over that role for its own descendants. PID 1 is expected to call wait() for any adopted children, which prevents orphans from becoming permanent zombies.

This is one of PID 1's special responsibilities, as we discussed in the previous article. If PID 1 fails to reap orphaned zombies, the system slowly accumulates dead process entries that can never be cleaned up.

Key term: Zombie process -- A process that has exited but whose parent has not yet collected its exit status via wait(). It occupies a slot in the process table but consumes no CPU or memory beyond that small entry. The name comes from the fact that it is dead but still present.

The Process Tree

Because every process (except PID 1) has a parent, the entire set of processes forms a tree. PID 1 is the root. You can see this tree with:

pstree -p

On a typical system, the tree looks something like this: PID 1 (systemd) starts a login manager, which starts your desktop session, which starts a terminal emulator, which starts your shell, which starts the commands you type. Each layer is a parent-child relationship created by fork() and exec().

Fig. 15d -- The process tree

systemd (1)
  sshd (480)
    sshd (3201)
      bash (3202)
        vim (3250)
  cron (495)
  gdm (512)
    gnome-shell (830)
      gnome-terminal (1400)
        bash (1401)
  nginx (520)
    worker (521)
    worker (522)

Every process has a parent, forming a tree rooted at PID 1. Your shell, your editor, and every command you run are branches on this tree.

Signals

Processes communicate with each other and with the kernel through signals -- small, numbered notifications. When a process receives a signal, it can handle it (run a custom function), ignore it, or accept the default behavior (which is usually to terminate).

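The most commonly encountered signals (the numbers shown are those used by Linux on x86 and ARM; a few differ on other architectures and other Unix systems):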

Signal    Number  Default                       Typical Source
SIGTERM   15      Terminate                     kill command
SIGKILL   9       Terminate (cannot be caught)  kill -9
SIGINT    2       Terminate                     Ctrl+C
SIGTSTP   20      Stop                          Ctrl+Z
SIGCONT   18      Continue                      fg or bg
SIGHUP    1       Terminate                     Terminal closed
SIGCHLD   17      Ignore                        Child process exited
SIGSEGV   11      Terminate + core dump         Invalid memory access

Two signals cannot be caught or ignored: SIGKILL (9) and SIGSTOP (19). These are the kernel's guarantee that any process (except PID 1) can always be stopped or killed.

Signals are the Unix mechanism for process-to-process and kernel-to-process notification. SIGKILL and SIGSTOP cannot be caught or ignored. Every other signal can be handled by the receiving process.

Seeing It in Action

You can observe the process model with a few commands:

# Show the process tree
pstree -p

# Show all processes with details
ps aux

# Watch processes in real time
top

# Trace the system calls a process makes (including fork and exec)
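strace -f -e trace=process bash -c "ls; true"

The strace example is particularly illuminating. You will see the shell call clone() (the system call behind fork() on Linux), then the child call execve("/usr/bin/ls", ...). The parent calls wait4() to collect the result. The trailing "; true" matters: when bash -c is given a single simple command, it may skip the fork and exec the command directly, so there would be no clone() to see. This is the fork-exec-wait pattern playing out in real time, exactly as described.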

Why This Matters

Every time you type a command at a shell prompt, the shell forks a child, the child execs the command, and the shell waits for it to finish. Every time a web server handles a request (in the traditional model), it forks a worker. Every time you start a program from a GUI menu, a process calls fork() and exec().

The process model is the foundation on which everything else in the operating system rests. Pipes, redirections, job control, services, containers -- they all build on these three system calls. Understanding fork(), exec(), and wait() is understanding how Unix works.
