20

Pipes and Redirection

The power of Unix comes from connecting programs together with pipes.

In the previous article, you learned that every process has three standard channels and that programs do not know or care where those channels actually lead. Now we put that design to work. The shell gives you simple syntax for rewiring those channels: send output to a file, read input from a file, or connect the output of one program directly to the input of another.

This is the mechanism that turns a collection of small, single-purpose programs into a powerful system. Each program does one thing. You combine them.

Output Redirection: > and >>

The > operator tells the shell to connect a program's stdout to a file instead of the screen. The shell opens (or creates) the file, then sets file descriptor 1 to point at it before starting the program.

$ echo "first line" > output.txt
$ cat output.txt
first line

If the file already exists, > destroys its contents and starts fresh. This is called truncation. The old data is gone the moment the shell processes the >.
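Truncation does not even require a command. In most POSIX-style shells (bash, dash, zsh in sh mode), a redirection with no command is valid, and it is a common idiom for emptying a file in place. A quick sketch:

```shell
echo "important data" > notes.txt

# A bare redirection with no command still opens and truncates
# the file -- the shell processes > before anything would run.
> notes.txt

wc -c < notes.txt    # the file is now zero bytes
```

(Some shells, notably csh, treat a bare redirection differently, so this idiom is not universal.)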

If you want to add to an existing file instead of replacing it, use >>:

$ echo "second line" >> output.txt
$ cat output.txt
first line
second line

The >> operator opens the file in append mode. New data is written after the existing contents. Nothing is deleted.

Key term: Redirection -- The shell's mechanism for changing where a process's standard channels (stdin, stdout, stderr) connect. The operators >, >>, <, and 2> rewire file descriptors before the program starts. The program itself is unaware that redirection has occurred.
Fig. 20.0 -- Output redirection with > and >>

The > operator replaces the file's contents (truncation). The >> operator adds to the end (append). Both create the file if it does not exist.
The > operator destroys existing file contents. This is the single most common way people accidentally delete data from the command line. If you are not sure whether you want to overwrite, use >> to append instead.
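If this worries you, POSIX shells offer a guard rail: the noclobber option makes > refuse to overwrite an existing file, and the >| operator explicitly overrides it. A small sketch:

```shell
set -o noclobber           # make > refuse to clobber existing files

echo "keep me" > data.txt
echo "oops" > data.txt     # fails: cannot overwrite existing file
echo "force" >| data.txt   # >| deliberately bypasses noclobber
```

Some people set noclobber in their shell startup file as permanent insurance against accidental truncation.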

Redirecting Standard Error

By default, > redirects file descriptor 1 (stdout). To redirect file descriptor 2 (stderr), prefix the > with the descriptor number:

$ ls /nonexistent 2> errors.txt
$ cat errors.txt
ls: cannot access '/nonexistent': No such file or directory

The 2> syntax sends stderr to the file. Normal output still goes to the screen.

You can redirect both at the same time:

$ ls /home /nonexistent > listing.txt 2> errors.txt

Now stdout goes to listing.txt and stderr goes to errors.txt. Clean separation.

If you want both stdout and stderr to go to the same file, the syntax is:

$ command > all_output.txt 2>&1

The 2>&1 means "redirect file descriptor 2 to wherever file descriptor 1 currently points." Since we already redirected fd 1 to all_output.txt, fd 2 follows it there.

Order matters here. The shell processes redirections left to right. If you reversed it -- 2>&1 > all_output.txt -- stderr would go to the screen (where fd 1 was pointing at the time) and only stdout would go to the file.
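A quick experiment makes the ordering rule concrete (ls's error message stands in for any stderr output):

```shell
# Correct order: fd 1 is pointed at the file first, so when 2>&1
# duplicates it, stderr follows stdout into the file.
ls /nonexistent > all_output.txt 2>&1
cat all_output.txt             # the error message is in the file

# Reversed order: 2>&1 copies fd 1 while it still points at the
# terminal, so the error appears on screen and the file stays empty.
ls /nonexistent 2>&1 > reversed.txt
```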

Key term: File descriptor duplication -- The syntax 2>&1 tells the shell to make file descriptor 2 point to the same place as file descriptor 1. This is called "duplicating" a file descriptor. It is how you merge stderr into stdout (or vice versa).

Input Redirection: <

The < operator connects a file to a program's stdin instead of the keyboard:

$ sort < names.txt
Alice
Bob
Charlie

The sort program reads from file descriptor 0 as usual. It does not know that stdin has been rewired from the keyboard to a file. It just reads lines, sorts them, and writes the result to stdout.

Many programs accept a filename as an argument, so you can often achieve the same result with:

$ sort names.txt

The difference is subtle but real. When you use <, the shell opens the file and passes the file descriptor to the program. The program reads from stdin and has no idea a file is involved. When you pass the filename as an argument, the program opens the file itself.

In practice, the result is the same. But the < form matters when you are building pipelines or working with programs that only read from stdin.
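One program where you can actually see the difference is wc: when it opens the file itself it prints the filename alongside the count, but when it only reads stdin it has no name to print. A small sketch (the filename is just an example):

```shell
printf 'a\nb\nc\n' > names.txt

wc -l names.txt     # wc opened the file: prints the count and the name
wc -l < names.txt   # wc only read stdin: prints the count alone
```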

The Pipe: |

The pipe operator is the most powerful tool in the Unix shell. It connects the stdout of one program directly to the stdin of the next. No temporary files. No manual copying. The data flows directly from one process to another through a kernel buffer.

$ ls /usr/bin | wc -l

This command lists all files in /usr/bin and sends that list (via a pipe) to wc -l, which counts the number of lines. The result is the number of programs installed in that directory.

Here is what happens mechanically:

  1. The shell creates a pipe -- a pair of connected file descriptors, one for reading and one for writing.
  2. It forks two child processes.
  3. In the first child, stdout (fd 1) is connected to the write end of the pipe, then ls is executed.
  4. In the second child, stdin (fd 0) is connected to the read end of the pipe, then wc is executed.
  5. Both programs run simultaneously. As ls writes data, wc reads it.
Fig. 20.1 -- Anatomy of a pipe

The shell creates a kernel pipe with a small in-memory buffer (64 KB by default on Linux). The first program writes to the pipe's write end through fd 1; the second reads from the read end through fd 0. Both run concurrently.

The key insight is that both programs run at the same time. The pipe is not "run the first command, save its output, then feed it to the second command." It is a live connection. If the pipe buffer fills up because the writer is faster than the reader, the writer is paused until the reader catches up. This is called backpressure, and it means pipes work efficiently even with large amounts of data.
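You can see the live connection with yes, which writes "y" forever. If pipelines were "run the first command, save its output", this command would never finish; because the two programs run concurrently, it returns instantly:

```shell
# head exits after three lines; the pipe then closes, the next
# write by yes fails (SIGPIPE), and the infinite writer stops.
yes | head -n 3
```

This is also why an expensive command on the left of a pipe stops early when the right side only wants the first few lines.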

Building Pipelines

You can chain as many pipes as you want:

$ cat /var/log/syslog | grep "error" | sort | uniq -c | sort -rn | head -10

This pipeline:

  1. Reads the system log
  2. Filters for lines containing "error"
  3. Sorts those lines alphabetically (so duplicates are adjacent)
  4. Counts consecutive duplicate lines
  5. Sorts by count, highest first
  6. Shows only the top 10

Six programs, each doing one small job, connected into a single data-processing chain. No temporary files. No programming language required.

Fig. 20.2 -- A six-stage pipeline

cat syslog | grep error | sort | uniq -c | sort -rn | head -10

Data at each stage:

  1. All 50,000 log lines
  2. 340 lines with "error"
  3. 340 lines, sorted
  4. 85 unique lines with counts
  5. 85 lines, sorted by count
  6. Top 10 lines

Each stage reduces the data. The final output is tiny compared to the input.

A six-stage pipeline progressively filters and transforms data. Each program runs simultaneously, connected by kernel pipes. 50,000 lines go in; 10 come out.
The Unix philosophy is "write programs that do one thing well, and connect them with pipes." This is not just a slogan. It is a practical engineering strategy that has worked for over fifty years. Small composable tools are easier to test, debug, and combine than large monolithic programs.

Combining Redirection and Pipes

You can use redirection and pipes together. Redirection affects a single program's file descriptors; pipes connect programs to each other.

$ grep "error" < input.log | sort > sorted_errors.txt

This reads input.log into grep's stdin, pipes grep's stdout to sort, and redirects sort's stdout to a file. Each piece works independently.

You can also discard unwanted output in a pipeline:

$ find / -name "*.conf" 2>/dev/null | head -20

The find command searches the entire filesystem, which generates many "Permission denied" errors on directories you cannot read. The 2>/dev/null throws those errors away. Only the successful results (stdout) flow through the pipe to head.

Here Documents

Sometimes you want to feed multiple lines of text into a program's stdin without creating a file first. A here document (or heredoc) lets you embed the input directly in the command:

$ sort <<EOF
cherry
apple
banana
EOF
apple
banana
cherry

The <<EOF tells the shell: "read everything from the next line until you see a line containing only EOF, and feed it all into the program's stdin." You can use any word in place of EOF -- it is just a delimiter.

Key term: Here document -- A form of input redirection that allows you to embed multi-line text directly in a shell command. The syntax is <<WORD, where WORD is any delimiter. The shell reads input lines until it encounters a line containing only that delimiter, then feeds all the collected text to the command's stdin.
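One detail worth knowing: by default the shell performs variable expansion inside a heredoc. Quoting the delimiter (<<'EOF') turns that off, which matters when the text itself contains $ characters:

```shell
name="world"

# Unquoted delimiter: $name is expanded by the shell.
cat <<EOF
hello $name
EOF

# Quoted delimiter: the text is passed through literally.
cat <<'EOF'
hello $name
EOF
```

The first cat receives "hello world"; the second receives "hello $name", untouched.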

The Tee Command: Splitting a Stream

What if you want to save a pipeline's intermediate output to a file while also passing it along to the next stage? The tee command reads from stdin, writes a copy to a file, and also writes to stdout:

$ ls /usr/bin | tee listing.txt | wc -l

This counts the files in /usr/bin (via wc -l) and also saves the full listing to listing.txt. The tee command is named after a T-shaped pipe fitting that splits a water flow into two directions.

Fig. 20.3 -- The tee command splits the data stream
The tee command acts like a T-junction in plumbing. Data flows in from the left. One copy goes down to a file. Another copy continues right to the next program in the pipeline.
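Like >, tee truncates its file by default. The -a flag makes it append instead, mirroring >> -- handy for adding to a log while still watching the output:

```shell
echo "run 1" | tee -a history.log
echo "run 2" | tee -a history.log

# history.log now contains both lines, and each run also
# passed its line along to stdout.
cat history.log
```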

Common Patterns

Here are redirection and pipe patterns you will use constantly:

Save command output to a file:

$ dmesg > boot_messages.txt

Append to a log:

$ echo "$(date): backup complete" >> /var/log/backup.log

Discard errors:

$ find / -name "*.conf" 2>/dev/null

Send both stdout and stderr to a file:

$ make > build.log 2>&1

Count lines in a filtered result:

$ grep -r "TODO" src/ | wc -l

Find the ten largest items in a directory:

$ du -sh /var/* 2>/dev/null | sort -rh | head -10

Each of these combines the same small set of operators -- >, >>, <, 2>, | -- in different ways. The syntax is minimal. The power comes from composition.

What You Have Learned

Output redirection with > sends stdout to a file (truncating it). Append redirection with >> adds to the end. Input redirection with < feeds a file into stdin. The 2> form redirects stderr specifically, and 2>&1 merges stderr into stdout. The pipe operator | connects one program's stdout to the next program's stdin, allowing you to build multi-stage data processing pipelines. The tee command lets you save a copy of data mid-pipeline.

These operators are the glue that holds the Unix tool ecosystem together. Small programs, each doing one thing, connected into arbitrarily complex workflows.

Next: The Environment