Bash Text Processing - Reading, Filtering, and Transforming Files
Quick Answer: How Do You Process Text in Bash?
The Big Three text processing tools are grep (find matching lines), sed (modify text), and awk (extract fields). For reading files line-by-line, use a while loop with read. Choose the right tool based on your task: grep for filtering, sed for editing, awk for columns.
Quick Comparison: Text Processing Methods
| Tool | Best For | Speed | Complexity |
|---|---|---|---|
| grep | Finding matching lines | Very fast | Simple |
| sed | Modifying/replacing text | Very fast | Moderate |
| awk | Column extraction, calculations | Fast | Moderate |
| while read | Line-by-line in Bash | Slower | Simple |
| cut | Extracting columns | Very fast | Simple |
Bottom line: grep for filtering, sed for editing, awk for fields. Use while loops when you need Bash logic.
Text processing is central to system administration and data manipulation. This guide covers reading files, filtering, transforming, and using the essential Unix text tools effectively—skills you’ll use constantly.
Table of Contents
- Reading Files
- Filtering with grep
- Stream Editing with sed
- Field Processing with awk
- Line-by-Line Processing
- Quick Reference
- Text Transformation
- Best Practices
- Frequently Asked Questions
Reading Files
Before you can process text, you need to read it. Bash offers several ways to read files, each suitable for different scenarios.
Read Entire File
The simplest approach—cat prints the entire file to stdout, which you can then process.
cat filename.txt
Use this when you want to pipe the entire file to another command or when the file fits in memory. For very large files, line-by-line reading is more memory-efficient.
Read with Line Numbers
Display line numbers alongside content using cat -n. This is useful for debugging or referencing specific lines.
cat -n filename.txt
This prepends line numbers to each line, making it easy to identify which line contains what.
Read Specific Lines
Extract portions of files without loading the entire file into memory.
# First 10 lines
head -n 10 filename.txt
# Last 10 lines
tail -n 10 filename.txt
# Lines 5-15
sed -n '5,15p' filename.txt
head and tail are fast and efficient for file boundaries. sed with -n (no auto-print) and p (print) gives you exact line ranges.
Read Line by Line (In a Bash Loop)
When you need to process each line with Bash logic, loop through the file line-by-line. This is essential when your processing requires conditional logic or variable tracking.
while IFS= read -r line; do
    echo "Processing: $line"
done < filename.txt
The IFS= (set Internal Field Separator to empty) prevents leading/trailing whitespace from being trimmed. The -r flag treats backslashes as literal characters, not escape sequences. This is the standard Bash idiom for line-by-line processing.
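A quick way to see why both flags matter is to feed the loop a file with leading whitespace and a backslash; the sample file below is invented for the demonstration:

```bash
#!/usr/bin/env bash
# Build a sample file with leading spaces and a backslash sequence.
printf '  indented line\nback\\slash line\n' > sample.txt

# With IFS= and -r: whitespace and backslashes survive intact.
while IFS= read -r line; do
    printf '[%s]\n' "$line"
done < sample.txt
# -> [  indented line]
#    [back\slash line]
# Without IFS= the leading spaces would be trimmed; without -r,
# read would interpret the backslash as the start of an escape.

rm -f sample.txt
```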
When to Use Line-by-Line Reading
Use while read when:
- You need Bash logic for each line (conditions, counters)
- Processing small to medium files
- You need to aggregate results
- Simplicity matters more than speed
Use tools like sed or awk when:
- Processing very large files (while loop is slower)
- Doing pure text transformation with no Bash logic
- You need speed (external tools are faster)
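Counting and conditionally handling lines is exactly the kind of job that needs Bash logic. A minimal sketch (the file name `app.log` and the ERROR marker are illustrative):

```bash
#!/usr/bin/env bash
# Track per-line state (two counters) while reading a log --
# the kind of bookkeeping grep/sed/awk alone can't easily combine
# with arbitrary Bash commands.
errors=0
total=0
while IFS= read -r line; do
    total=$((total + 1))
    case $line in
        *ERROR*) errors=$((errors + 1)) ;;
    esac
done < app.log
echo "Checked $total lines, found $errors errors"
```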
Filtering with grep
grep is the fastest way to find lines matching a pattern. It’s your first tool when you need to filter text.
Basic Search
The simplest grep usage—show all lines containing the pattern.
grep "pattern" filename.txt
grep prints each line containing the pattern. Redirect to another file to save results: grep "pattern" file.txt > results.txt.
Case-Insensitive Search
The -i flag makes grep ignore case differences—perfect for user input where you don’t know if they typed “ERROR” or “error”.
grep -i "pattern" filename.txt
Now “Pattern”, “PATTERN”, and “pattern” all match.
Invert Match (Exclude)
The -v flag shows lines that DON’T match—useful for filtering out noise or unwanted lines.
grep -v "pattern" filename.txt # Lines NOT matching
This is perfect for excluding debug output, comment lines, or other noise from logs.
Count Matches
Get a count of how many lines match without seeing the lines themselves.
grep -c "pattern" filename.txt
This is faster than piping to wc -l when you just need the count.
Show Line Numbers
The -n flag prepends line numbers, helping you find exact locations.
grep -n "pattern" filename.txt
When you need to edit a file, knowing the line number is essential.
Search in Multiple Files
Recursively search through entire directories with -r.
grep -r "pattern" /path/to/directory
This is how you find all files containing a string across a project.
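A few companion flags make recursive searches more useful. These examples assume GNU grep (the `--include` filter in particular may be missing on minimal BSD installs), and the paths are placeholders:

```bash
# Recursive search with file names and line numbers
grep -rn "TODO" /path/to/project

# Only list the files that match, without the matching lines
grep -rl "TODO" /path/to/project

# Restrict the search to shell scripts (GNU grep)
grep -rn --include='*.sh' "TODO" /path/to/project
```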
When to Use grep
Use grep when:
- Filtering for matching lines
- You need speed (it’s very fast)
- Searching directories recursively
- Counting matches
Don’t use grep when:
- You need to modify text (use sed)
- You need field extraction (use awk or cut)
Stream Editing with sed
sed (stream editor) modifies text without opening files in an editor. It’s perfect for find-and-replace operations on files.
Replace Text
The s/old/new/ syntax is sed’s most common pattern—substitute the first occurrence per line.
# Replace first occurrence per line
sed 's/old/new/' file.txt
# Replace all occurrences on each line
sed 's/old/new/g' file.txt
# In-place editing (modify the file directly)
sed -i 's/old/new/g' file.txt
The g flag means “global”—replace all occurrences on each line. The -i flag modifies the file in place. Add .bak after -i to create a backup: sed -i.bak 's/old/new/g' file.txt.
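Here is a small end-to-end sketch of the backup workflow. The file name and contents are invented; note that on macOS/BSD sed the suffix must follow -i with no space, exactly as written here:

```bash
# Create a throwaway config, edit it in place, keep a backup
printf 'mode=old\nname=old_server\n' > config.ini

sed -i.bak 's/old/new/g' config.ini

cat config.ini       # mode=new / name=new_server
cat config.ini.bak   # original content, untouched
rm -f config.ini config.ini.bak
```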
Delete Lines
Remove lines matching a pattern or at specific positions.
# Delete lines matching pattern
sed '/pattern/d' file.txt
# Delete specific line numbers
sed '5d' file.txt # Delete line 5
sed '5,10d' file.txt # Delete lines 5-10
The d command deletes the line. This is how you remove logs, comments, or unwanted data from files.
Print Specific Lines
Extract specific lines without modifying the file.
# Print only matching lines (suppress others with -n)
sed -n '/pattern/p' file.txt
# Print lines 10-20
sed -n '10,20p' file.txt
The -n flag tells sed to suppress automatic printing—only print what you explicitly ask for with p.
When to Use sed
Use sed when:
- Doing find-and-replace on files
- Need to modify files in-place
- Using regex patterns
- Processing streams efficiently
Don’t use sed when:
- Just filtering lines (use grep)
- Extracting fields (use awk)
- Complex Bash logic required
Field Processing with awk
awk excels at column and field extraction. Unlike grep and sed which work on whole lines, awk automatically splits lines into fields and lets you work with columns.
Extract Columns
awk splits each line into fields (columns) separated by whitespace by default. Access fields with $1, $2, etc.
# Default field separator is whitespace
awk '{print $1}' file.txt # Print first column
awk '{print $1, $3}' file.txt # Print columns 1 and 3
This is how you extract data from structured files. Much cleaner than trying to do field extraction with sed or grep.
Custom Field Separator
When your data uses a different delimiter (like CSV or colon-separated), specify it with -F.
# Use colon as separator (useful for /etc/passwd)
awk -F: '{print $1}' /etc/passwd
The -F: tells awk to split on colons instead of whitespace. This is perfect for parsing configuration files and system files.
Filter by Condition
Extract only rows where a condition is true—much more powerful than grep’s pattern matching.
# Print lines where second field > 100
awk '$2 > 100 {print}' file.txt
You can use any comparison operators: >, <, ==, !=, ~ (regex match), etc.
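The ~ operator deserves a quick demonstration, since it combines grep-style matching with field awareness. The sample log lines are invented:

```bash
# Keep only rows whose first field matches a regex,
# then print just the second field of those rows
printf 'ERROR disk full\nINFO started\nERROR timeout\n' \
    | awk '$1 ~ /^ERROR/ {print $2}'
# -> disk
#    timeout
```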
Calculate Sums
Aggregate data across rows—perfect for log analysis and data processing.
# Sum the second column
awk '{sum += $2} END {print sum}' file.txt
The END block runs after all lines are processed, letting you print final results. This is how you calculate totals, averages, or counts from columnar data.
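Averages follow the same shape, using awk's built-in NR (number of records) counter; the input here is invented:

```bash
# Average of the second column; the NR guard avoids
# dividing by zero on empty input
printf 'a 10\nb 20\nc 30\n' \
    | awk '{sum += $2} END {if (NR > 0) print sum / NR}'
# -> 20
```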
When to Use awk
Use awk when:
- Extracting or processing columns/fields
- Aggregating data (sums, counts, averages)
- Filtering based on field values
- Processing structured data (CSV, logs with known format)
Don’t use awk when:
- Just filtering whole lines (use grep)
- Simple pattern matching (use grep)
Line-by-Line Processing
Read and Process
while IFS= read -r line; do
    # Skip empty lines
    [ -z "$line" ] && continue
    # Skip comments
    [[ $line =~ ^# ]] && continue
    # Process line
    echo "Processing: $line"
done < input.txt
Read CSV
while IFS=',' read -r id name email; do
    echo "User: $name ($email)"
done < users.csv
Skip Header
head -n 1 data.csv > output.csv # Keep the header line
tail -n +2 data.csv | process >> output.csv # Skip header ('process' stands for your own command)
Quick Reference
# grep examples
grep "pattern" file.txt # Find lines containing pattern
grep -i "pattern" file.txt # Case-insensitive
grep -v "pattern" file.txt # Exclude pattern
grep -c "pattern" file.txt # Count matches
grep -r "pattern" /path # Recursive search
# sed examples
sed 's/old/new/' file.txt # Replace first occurrence
sed 's/old/new/g' file.txt # Replace all occurrences
sed -i 's/old/new/g' file.txt # In-place editing
sed '/pattern/d' file.txt # Delete matching lines
sed -n '10,20p' file.txt # Print specific lines
# awk examples
awk '{print $1}' file.txt # Print first field
awk -F: '{print $1}' file.txt # Custom separator
awk '$2 > 100 {print}' file.txt # Conditional filtering
awk '{sum += $1} END {print sum}' file.txt # Calculate sum
# while read loop
while IFS= read -r line; do
    echo "$line"
done < file.txt
Text Transformation
Convert Case
# Uppercase
tr '[:lower:]' '[:upper:]' < file.txt
# Lowercase
tr '[:upper:]' '[:lower:]' < file.txt
Remove Duplicates
sort file.txt | uniq
Count Occurrences
sort file.txt | uniq -c | sort -rn
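Run against a small invented sample, the pipeline looks like this: sort groups duplicates together, uniq -c prefixes each unique line with its count, and sort -rn puts the most frequent on top.

```bash
printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' \
    | sort | uniq -c | sort -rn
# counts first, most common on top:
#   3 apple
#   2 banana
#   1 cherry
```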
Sort
# Sort alphabetically
sort file.txt
# Sort numerically
sort -n file.txt
# Reverse sort
sort -r file.txt
Best Practices
1. Pipeline Efficiently
# Works, but the cat is unnecessary
cat large_file.txt | grep pattern | awk '{print $1}'
# Better: let grep read the file directly
grep pattern large_file.txt | awk '{print $1}'
2. Quote Patterns
# Good
grep "$search_term" file.txt
# Can fail if $search_term contains special chars
grep $search_term file.txt
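To see the failure mode, try a search term containing a space (the variable name and log contents are illustrative):

```bash
search_term="connection refused"
printf 'connection refused by host\nall good\n' > demo.log

# Quoted: one two-word pattern, matches as intended
grep "$search_term" demo.log        # connection refused by host

# Unquoted: word-splits into the pattern 'connection' plus a
# nonexistent file named 'refused' -- grep reports an error
# grep $search_term demo.log

rm -f demo.log
```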
3. Use Right Tool
- grep - Find lines matching pattern
- sed - Stream editing, line-by-line replacement
- awk - Field processing and calculations
- while read - Complex per-line logic
4. Handle Large Files
# For large files, avoid loading the whole file into memory
while IFS= read -r line; do
    # Process one line at a time
    echo "$line"
done < huge_file.txt
Frequently Asked Questions
Q: Should I use grep or sed for filtering?
A: grep for finding matching lines, sed for complex transformations.
Q: How do I handle special characters in sed?
A: Escape them: sed 's/\$/dollar/' to match literal $.
Q: What’s the performance difference?
A: grep/sed/awk are C programs (fast). Bash loops are slower but more flexible.
Q: Can I combine grep and sed?
A: Yes: grep pattern file.txt | sed 's/old/new/g'
Next Steps
Explore related topics:
- Bash Regular Expressions - Pattern matching in depth
- Sed Complete Guide - Stream editor reference
- Awk Complete Guide - Field processing reference