Bash Text Processing - Reading, Filtering, and Transforming Files
Quick Answer: How Do You Process Text in Bash?
The Big Three text processing tools are grep (find matching lines), sed (modify text), and awk (extract fields). For reading files line-by-line, use a while loop with read. Choose the right tool based on your task: grep for filtering, sed for editing, awk for columns.
Quick Comparison: Text Processing Methods
| Tool | Best For | Speed | Complexity |
|---|---|---|---|
| grep | Finding matching lines | Very fast | Simple |
| sed | Modifying/replacing text | Very fast | Moderate |
| awk | Column extraction, calculations | Fast | Moderate |
| while read | Line-by-line in Bash | Slower | Simple |
| cut | Extracting columns | Very fast | Simple |
Bottom line: grep for filtering, sed for editing, awk for fields. Use while loops when you need Bash logic.
Text processing is central to system administration and data manipulation. This guide covers reading files, filtering, transforming, and using the essential Unix text tools effectively—skills you’ll use constantly.
Table of Contents
- Reading Files
- Filtering with grep
- Stream Editing with sed
- Field Processing with awk
- Line-by-Line Processing
- Quick Reference
- Text Transformation
- Best Practices
- Frequently Asked Questions
Reading Files
Before you can process text, you need to read it. Bash offers several ways to read files, each suitable for different scenarios.
Read Entire File
The simplest approach—cat prints the entire file to stdout, which you can then process.
cat filename.txt
Use this when you want to pipe the entire file to another command or when the file fits in memory. For very large files, line-by-line reading is more memory-efficient.
Read with Line Numbers
Display line numbers alongside content using cat -n. This is useful for debugging or referencing specific lines.
cat -n filename.txt
This prepends line numbers to each line, making it easy to identify which line contains what.
Read Specific Lines
Extract portions of files without loading the entire file into memory.
# First 10 lines
head -n 10 filename.txt
# Last 10 lines
tail -n 10 filename.txt
# Lines 5-15
sed -n '5,15p' filename.txt
head and tail are fast and efficient for file boundaries. sed with -n (no auto-print) and p (print) gives you exact line ranges.
Read Line by Line (In a Bash Loop)
When you need to process each line with Bash logic, loop through the file line-by-line. This is essential when your processing requires conditional logic or variable tracking.
while IFS= read -r line; do
    echo "Processing: $line"
done < filename.txt
The IFS= (set Internal Field Separator to empty) prevents leading/trailing whitespace from being trimmed. The -r flag treats backslashes as literal characters, not escape sequences. This is the standard Bash idiom for line-by-line processing.
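A quick way to see why both flags matter is to feed the loop a file with leading whitespace and a backslash; the sample file below is invented for the demonstration:

```bash
#!/usr/bin/env bash
# Build a sample file with leading spaces and a backslash sequence.
printf '  indented line\nback\\slash line\n' > sample.txt

# With IFS= and -r: whitespace and backslashes survive intact.
while IFS= read -r line; do
    printf '[%s]\n' "$line"
done < sample.txt
# -> [  indented line]
#    [back\slash line]
# Without IFS= the leading spaces would be trimmed; without -r,
# read would interpret the backslash as the start of an escape.

rm -f sample.txt
```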
When to Use Line-by-Line Reading
Use while read when:
- You need Bash logic for each line (conditions, counters)
- Processing small to medium files
- You need to aggregate results
- Simplicity matters more than speed
Use tools like sed or awk when:
- Processing very large files (while loop is slower)
- Doing pure text transformation with no Bash logic
- You need speed (external tools are faster)
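Counting and conditionally handling lines is exactly the kind of job that needs Bash logic. A minimal sketch (the file name `app.log` and the ERROR marker are illustrative):

```bash
#!/usr/bin/env bash
# Track per-line state (two counters) while reading a log --
# the kind of bookkeeping grep/sed/awk alone can't easily combine
# with arbitrary Bash commands.
errors=0
total=0
while IFS= read -r line; do
    total=$((total + 1))
    case $line in
        *ERROR*) errors=$((errors + 1)) ;;
    esac
done < app.log
echo "Checked $total lines, found $errors errors"
```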
Filtering with grep
grep is the fastest way to find lines matching a pattern. It’s your first tool when you need to filter text.
Basic Search
The simplest grep usage—show all lines containing the pattern.
grep "pattern" filename.txt
grep prints each line containing the pattern. Redirect to another file to save results: grep "pattern" file.txt > results.txt.
Case-Insensitive Search
The -i flag makes grep ignore case differences—perfect for user input where you don’t know if they typed “ERROR” or “error”.
grep -i "pattern" filename.txt
Now “Pattern”, “PATTERN”, and “pattern” all match.
Invert Match (Exclude)
The -v flag shows lines that DON’T match—useful for filtering out noise or unwanted lines.
grep -v "pattern" filename.txt # Lines NOT matching
This is perfect for excluding debug output, comment lines, or other noise from logs.
Count Matches
Get a count of how many lines match without seeing the lines themselves.
grep -c "pattern" filename.txt
This is faster than piping to wc -l when you just need the count.
Show Line Numbers
The -n flag prepends line numbers, helping you find exact locations.
grep -n "pattern" filename.txt
When you need to edit a file, knowing the line number is essential.
Search in Multiple Files
Recursively search through entire directories with -r.
grep -r "pattern" /path/to/directory
This is how you find all files containing a string across a project.
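A few companion flags make recursive searches more useful. These examples assume GNU grep (the `--include` filter in particular may be missing on minimal BSD installs), and the paths are placeholders:

```bash
# Recursive search with file names and line numbers
grep -rn "TODO" /path/to/project

# Only list the files that match, without the matching lines
grep -rl "TODO" /path/to/project

# Restrict the search to shell scripts (GNU grep)
grep -rn --include='*.sh' "TODO" /path/to/project
```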
When to Use grep
Use grep when:
- Filtering for matching lines
- You need speed (it’s very fast)
- Searching directories recursively
- Counting matches
Don’t use grep when:
- You need to modify text (use sed)
- You need field extraction (use awk or cut)
Stream Editing with sed
sed (stream editor) modifies text without opening files in an editor. It’s perfect for find-and-replace operations on files.
Replace Text
The s/old/new/ syntax is sed’s most common pattern—substitute the first occurrence per line.
# Replace first occurrence per line
sed 's/old/new/' file.txt
# Replace all occurrences on each line
sed 's/old/new/g' file.txt
# In-place editing (modify the file directly)
sed -i 's/old/new/g' file.txt
The g flag means “global”—replace all occurrences on each line. The -i flag modifies the file in place. Add .bak after -i to create a backup: sed -i.bak 's/old/new/g' file.txt.
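Here is a small end-to-end sketch of the backup workflow. The file name and contents are invented; note that on macOS/BSD sed the suffix must follow -i with no space, exactly as written here:

```bash
# Create a throwaway config, edit it in place, keep a backup
printf 'mode=old\nname=old_server\n' > config.ini

sed -i.bak 's/old/new/g' config.ini

cat config.ini       # mode=new / name=new_server
cat config.ini.bak   # original content, untouched
rm -f config.ini config.ini.bak
```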
Delete Lines
Remove lines matching a pattern or at specific positions.
# Delete lines matching pattern
sed '/pattern/d' file.txt
# Delete specific line numbers
sed '5d' file.txt # Delete line 5
sed '5,10d' file.txt # Delete lines 5-10
The d command deletes the line. This is how you remove logs, comments, or unwanted data from files.
Print Specific Lines
Extract specific lines without modifying the file.
# Print only matching lines (suppress others with -n)
sed -n '/pattern/p' file.txt
# Print lines 10-20
sed -n '10,20p' file.txt
The -n flag tells sed to suppress automatic printing—only print what you explicitly ask for with p.
When to Use sed
Use sed when:
- Doing find-and-replace on files
- Need to modify files in-place
- Using regex patterns
- Processing streams efficiently
Don’t use sed when:
- Just filtering lines (use grep)
- Extracting fields (use awk)
- Complex Bash logic required
Field Processing with awk
awk excels at column and field extraction. Unlike grep and sed which work on whole lines, awk automatically splits lines into fields and lets you work with columns.
Extract Columns
awk splits each line into fields (columns) separated by whitespace by default. Access fields with $1, $2, etc.
# Default field separator is whitespace
awk '{print $1}' file.txt # Print first column
awk '{print $1, $3}' file.txt # Print columns 1 and 3
This is how you extract data from structured files. Much cleaner than trying to do field extraction with sed or grep.
Custom Field Separator
When your data uses a different delimiter (like CSV or colon-separated), specify it with -F.
# Use colon as separator (useful for /etc/passwd)
awk -F: '{print $1}' /etc/passwd
The -F: tells awk to split on colons instead of whitespace. This is perfect for parsing configuration files and system files.
Filter by Condition
Extract only rows where a condition is true—much more powerful than grep’s pattern matching.
# Print lines where second field > 100
awk '$2 > 100 {print}' file.txt
You can use any comparison operators: >, <, ==, !=, ~ (regex match), etc.
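The ~ operator deserves a quick demonstration, since it combines grep-style matching with field awareness. The sample log lines are invented:

```bash
# Keep only rows whose first field matches a regex,
# then print just the second field of those rows
printf 'ERROR disk full\nINFO started\nERROR timeout\n' \
    | awk '$1 ~ /^ERROR/ {print $2}'
# -> disk
#    timeout
```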
Calculate Sums
Aggregate data across rows—perfect for log analysis and data processing.
# Sum the second column
awk '{sum += $2} END {print sum}' file.txt
The END block runs after all lines are processed, letting you print final results. This is how you calculate totals, averages, or counts from columnar data.
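Averages follow the same shape, using awk's built-in NR (number of records) counter; the input here is invented:

```bash
# Average of the second column; the NR guard avoids
# dividing by zero on empty input
printf 'a 10\nb 20\nc 30\n' \
    | awk '{sum += $2} END {if (NR > 0) print sum / NR}'
# -> 20
```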
When to Use awk
Use awk when:
- Extracting or processing columns/fields
- Aggregating data (sums, counts, averages)
- Filtering based on field values
- Processing structured data (CSV, logs with known format)
Don’t use awk when:
- Just filtering whole lines (use grep)
- Simple pattern matching (use grep)
Line-by-Line Processing
Read and Process
while IFS= read -r line; do
    # Skip empty lines
    [ -z "$line" ] && continue
    # Skip comments
    [[ $line =~ ^# ]] && continue
    # Process line
    echo "Processing: $line"
done < input.txt
Read CSV
while IFS=',' read -r id name email; do
    echo "User: $name ($email)"
done < users.csv
Skip Header
head -n 1 data.csv > output.csv # Keep the header line
tail -n +2 data.csv | process >> output.csv # Skip header ('process' stands for your own command)
Quick Reference
# grep examples
grep "pattern" file.txt # Find lines containing pattern
grep -i "pattern" file.txt # Case-insensitive
grep -v "pattern" file.txt # Exclude pattern
grep -c "pattern" file.txt # Count matches
grep -r "pattern" /path # Recursive search
# sed examples
sed 's/old/new/' file.txt # Replace first occurrence
sed 's/old/new/g' file.txt # Replace all occurrences
sed -i 's/old/new/g' file.txt # In-place editing
sed '/pattern/d' file.txt # Delete matching lines
sed -n '10,20p' file.txt # Print specific lines
# awk examples
awk '{print $1}' file.txt # Print first field
awk -F: '{print $1}' file.txt # Custom separator
awk '$2 > 100 {print}' file.txt # Conditional filtering
awk '{sum += $1} END {print sum}' file.txt # Calculate sum
# while read loop
while IFS= read -r line; do
    echo "$line"
done < file.txt
Text Transformation
Convert Case
# Uppercase
tr '[:lower:]' '[:upper:]' < file.txt
# Lowercase
tr '[:upper:]' '[:lower:]' < file.txt
Remove Duplicates
sort file.txt | uniq
Count Occurrences
sort file.txt | uniq -c | sort -rn
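Run against a small invented sample, the pipeline looks like this: sort groups duplicates together, uniq -c prefixes each unique line with its count, and sort -rn puts the most frequent on top.

```bash
printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' \
    | sort | uniq -c | sort -rn
# counts first, most common on top:
#   3 apple
#   2 banana
#   1 cherry
```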
Sort
# Sort alphabetically
sort file.txt
# Sort numerically
sort -n file.txt
# Reverse sort
sort -r file.txt
Best Practices
1. Pipeline Efficiently
# Works, but the cat is unnecessary
cat large_file.txt | grep pattern | awk '{print $1}'
# Better: let grep read the file directly
grep pattern large_file.txt | awk '{print $1}'
2. Quote Patterns
# Good
grep "$search_term" file.txt
# Can fail if $search_term contains special chars
grep $search_term file.txt
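To see the failure mode, try a search term containing a space (the variable name and log contents are illustrative):

```bash
search_term="connection refused"
printf 'connection refused by host\nall good\n' > demo.log

# Quoted: one two-word pattern, matches as intended
grep "$search_term" demo.log        # connection refused by host

# Unquoted: word-splits into the pattern 'connection' plus a
# nonexistent file named 'refused' -- grep reports an error
# grep $search_term demo.log

rm -f demo.log
```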
3. Use Right Tool
- grep - Find lines matching pattern
- sed - Stream editing, line-by-line replacement
- awk - Field processing and calculations
- while read - Complex per-line logic
4. Handle Large Files
# For large files, avoid loading the whole file into memory
while IFS= read -r line; do
    # Process one line at a time
    echo "$line"
done < huge_file.txt
Frequently Asked Questions
Q: Should I use grep or sed for filtering?
A: grep for finding matching lines, sed for complex transformations.
Q: How do I handle special characters in sed?
A: Escape them: sed 's/\$/dollar/' to match literal $.
Q: What’s the performance difference?
A: grep/sed/awk are C programs (fast). Bash loops are slower but more flexible.
Q: Can I combine grep and sed?
A: Yes: grep pattern file.txt | sed 's/old/new/g'
Next Steps
Explore related topics:
- Bash Regular Expressions - Pattern matching in depth
- Sed Complete Guide - Stream editor reference
- Awk Complete Guide - Field processing reference