How to Parse CSV Files in Bash
Quick Answer: How to Parse CSV Files
Use a while loop with IFS: while IFS=',' read -r id name email; do process "$id" "$name" "$email"; done < file.csv. This reads each line, splits by comma, and stores fields in variables.
Quick Comparison: CSV Parsing Methods
| Method | Speed | Flexibility | Best For |
|---|---|---|---|
| read + IFS | Very fast | High | Most CSV files |
| awk | Fastest | Very high | Complex operations |
| cut | Fast | Low | Simple column extraction |
| Manual loops | Medium | Low | Learning/debugging |
Bottom line: Use read + IFS for clarity, awk for complex processing or performance.
Parse CSV (Comma-Separated Values) files efficiently in Bash. CSV parsing is essential for data processing, ETL workflows, and working with spreadsheet exports. This tutorial covers multiple methods from simple to advanced.
Method 1: Using read with IFS (Recommended)
This is the most straightforward and flexible method. IFS stands for “Internal Field Separator”—it tells Bash how to split each line into fields. By setting IFS to a comma, you’re saying “split on commas.” The read command then stores each field in a variable you specify.
# Basic parsing
while IFS=',' read -r id name email; do
    echo "ID: $id, Name: $name, Email: $email"
done < users.csv
# Read with explicit field assignment
while IFS=',' read -r id name email age; do
    echo "User: $name (Age: $age)"
done < users.csv
The -r flag prevents backslash interpretation—important because filenames and data might contain backslashes. The < users.csv redirects the file’s contents to the while loop’s stdin. For every line in the CSV, the loop runs once.
Example with sample CSV:
# Input (users.csv):
id,name,email,age
1,John Smith,john@example.com,30
2,Jane Doe,jane@example.com,25
3,Bob Johnson,bob@example.com,35
# Command:
while IFS=',' read -r id name email age; do
    [ "$id" = "id" ] && continue   # Skip header
    echo "$name is $age years old"
done < users.csv
# Output:
John Smith is 30 years old
Jane Doe is 25 years old
Bob Johnson is 35 years old
The [ "$id" = "id" ] && continue line skips the header row. You can check any field, but the first field (id) is typical. This method is intuitive and performs well for most CSV files. Each field automatically goes into a named variable you can reference naturally in your code.
When to Use read + IFS
Use this method when:
- You want readable, straightforward code
- CSV files are moderately sized
- You need to process each row with Bash logic
- You prefer explicit variable names over field numbers
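As a self-contained sketch of that per-row Bash logic (the /tmp path, sample data, and the under-30 flagging rule are invented for illustration):

```shell
#!/bin/bash
# Build a small sample file so the sketch runs on its own
# (path and flagging rule are illustrative only).
cat > /tmp/users_demo.csv <<'EOF'
id,name,email,age
1,John Smith,john@example.com,30
2,Jane Doe,jane@example.com,25
EOF

while IFS=',' read -r id name email age; do
    [ "$id" = "id" ] && continue      # skip header row
    if [ "$age" -lt 30 ]; then
        echo "FLAG: $name ($age)"     # row-level decision in plain Bash
    else
        echo "OK: $name ($age)"
    fi
done < /tmp/users_demo.csv
```

This kind of per-row branching is where a read loop beats cut or a one-line awk call in readability.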
Method 2: Skip Header Line
Handle CSV files that have header rows.
# Skip header explicitly
while IFS=',' read -r id name email; do
    [ "$id" = "id" ] && continue   # Skip if first field is "id"
    echo "Processing: $name"
done < users.csv
# Or use tail to skip first line
while IFS=',' read -r id name email; do
    echo "User: $name"
done < <(tail -n +2 users.csv)
# Or use awk with NR>1 to skip the header, then parse
awk -F',' 'NR>1 {print $2}' users.csv | while IFS= read -r name; do
    echo "Processing: $name"
done
Method 3: Using awk
awk is powerful for complex CSV operations with field separators.
# Basic field extraction
awk -F',' '{print $1, $2, $3}' users.csv
# Skip header and process specific fields
awk -F',' 'NR>1 {print $2, $3}' users.csv
# Conditional processing
awk -F',' '$4 > 30 {print $2, $4}' users.csv
# Format output
awk -F',' 'NR>1 {printf "%s (%s)\n", $2, $3}' users.csv
# Multiple conditions
awk -F',' 'NR>1 && $4 > 25 {print $2}' users.csv
Method 4: Using cut Command
Simple extraction of specific columns.
# Extract columns 1 and 3
cut -d',' -f1,3 users.csv
# Extract range of columns
cut -d',' -f1-3 users.csv
# Extract all except column 2
cut -d',' -f1,3- users.csv
Handling Quoted CSV Fields
CSV files often have quoted fields that may contain commas.
# Remove quotes from all fields (note: -F',' still splits on commas
# inside quotes, so this is only safe when quoted fields contain none)
awk -F',' '{gsub(/"/, ""); print}' data.csv
# Remove quotes from a specific field
awk -F',' '{gsub(/"/, "", $2); print $2}' data.csv
# More robust parsing for quoted fields (FPAT is GNU awk only)
awk -v FPAT='([^,]*)|("[^"]*")' '{
    gsub(/"/, "", $2)   # Remove quotes from field 2
    print $1, $2, $3
}' data.csv
Example:
# Input with quoted fields:
1,"Smith, John",john@example.com
2,"Doe, Jane",jane@example.com
# Parse with FPAT (Field Pattern; in GNU awk, FPAT overrides -F):
awk -v FPAT='([^,]*)|("[^"]*")' '{
    gsub(/"/, "", $2)
    print $1 ": " $2
}' data.csv
# Output:
1: Smith, John
2: Doe, Jane
Practical Examples
Example 1: Parse and Validate CSV
#!/bin/bash
csv_file="$1"
# Validate CSV format
if [ ! -f "$csv_file" ]; then
    echo "Error: File not found: $csv_file" >&2
    exit 1
fi
# Check header
header=$(head -n 1 "$csv_file")
if [[ "$header" != "id,name,email,age" ]]; then
    echo "Error: Invalid CSV format" >&2
    exit 1
fi
# Parse data
echo "Parsing CSV..."
while IFS=',' read -r id name email age; do
    [ "$id" = "id" ] && continue
    # Validate fields
    if [ -z "$id" ] || [ -z "$name" ]; then
        echo "Warning: Invalid row - $id,$name" >&2
        continue
    fi
    echo "[$id] $name ($age)"
done < "$csv_file"
Output:
Parsing CSV...
[1] John Smith (30)
[2] Jane Doe (25)
[3] Bob Johnson (35)
Example 2: Transform CSV Data
#!/bin/bash
input_csv="$1"
output_csv="${input_csv%.csv}_transformed.csv"
# Read input, transform, write output
echo "name,email,age_category" > "$output_csv"
while IFS=',' read -r id name email age; do
    [ "$id" = "id" ] && continue
    # Categorize age
    if [ "$age" -lt 20 ]; then
        category="Teen"
    elif [ "$age" -lt 30 ]; then
        category="Young Adult"
    elif [ "$age" -lt 60 ]; then
        category="Adult"
    else
        category="Senior"
    fi
    echo "$name,$email,$category" >> "$output_csv"
done < "$input_csv"
echo "Transformed CSV: $output_csv"
Example 3: Parse and Filter CSV
#!/bin/bash
csv_file="$1"
filter_field="${2:-age}"
filter_value="${3:-30}"
# Parse and filter based on field value
awk -F',' -v field="$filter_field" -v value="$filter_value" '
NR==1 {
    for (i = 1; i <= NF; i++) {
        if ($i == field) col = i
    }
    print
    next
}
col && $col+0 > value+0 { print }
' "$csv_file"
Usage:
# Find users older than 25
bash script.sh users.csv age 25
Example 4: CSV to Formatted Report
#!/bin/bash
csv_file="$1"
# Generate formatted report from CSV
echo "==== USER REPORT ===="
printf "%-5s %-20s %-25s %-5s\n" "ID" "Name" "Email" "Age"
echo "=================================================="
while IFS=',' read -r id name email age; do
    [ "$id" = "id" ] && continue
    printf "%-5s %-20s %-25s %-5s\n" "$id" "$name" "$email" "$age"
done < "$csv_file"
Output:
==== USER REPORT ====
ID Name Email Age
==================================================
1 John Smith john@example.com 30
2 Jane Doe jane@example.com 25
3 Bob Johnson bob@example.com 35
Example 5: Parse and Calculate Statistics
#!/bin/bash
csv_file="$1"
# Calculate statistics from CSV
echo "Calculating statistics from: $csv_file"
awk -F',' 'NR>1 {
    sum += $4
    count++
    if (max == "" || $4 > max) max = $4
    if (min == "" || $4 < min) min = $4
    if ($4 < 25) young++
    else if ($4 < 60) adult++
    else senior++
}
END {
    printf "Total records: %d\n", count
    printf "Average age: %.2f\n", sum/count
    printf "Min age: %d\n", min
    printf "Max age: %d\n", max
    printf "Young (<25): %d\n", young
    printf "Adult (25-59): %d\n", adult
    printf "Senior (60+): %d\n", senior
}' "$csv_file"
Output:
Calculating statistics from: users.csv
Total records: 3
Average age: 30.00
Min age: 25
Max age: 35
Young (<25): 0
Adult (25-59): 3
Senior (60+): 0
Example 6: Merge Multiple CSV Files
#!/bin/bash
# Merge multiple CSV files keeping header from first
output_file="merged.csv"
# Get header from first file
head -n 1 "$1" > "$output_file"
# Append data from all files (skip their headers)
for csv in "$@"; do
    tail -n +2 "$csv" >> "$output_file"
done
echo "Merged CSVs into: $output_file"
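A quick self-contained run of the same merge pattern (the /tmp file names are invented for the demo):

```shell
#!/bin/bash
# Two throwaway input files with identical headers
printf 'id,name\n1,John\n' > /tmp/part1.csv
printf 'id,name\n2,Jane\n' > /tmp/part2.csv

# Keep the header from the first file, append data rows from all files
head -n 1 /tmp/part1.csv > /tmp/merged_demo.csv
for csv in /tmp/part1.csv /tmp/part2.csv; do
    tail -n +2 "$csv" >> /tmp/merged_demo.csv
done

cat /tmp/merged_demo.csv
```

The merged file keeps a single header line followed by the data rows of every input, in argument order.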
Example 7: CSV Parsing Function
#!/bin/bash
# Reusable CSV parsing function
parse_csv() {
    local csv_file="$1"
    local callback="$2"   # Function to call for each row
    local row_num=0
    if [ ! -f "$csv_file" ]; then
        echo "Error: File not found" >&2
        return 1
    fi
    while IFS=',' read -r -a fields; do
        row_num=$((row_num + 1))
        # Skip header (first row)
        [ "$row_num" -eq 1 ] && continue
        # Call callback function with fields
        "$callback" "${fields[@]}"
    done < "$csv_file"
}
# Define callback function
process_user() {
    local id="$1"
    local name="$2"
    local email="$3"
    local age="$4"
    echo "User $id: $name ($age years old)"
}
# Usage
parse_csv "users.csv" "process_user"
Performance Comparison
For parsing CSV files:
| Method | Speed | Flexibility |
|---|---|---|
| read + IFS | Very Fast | High |
| awk | Fastest | High |
| cut | Fast | Low |
Best choice: Use read + IFS for clarity, awk for performance.
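The table's rankings are rough; on your own data you can measure directly with time. A sketch (the row count, /tmp path, and generated data are arbitrary):

```shell
#!/bin/bash
# Generate a synthetic 20,000-row CSV, then time the two approaches
# on the same column access. Absolute numbers vary by machine.
seq 1 20000 | awk '{printf "%d,user%d,u%d@example.com,%d\n", $1, $1, $1, ($1 % 60) + 18}' > /tmp/bench.csv

time while IFS=',' read -r id name email age; do
    : "$name"                         # touch the field, do no real work
done < /tmp/bench.csv

time awk -F',' '{ x = $2 } END { print NR " rows" }' /tmp/bench.csv
```

On most systems awk finishes well ahead of the pure-Bash loop at this size, which is why it is the usual pick once files grow large.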
Important Considerations
Special Characters and Escaping
Handle special characters properly:
# File might have special chars
while IFS=',' read -r id name email; do
    # Escape for safe usage in commands
    safe_name=$(printf '%q' "$name")
    echo "$safe_name"
done < users.csv
Different Delimiters
CSV might use different delimiters:
# Tab-separated
while IFS=$'\t' read -r id name email; do
    echo "$name"
done < users.tsv
# Semicolon-separated
while IFS=';' read -r id name email; do
    echo "$name"
done < users.csv
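Relatedly, CSVs exported from Windows spreadsheets often end lines with CRLF, which leaves a carriage return glued to the last field. A sketch of stripping it (the sample file is invented):

```shell
#!/bin/bash
# Simulate a Windows-style export with \r\n line endings
printf '1,John,30\r\n2,Jane,25\r\n' > /tmp/crlf_demo.csv

while IFS=',' read -r id name age; do
    age=${age%$'\r'}                  # drop a trailing carriage return
    echo "$id: $name is $age"
done < /tmp/crlf_demo.csv
```

Without the strip, the stray \r hides inside the last field and breaks numeric tests like [ "$age" -lt 30 ].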
Handling Large Files
For very large CSVs, consider memory usage:
# Process one line at a time (memory efficient)
while IFS=',' read -r id name email; do
    # Process immediately
    process_record "$id" "$name" "$email"
done < huge_file.csv
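One way to keep an eye on a long run without holding anything extra in memory (the generated file and the 10,000-row progress interval are arbitrary):

```shell
#!/bin/bash
# Stream a large CSV row by row, reporting progress periodically.
# Only the current line is ever held in memory.
seq 1 25000 | awk '{printf "%d,user%d\n", $1, $1}' > /tmp/huge_demo.csv

count=0
while IFS=',' read -r id name; do
    count=$((count + 1))
    if [ $((count % 10000)) -eq 0 ]; then
        echo "Processed $count rows..." >&2
    fi
done < /tmp/huge_demo.csv
echo "Done: $count rows"
```

Progress goes to stderr so it does not pollute any real output the loop might produce on stdout.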
Key Points
- Use IFS=',' to split CSV by comma
- Use read -r to prevent backslash interpretation
- Always skip headers appropriately
- Quote filenames and variables: < "$csv_file"
- Use awk for complex processing
- Handle quoted fields with FPAT in awk
- Test with sample data first
Quick Reference
# Basic parsing
while IFS=',' read -r f1 f2 f3; do
    echo "$f1 $f2 $f3"
done < file.csv
# Skip header
while IFS=',' read -r f1 f2 f3; do
    [ "$f1" = "id" ] && continue
    echo "$f1"
done < file.csv
# Using tail to skip header
while IFS=',' read -r f1 f2 f3; do
    echo "$f1"
done < <(tail -n +2 file.csv)
# Using awk for complex operations
awk -F',' 'NR>1 {print $2}' file.csv
Recommended Pattern
#!/bin/bash
csv_file="$1"
# For straightforward parsing:
while IFS=',' read -r id name email age; do
    [ "$id" = "id" ] && continue   # Skip header
    echo "Processing: $name"
done < "$csv_file"
# For complex operations:
awk -F',' 'NR>1 && $4 > 25 {print $2, $3}' "$csv_file"