
How to Extract Column from CSV in Bash


Quick Answer: Extract Column from CSV in Bash

To extract a specific column from a CSV file, use cut -d',' -f2 file.csv where -d sets the delimiter and -f specifies the field number. For more complex operations, use awk -F',' '{print $2}' file.csv. The cut command is fastest for simple extraction.

Quick Comparison: CSV Column Extraction Methods

Method                 Speed      Complexity    Best For                        Flexibility
cut command            Fastest    Very simple   Basic column extraction         Limited
awk                    Very fast  Moderate      Complex filtering/transforms    High
sed                    Fast       Moderate      Pattern-based selection         Medium
Parameter expansion    N/A        Complex       Scripting inside loops          Low
Bash arrays            Slow       Simple        Small files, simple logic       High

Bottom line: Use cut for simple column extraction, use awk for filtering and transformations.


CSV (Comma-Separated Values) files are everywhere in data processing, and you'll frequently need to extract, transform, and analyze specific columns. Bash offers several tools that handle this efficiently.

Method 1: Using cut Command (Fastest)

The cut command is simple, fast, and perfect for basic column extraction. It's part of the POSIX core utilities, so it's available on virtually every Unix-like system:

# Extract column 1 (first column)
cut -d',' -f1 users.csv

# Extract columns 1 and 3
cut -d',' -f1,3 users.csv

# Extract columns 1-3 (range)
cut -d',' -f1-3 users.csv

# Extract all except column 2
cut -d',' -f1,3- users.csv

# Different delimiter
cut -d':' -f1,5 /etc/passwd

When to Use cut

Use cut when:

  • You need to extract columns from structured CSV/delimited data
  • You want the fastest performance
  • The extraction is simple without complex filtering
  • You’re processing large files (cut is efficient)
  • The delimiter is consistent throughout the file

Avoid cut when:

  • You need to filter rows based on column values (use awk instead)
  • Headers need special handling (cut doesn’t have header awareness)
  • You need to transform values (use awk or sed)
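The first limitation above is worth seeing side by side: cut always emits a value for every row, while awk can select columns and filter rows in one pass. A minimal sketch, using a throwaway file (the `/tmp/cut_demo.csv` path and its contents are illustrative):

```shell
# Build a small sample file (illustrative data)
cat > /tmp/cut_demo.csv <<'EOF'
id,name,age
1,John,30
2,Jane,25
EOF

# cut can only select columns -- it returns a value for every row
cut -d',' -f2 /tmp/cut_demo.csv
# name
# John
# Jane

# awk can select columns AND filter rows in one pass
awk -F',' 'NR>1 && $3 >= 30 {print $2}' /tmp/cut_demo.csv
# John

rm -f /tmp/cut_demo.csv
```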

Example with sample CSV:

# Input (users.csv):
id,name,email,age
1,John,john@example.com,30
2,Jane,jane@example.com,25
3,Bob,bob@example.com,35

# Extract name and email:
cut -d',' -f2,3 users.csv

# Output:
name,email
John,john@example.com
Jane,jane@example.com
Bob,bob@example.com

Method 2: Using awk Command

awk is powerful for complex CSV processing with transformations.

# Extract first column
awk -F',' '{print $1}' users.csv

# Extract columns 1 and 3
awk -F',' '{print $1, $3}' users.csv

# Extract with custom output separator
awk -F',' '{print $1 "|" $2}' users.csv

# Skip header line
awk -F',' 'NR>1 {print $2}' users.csv

# Extract with a condition (NR>1 keeps the header out of the numeric comparison)
awk -F',' 'NR>1 && $4 > 30 {print $2}' users.csv

# Multiple conditions
awk -F',' 'NR>1 && $4 > 25 && $1 < 3 {print $2, $4}' users.csv

Example:

# Extract names of people aged 30 or older (skip the header first)
awk -F',' 'NR>1 && $4 >= 30 {print $2}' users.csv

# Output:
John
Bob

Method 3: Using read with IFS

Read the CSV line by line and extract specific fields.

# Read and extract specific columns
while IFS=',' read -r id name email age; do
  echo "Name: $name, Age: $age"
done < users.csv

# Read and process selectively (skip the header so -gt receives a number)
while IFS=',' read -r id name email age; do
  [ "$id" = "id" ] && continue  # Skip header
  if [ "$age" -gt 25 ]; then
    echo "Adult: $name"
  fi
done < users.csv

Example:

while IFS=',' read -r id name email age; do
  [ "$id" = "id" ] && continue  # Skip header
  echo "$name is $age years old"
done < users.csv

# Output:
John is 30 years old
Jane is 25 years old
Bob is 35 years old

Method 4: Using tr and cut

Combine tools for specific transformations.

# Change output format from comma to pipe
cut -d',' -f1,3 users.csv | tr ',' '|'

# Convert to different format
awk -F',' '{print $2 ":" $3}' users.csv
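A quick end-to-end run of the cut-plus-tr pipeline, on a one-row file matching the users.csv layout used above (the `/tmp/users_demo.csv` path is illustrative):

```shell
# Sample input matching the users.csv layout used throughout
printf 'id,name,email,age\n1,John,john@example.com,30\n' > /tmp/users_demo.csv

# Extract columns 1 and 3, then swap the remaining commas for pipes
cut -d',' -f1,3 /tmp/users_demo.csv | tr ',' '|'
# id|email
# 1|john@example.com

rm -f /tmp/users_demo.csv
```

Note that tr replaces every comma in its input, so this is only safe when the selected fields contain no embedded commas.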

Handling Quoted CSV Fields

CSV files with quoted fields require special handling.

# Remove quotes from specific field
awk -F',' '{gsub(/"/, "", $2); print $2}' data.csv

# Grab a quoted field that contains embedded commas
# (prints the text between the first pair of double quotes)
awk -F'"' '{print $2}' data.csv

# More robust CSV parsing with awk
awk -F',' '{
  for(i=1; i<=NF; i++) {
    gsub(/^"/, "", $i)
    gsub(/"$/, "", $i)
  }
  print $1, $3
}' data.csv
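To see why quoted fields need this care, compare naive comma splitting against quote splitting on a single line (the sample line is illustrative):

```shell
# A line with a quoted field containing an embedded comma (illustrative)
line='1,"Doe, John",IT'

# Naive comma splitting cuts the quoted field in half
echo "$line" | cut -d',' -f2
# "Doe

# Splitting on the quote character recovers the whole quoted field
echo "$line" | cut -d'"' -f2
# Doe, John
```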

Practical Examples

Example 1: Extract and Filter Columns

#!/bin/bash

csv_file="$1"

if [ ! -f "$csv_file" ]; then
  echo "File not found: $csv_file" >&2
  exit 1
fi

# Extract email addresses for active users
echo "Active users:"
awk -F',' '$5 == "active" {print $2 ": " $3}' "$csv_file"

Input (users.csv):

id,name,email,department,status
1,John Doe,john@example.com,IT,active
2,Jane Smith,jane@example.com,HR,active
3,Bob Johnson,bob@example.com,Sales,inactive

Output:

Active users:
John Doe: john@example.com
Jane Smith: jane@example.com

Example 2: Extract Multiple Columns with Formatting

#!/bin/bash

csv_file="$1"

# Extract and format specific columns
echo "Employee Report:"
awk -F',' 'NR==1 {print "Name | Email | Dept"}
           NR>1 {printf "%-20s | %-25s | %-10s\n", $2, $3, $4}' "$csv_file"

Output:

Employee Report:
Name | Email | Dept
John Doe | john@example.com | IT
Jane Smith | jane@example.com | HR
Bob Johnson | bob@example.com | Sales

Example 3: Extract Column to New CSV

#!/bin/bash

input_csv="$1"
output_csv="$2"
column=$3

if [ -z "$column" ]; then
  echo "Usage: $0 input.csv output.csv column_number"
  exit 1
fi

# Extract specific column to new file
cut -d',' -f"$column" "$input_csv" > "$output_csv"

echo "Column $column extracted to $output_csv"

Usage:

bash script.sh users.csv emails.csv 3
# Extracts column 3 (email) to emails.csv

Example 4: Extract with Statistics

#!/bin/bash

csv_file="$1"

# Extract ages and calculate statistics
echo "Age Statistics:"
awk -F',' 'NR>1 {
  sum += $4
  count++
  if ($4 > max || max == "") max = $4
  if ($4 < min || min == "") min = $4
}
END {
  avg = sum / count
  printf "Count: %d\n", count
  printf "Average: %.2f\n", avg
  printf "Min: %d\n", min
  printf "Max: %d\n", max
}' "$csv_file"
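Run against the three-row users.csv from earlier sections, the same statistics logic produces the numbers below; a condensed one-line sketch (the `/tmp/ages.csv` path is a throwaway file):

```shell
# Recreate the sample users.csv from earlier sections
cat > /tmp/ages.csv <<'EOF'
id,name,email,age
1,John,john@example.com,30
2,Jane,jane@example.com,25
3,Bob,bob@example.com,35
EOF

# Same statistics logic, condensed to one line of output
awk -F',' 'NR>1 {
  sum += $4; count++
  if (count == 1 || $4 > max) max = $4
  if (count == 1 || $4 < min) min = $4
}
END { printf "Count: %d Avg: %.2f Min: %d Max: %d\n", count, sum/count, min, max }' /tmp/ages.csv
# Count: 3 Avg: 30.00 Min: 25 Max: 35

rm -f /tmp/ages.csv
```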

Example 5: Extract and Merge Columns

#!/bin/bash

csv_file="$1"

# Extract first and last name, merge into single column
awk -F',' 'NR>1 {
  split($2, name, " ")
  print name[2] ", " name[1] " <" $3 ">"
}' "$csv_file"

Input:

id,name,email
1,John Doe,john@example.com
2,Jane Smith,jane@example.com

Output:

Doe, John <john@example.com>
Smith, Jane <jane@example.com>

Example 6: Extract with Dynamic Column Selection

#!/bin/bash

# Function to extract columns by name
extract_by_name() {
  local csv_file="$1"
  local col_names="$2"  # Space-separated column names

  # Get header line
  local header
  header=$(head -n 1 "$csv_file")

  # Find column numbers
  local col_nums=()
  IFS=',' read -ra cols <<< "$header"

  for name in $col_names; do
    for i in "${!cols[@]}"; do
      if [ "${cols[$i]}" = "$name" ]; then
        col_nums+=("$((i+1))")
      fi
    done
  done

  # Extract columns
  local fields=$(IFS=,; echo "${col_nums[*]}")
  cut -d',' -f"$fields" "$csv_file"
}

# Usage
extract_by_name "users.csv" "name email"
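The same header lookup can also be done in a single awk pass, which avoids the nested bash loops; a sketch, with the requested names passed in as an awk variable (the `/tmp/u.csv` file and its contents are illustrative):

```shell
# Sample file (illustrative)
printf 'id,name,email\n1,John,john@example.com\n' > /tmp/u.csv

# Resolve requested column names against the header, then print those fields
awk -F',' -v want='name,email' '
NR == 1 { n = split(want, names, ",")
          for (i = 1; i <= NF; i++)
            for (j = 1; j <= n; j++)
              if ($i == names[j]) idx[j] = i }
{ sep = ""
  for (j = 1; j <= n; j++) { printf "%s%s", sep, $(idx[j]); sep = "," }
  print "" }' /tmp/u.csv
# name,email
# John,john@example.com

rm -f /tmp/u.csv
```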

Example 7: Convert CSV Column to List

#!/bin/bash

csv_file="$1"
column_num="${2:-1}"

# Extract column and create comma-separated list
echo -n "Values: "
cut -d',' -f"$column_num" "$csv_file" | \
  awk 'NR>1 {printf "%s%s", sep, $0; sep=", "} END {print ""}'

# Example output: Values: John Doe, Jane Smith, Bob Johnson

Performance Comparison

For extracting columns from large CSV files:

Method       Speed        Flexibility   Best For
cut          Fastest      Low           Simple column extraction
awk          Very fast    High          Complex processing
read loop    Medium       High          Single-row operations

Best choice: Use cut for simple extractions, awk for complex operations.
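One way to compare on your own machine is to generate a larger file and prefix each command with `time`; a sketch (the 10,000-row size and `/tmp` paths are illustrative):

```shell
# Generate a 10,000-row sample file to compare on (illustrative size)
seq 1 10000 | awk '{print $1 ",user" $1 ",user" $1 "@example.com"}' > /tmp/big.csv

# Both extract the same column; prefix either with `time` to benchmark locally
cut -d',' -f2 /tmp/big.csv > /tmp/out_cut.txt
awk -F',' '{print $2}' /tmp/big.csv > /tmp/out_awk.txt

# The outputs are byte-identical; only the run time differs
cmp -s /tmp/out_cut.txt /tmp/out_awk.txt && echo "identical output"

rm -f /tmp/big.csv /tmp/out_cut.txt /tmp/out_awk.txt
```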

Important Considerations

CSV with Quoted Fields

Properly handle CSV with embedded commas:

# Python's CSV module would be better, but in bash:
# Option 1: Use a specialized CSV parser (csvcut, from the csvkit package)
csvcut -c 1,3 users.csv

# Option 2: Use robust awk script
gawk -v FPAT='([^,]+)|("[^"]+")' '{
  gsub(/"/, "", $2)
  print $1, $2
}' data.csv

Different Delimiters

CSV files may use different delimiters:

# Tab-separated values
awk -F'\t' '{print $2}' data.tsv

# Pipe-separated
cut -d'|' -f2 data.txt

# Colon-separated (like /etc/passwd)
cut -d':' -f1,5 /etc/passwd

Headers

Handle CSV headers properly:

# Skip header with cut (no built-in support)
tail -n +2 users.csv | cut -d',' -f1,3

# Skip header with awk
awk -F',' 'NR>1 {print $1, $3}' users.csv
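The two header-skip forms are interchangeable; a quick sanity check on a small file (the `/tmp/h.csv` file is illustrative):

```shell
# Both header-skip forms yield the same rows
printf 'id,name\n1,John\n2,Jane\n' > /tmp/h.csv

tail -n +2 /tmp/h.csv | cut -d',' -f2
# John
# Jane

awk -F',' 'NR>1 {print $2}' /tmp/h.csv
# John
# Jane

rm -f /tmp/h.csv
```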

Key Points

  • Use cut for simple, fast column extraction
  • Use awk for complex filtering and transformations
  • Use read for line-by-line processing with variables
  • Always specify -F',' or -d',' for comma delimiter
  • Handle quoted CSV fields properly
  • Skip headers when needed with NR>1
  • Test on sample data first

Quick Reference

# Extract single column
cut -d',' -f1 file.csv
awk -F',' '{print $1}' file.csv

# Extract multiple columns
cut -d',' -f1,3,5 file.csv
awk -F',' '{print $1, $3, $5}' file.csv

# Extract with filtering
awk -F',' '$2 == "John" {print $3}' file.csv

# Extract and skip header
cut -d',' -f2 file.csv | tail -n +2
awk -F',' 'NR>1 {print $2}' file.csv
Complete Example

#!/bin/bash

csv_file="$1"

# For simple extraction use cut:
cut -d',' -f1,3 "$csv_file"

# For complex processing use awk:
awk -F',' 'NR>1 && $4 > 25 {print $2, $3}' "$csv_file"

# For line-by-line with variables:
while IFS=',' read -r col1 col2 col3; do
  [ "$col1" = "id" ] && continue
  echo "Processing: $col1"
done < "$csv_file"