
How to Process Fields with Awk

• 2 min read

Quick Answer: Process Fields with Awk

To extract fields, use awk '{print $1}' file.txt where $1 is the first field. For CSV files, specify delimiter: awk -F',' '{print $2}' file.csv. Fields are automatically split by whitespace by default; use -F to change the delimiter.
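As a quick sanity check, both forms can be tried on inline sample data (the values here are invented for illustration):

```shell
# Default: fields split on whitespace
printf 'alpha beta gamma\n' | awk '{print $1}'      # -> alpha

# CSV: comma as the field separator
printf 'id,name,dept\n' | awk -F',' '{print $2}'    # -> name
```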

Quick Comparison: Field Processing Methods

Method              | Delimiter  | Best For          | Speed
--------------------|------------|-------------------|--------
awk default         | Whitespace | Most files        | Fast
awk -F              | Custom     | CSV/delimited     | Fast
cut -f              | Any        | Simple extraction | Fastest
Parameter expansion | N/A        | Variables only    | Fastest

Bottom line: Use awk for flexibility; use cut when delimiter is consistent.
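To make the comparison concrete, here is the same extraction with both tools on a sample CSV row (data invented for illustration). Note that cut always emits fields in file order, while awk can reorder them:

```shell
# Equivalent single-field extraction
printf '1,John,Engineering\n' | awk -F',' '{print $2}'   # -> John
printf '1,John,Engineering\n' | cut -d',' -f2            # -> John

# Reordering fields: awk can, cut cannot
printf '1,John,Engineering\n' | awk -F',' '{print $3, $1}'  # -> Engineering 1
```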


Process and extract fields from structured data using awk. Learn field access, manipulation, and custom delimiters.

Method 1: Basic Field Access

# Print specific field (first field)
awk '{print $1}' file.txt

# Print multiple fields
awk '{print $1, $3}' file.txt

# Print all fields
awk '{print}' file.txt

# Print last field
awk '{print $NF}' file.txt

# Print second-to-last field
awk '{print $(NF-1)}' file.txt

Detailed Example

Test file (users.txt):

1 John Doe Developer
2 Jane Smith Manager
3 Bob Wilson Engineer

# Print first and third fields (ID and last name)
awk '{print $1, $3}' users.txt

# Output:
# 1 Doe
# 2 Smith
# 3 Wilson

# Print all except first field
awk '{$1=""; print}' users.txt

# Output:
#  John Doe Developer
#  Jane Smith Manager
#  Bob Wilson Engineer

Set Field Separator

# Use comma as field separator (CSV)
awk -F"," '{print $1, $2}' file.csv

# Use colon as separator
awk -F":" '{print $1}' /etc/passwd

# A single space as -F value means the default whitespace splitting
awk -F' ' '{print $1}' file.txt

# Use regex as separator
awk -F'[ ,]' '{print $1}' mixed.txt
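Since regex separators are easy to get wrong, a small demonstration (sample line invented) shows that [ ,] splits on every space or comma:

```shell
# 'a b,c' splits into three fields: a / b / c
printf 'a b,c\n' | awk -F'[ ,]' '{print NF, $2}'   # -> 3 b
```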

Modify Fields

# Change a field value
awk '{$2="MODIFIED"; print}' file.txt

# Swap fields
awk '{temp=$1; $1=$2; $2=temp; print}' file.txt

# Add 10 to a numeric field
awk '{$2=$2+10; print}' file.txt

# Convert to uppercase
awk '{$2=toupper($2); print}' file.txt

Practical Example: Process CSV

#!/bin/bash

# File: process_csv.sh

csv_file="$1"

echo "=== User Report ==="
echo "ID | Name | Department"
echo "---|------|----------"

awk -F"," 'NR>1 {
  id=$1
  name=$2
  dept=$3
  printf "%2s | %-15s | %s\n", id, name, dept
}' "$csv_file"

Input (users.csv):

ID,Name,Department
1,John Doe,Engineering
2,Jane Smith,HR
3,Bob Wilson,Sales

Output:

=== User Report ===
ID | Name | Department
---|------|----------
 1 | John Doe        | Engineering
 2 | Jane Smith      | HR
 3 | Bob Wilson      | Sales

Conditional Field Processing

# Print only if field matches condition
awk '$2 > 100' data.txt        # Second field > 100
awk '$3 ~ /pattern/' file.txt  # Third field contains pattern
awk '$1 == "value"' file.txt   # First field equals value

# Print if field is empty
awk '$2 == ""' file.txt

Count Fields

# Number of fields in line
awk '{print NF}' file.txt

# Print line with field count
awk '{print NF, $0}' file.txt

# Print only lines with specific field count
awk 'NF==3' file.txt

Calculate on Fields

# Sum values in field
awk '{sum += $2} END {print sum}' data.txt

# Average of field
awk '{sum += $2; count++} END {print sum/count}' data.txt

# Multiply fields
awk '{print $2 * $3}' data.txt
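A quick run of the sum one-liner on inline sample data (values invented) confirms the END block prints the accumulated total once, after all lines are read:

```shell
printf '1 10\n2 20\n3 30\n' | awk '{sum += $2} END {print sum}'   # -> 60
```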

Join Fields

# Concatenate fields
awk '{print $1 $2 $3}' file.txt

# Join with separator
awk '{print $1 "-" $2 "-" $3}' file.txt

# Join the first four fields with spaces
awk '{print $1" "$2" "$3" "$4}' file.txt

Extract Substring from Field

# Get first N characters of field
awk '{print substr($1, 1, 3)}' file.txt

# Get last N characters
awk '{print substr($1, length($1)-2)}' file.txt

# Extract from position to end
awk '{print substr($1, 5)}' file.txt

Practical Example: Log Analysis

#!/bin/bash

# File: analyze_access_log.sh

log_file="$1"

echo "=== Access Log Analysis ==="
echo ""

# Extract and count requests by IP
echo "Top 10 IPs:"
awk '{print $1}' "$log_file" | sort | uniq -c | sort -rn | head -10

echo ""
echo "=== Status Codes ==="
awk '{print $9}' "$log_file" | sort | uniq -c | sort -rn

echo ""
echo "=== Largest Responses (bytes) ==="
awk '{print $10, $1}' "$log_file" | sort -rn | head -5

Field Assignment

# Create new field by assigning
awk '{$4=$2*$3; print}' file.txt

# Add computed field
awk '{print $0, $2*$3}' file.txt

# Change field separator on output
awk -F":" -v OFS="," '{print $1, $2, $3}' file.txt
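One subtlety worth knowing: OFS is only applied when awk rebuilds $0, which any field assignment triggers. The no-op-looking $1=$1 idiom is a common way to force that rebuild:

```shell
# No field assignment: $0 keeps its original separators
printf 'a:b:c\n' | awk -F':' -v OFS=',' '{print $0}'         # -> a:b:c

# $1=$1 forces a rebuild, so OFS takes effect
printf 'a:b:c\n' | awk -F':' -v OFS=',' '{$1=$1; print $0}'  # -> a,b,c
```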

Remove Duplicate Fields

# Remove duplicate fields within each line
# (delete clears the seen array for every line; a separator is added only between fields)
awk '{delete seen; sep=""; for(i=1;i<=NF;i++) if(!seen[$i]++){printf "%s%s", sep, $i; sep=" "}; print ""}' file.txt

Trim Fields

# Trim whitespace around each field (with the default FS this is a no-op; set -F for delimited data)
awk '{for(i=1;i<=NF;i++) gsub(/^[ \t]+|[ \t]+$/, "", $i); print}' file.txt

# Remove all spaces
awk '{gsub(/ /, ""); print}' file.txt

Process Fields in Loop

# Loop through all fields
awk '{for(i=1; i<=NF; i++) print $i}' file.txt

# Process each field
awk '{for(i=1; i<=NF; i++) $i=toupper($i); print}' file.txt

Output Field Separator

# Change output separator
awk -v OFS=":" '{print $1, $2, $3}' file.txt

# Default output uses space
awk '{print $1, $2}' file.txt         # Space separated
awk -v OFS="," '{print $1, $2}' file.txt  # Comma separated

Complex Field Processing

#!/bin/bash

# File: salary_report.sh

data="$1"

awk -F"," '
NR==1 {next}  # Skip header
{
  name=$1
  salary=$2
  bonus=$3

  total = salary + bonus
  tax = total * 0.15
  net = total - tax

  printf "%-15s Base: $%8.2f Bonus: $%8.2f Total: $%8.2f Net: $%8.2f\n", \
         name, salary, bonus, total, net
}' "$data"

Common Mistakes

  1. Forgetting -F flag - default is whitespace, not comma for CSV
  2. Confusing concatenation with commas - print $1 $2 concatenates; print $1, $2 inserts OFS between the fields
  3. Off-by-one errors - fields are 1-indexed, not 0-indexed
  4. Not quoting awk scripts - use single quotes to prevent expansion
  5. Misunderstanding $0 rebuilds - assigning to any field rebuilds $0 with OFS, but a new OFS alone does nothing until a field is assigned
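Two of these mistakes are easiest to see in action (sample data invented for illustration):

```shell
# Adjacency concatenates; a comma inserts OFS
printf 'foo bar\n' | awk '{print $1 $2}'    # -> foobar
printf 'foo bar\n' | awk '{print $1, $2}'   # -> foo bar

# Assigning to a field rebuilds $0 with OFS (a space by default)
printf 'a:b:c\n' | awk -F':' '{$2="X"; print $0}'   # -> a X c, not a:X:c
```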

Performance Tips

  • Use -F to set separator upfront
  • Minimize string operations in loops
  • Use built-in functions instead of shell pipes
  • Cache field references if used multiple times
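As an example of the third tip, a frequency count that might otherwise be written as awk piped through sort and uniq -c can be done with a single awk array (sample data invented; the trailing sort only orders the report):

```shell
printf 'a\nb\na\n' | awk '{count[$1]++} END {for (k in count) print count[k], k}' | sort -rn
# -> 2 a
#    1 b
```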

Key Points

  • Use $1, $2, $3 for field access (1-indexed)
  • Use NF for field count
  • Use $NF for last field
  • Use -F to set field separator
  • Modify fields by assignment

Summary

Awk is powerful for field-based processing. Master field access, separators, and basic calculations. Use loops for complex field manipulation and always quote your scripts to prevent shell expansion.