How to Filter with Awk

Quick Answer: Filter with Awk Conditions

To filter lines conditionally with awk, use awk '$2 > 100' data.txt, which prints every line where field 2 is greater than 100. For text matching, use awk '/pattern/' data.txt to print lines that match the regular expression. Combine multiple conditions with && and ||.

Quick Comparison: Awk Filtering Methods

Syntax                 Condition Type   Best For            Speed
awk '$2 > 100'         Numeric          Field comparisons   Fast
awk '/pattern/'        String match     Pattern matching    Fast
awk '$1 == "value"'    Equality         Exact matches       Fast
awk 'NR > 5'           Line number      Range filtering     Fast

Bottom line: Use field comparisons for numeric data and regex patterns for text matching.


Use conditional statements in awk to filter and process data based on patterns and field values. Learn if/else, comparison, and logical operators.

Method 1: Basic Conditions

# Print lines where field > value
awk '$2 > 100' data.txt

# Print lines where field equals value
awk '$1 == "John"' data.txt

# Print lines containing pattern
awk '/pattern/' data.txt

# Print lines where field matches regex
awk '$2 ~ /pattern/' data.txt

Detailed Example

Test file (sales.txt):

John 150 Active
Jane 200 Inactive
Bob 75 Active
Alice 320 Active

Example commands and their output:

# Print lines with sales > 100
awk '$2 > 100' sales.txt

# Output:
# John 150 Active
# Jane 200 Inactive
# Alice 320 Active

# Print only active sales
awk '$3 == "Active"' sales.txt

# Output:
# John 150 Active
# Bob 75 Active
# Alice 320 Active

Comparison Operators

Operator   Meaning
==         Equal
!=         Not equal
<          Less than
>          Greater than
<=         Less or equal
>=         Greater or equal
~          Match regex
!~         Not match regex
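
One detail the operator table does not show is that awk chooses between a numeric and a string comparison based on the operands. The following is a minimal sketch with made-up inline data (not a file from this article): comparing a field against a bare number is numeric, while comparing it against a quoted constant is a string comparison.

```shell
# Hypothetical two-line sample; field 2 holds the values to compare.
data='a 75
b 150'

# Bare number: field 2 is compared numerically, so only 150 passes.
numeric=$(printf '%s\n' "$data" | awk '$2 > 100')

# Quoted constant: this is a string comparison, and "75" sorts
# after "100" because the character "7" comes after "1".
as_string=$(printf '%s\n' "$data" | awk '$2 > "100"')

echo "numeric:"
echo "$numeric"
echo "as string:"
echo "$as_string"
```

The numeric form prints only the b line, while the quoted form prints both lines, so leave numbers unquoted when a numeric test is intended.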

Logical Operators

# AND condition
awk '$2 > 100 && $3 == "Active"' data.txt

# OR condition
awk '$1 == "John" || $1 == "Jane"' data.txt

# NOT condition
awk '!($2 < 100)' data.txt
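
To see the operators on concrete data, the sketch below reuses the sales.txt rows from the detailed example, fed inline so it runs without any file. The 300 threshold is an arbitrary value chosen for illustration.

```shell
# Rows copied from the sales.txt example above.
sales='John 150 Active
Jane 200 Inactive
Bob 75 Active
Alice 320 Active'

# AND: sales above 100 and an Active status; print only the name.
both=$(printf '%s\n' "$sales" | awk '$2 > 100 && $3 == "Active" {print $1}')

# OR: sales above 300 or an Inactive status; print only the name.
either=$(printf '%s\n' "$sales" | awk '$2 > 300 || $3 == "Inactive" {print $1}')

echo "both conditions: $(echo $both)"
echo "either condition: $(echo $either)"
```

The AND filter selects John and Alice; the OR filter selects Jane and Alice.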

If-Else Statements

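# Pattern-action form: print "High" when field 2 exceeds 100, otherwise "Low"
# (next stops processing the line so the second rule does not also run)
awk '$2 > 100 {print "High"; next} {print "Low"}' data.txt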

# If-else block
awk '{
  if ($2 > 100)
    print $1 " has high value"
  else
    print $1 " has low value"
}' data.txt

# If-else if-else
awk '{
  if ($2 > 200)
    print "Very High"
  else if ($2 > 100)
    print "High"
  else
    print "Low"
}' data.txt

Pattern Matching Conditions

# Match regex anywhere in the line
awk '/ERROR/ {print}' logfile.txt

# Match regex against a single field
awk '$1 ~ /^[0-9]/ {print}' file.txt  # First field starts with digit

# Not matching
awk '$2 !~ /debug/ {print}' file.txt  # Second field doesn't contain "debug"

# Case-insensitive match
awk 'tolower($1) ~ /john/ {print}' file.txt

String Comparisons

# String equality
awk '$1 == "apple"' file.txt

# String inequality
awk '$1 != "apple"' file.txt

# Length comparison
awk 'length($1) > 5' file.txt

Practical Example: Data Classification

#!/bin/bash

# File: classify_data.sh

data="$1"

awk '{
  score = $2
  status = $3

  if (score >= 90) {
    grade = "A"
  } else if (score >= 80) {
    grade = "B"
  } else if (score >= 70) {
    grade = "C"
  } else {
    grade = "F"
  }

  if (status == "Active") {
    result = "PROCESS"
  } else {
    result = "SKIP"
  }

  printf "%-15s Score: %3d Grade: %s Action: %s\n", $1, score, grade, result
}' "$data"

Input:

John 95 Active
Jane 75 Inactive
Bob 85 Active

Output:

John            Score:  95 Grade: A Action: PROCESS
Jane            Score:  75 Grade: C Action: SKIP
Bob             Score:  85 Grade: B Action: PROCESS

Multiple Conditions

# All conditions must be true
awk '$2 > 100 && $3 == "Active" && $4 != "error"' file.txt

# At least one condition true
awk '$1 == "error" || $1 == "warning" || $1 == "critical"' file.txt

# Complex condition with parentheses
awk '($2 > 100 || $2 < 50) && $3 == "Active"' file.txt

Filtering on Field Range

# Value between two numbers
awk '$2 >= 50 && $2 <= 150' data.txt

# Not between range
awk '!($2 >= 50 && $2 <= 150)' data.txt
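
When the bounds change between runs, they can be passed in with awk's -v option instead of being hard-coded. This is a small sketch; the variable names min and max and the sample rows are illustrative assumptions, not part of the original examples.

```shell
# Hypothetical sample data: a label and a value per line.
data='low 20
mid 100
high 300'

# min and max are illustrative names set from the command line with -v.
in_range=$(printf '%s\n' "$data" | awk -v min=50 -v max=150 '$2 >= min && $2 <= max')

echo "$in_range"
```

With these bounds only the mid line is printed.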

Practical Example: Log Filter

#!/bin/bash

# File: filter_logs.sh

logfile="$1"

awk '
/ERROR/ && NF >= 5 {
  time = $1
  level = $2
  code = $3
  message = $4 " " $5

  if (code >= 500) {
    severity = "CRITICAL"
  } else if (code >= 400) {
    severity = "ERROR"
  } else {
    severity = "WARNING"
  }

  printf "[%s] %s: %s %s\n", time, severity, code, message
}
' "$logfile"
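
The script above comes without sample input, so here is a condensed sketch of the same logic run on hypothetical log lines. The layout (time, level, numeric code, then message words) is an assumption inferred from the field assignments in the script.

```shell
# Hypothetical log lines: time, level, numeric code, message words.
log='10:01 ERROR 503 upstream timeout
10:02 INFO 200 request ok
10:03 ERROR 404 page missing
10:04 ERROR 301'

report=$(printf '%s\n' "$log" | awk '
/ERROR/ && NF >= 5 {
  # Same thresholds as filter_logs.sh: 500 and up is critical, 400 and up is an error.
  if ($3 >= 500)      severity = "CRITICAL"
  else if ($3 >= 400) severity = "ERROR"
  else                severity = "WARNING"
  printf "[%s] %s: %s %s %s\n", $1, severity, $3, $4, $5
}')

echo "$report"
```

The INFO line fails the /ERROR/ test and the last line fails NF >= 5, so only the 503 and 404 lines are reported.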

Conditional with Regex

# Match pattern AND other condition
awk '/ERROR/ && $2 > 10' file.txt

# Pattern OR number condition
awk '/critical/ || $3 > 100' file.txt

# Complex: Pattern match and field range
awk '$1 ~ /^[0-9]/ && $2 >= 100 && $2 <= 200' file.txt

Testing for Empty Fields

# Field is empty (with default whitespace splitting, this means the line has fewer than 3 fields)
awk '$3 == ""' file.txt

# Field is not empty
awk '$3 != ""' file.txt

# Field has length > 0
awk 'length($3) > 0' file.txt
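
Empty fields in the middle of a record only survive when an explicit delimiter is set, because default splitting collapses runs of whitespace. The sketch below uses a hypothetical comma-separated sample to show both cases.

```shell
# Hypothetical CSV sample; the second record has an empty third field.
csv='john,150,Active
jane,200,
bob,75,Active'

# -F, splits on commas, so the empty third field is preserved and testable.
missing=$(printf '%s\n' "$csv" | awk -F, '$3 == ""')

# With default whitespace splitting, count fields with NF to find short lines.
short=$(printf 'a b c\nd e\n' | awk 'NF < 3')

echo "missing status: $missing"
echo "short line:     $short"
```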

Ternary Operator

# Conditional expression
awk '{print ($2 > 100) ? "High" : "Low"}' data.txt

# More complex
awk '{print $1, ($2 > 100 && $3 == "Active") ? "PROCESS" : "SKIP"}' data.txt
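
A ternary result can also be stored in a variable before printing, which keeps the print statement simple. This is a minimal sketch on made-up rows; the variable name level is illustrative.

```shell
# Hypothetical sample rows: name, value, status.
data='John 150 Active
Bob 75 Active'

labeled=$(printf '%s\n' "$data" | awk '{
  # Evaluate the ternary first, then print the stored label.
  level = ($2 > 100) ? "High" : "Low"
  print $1, level
}')

echo "$labeled"
```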

Practical Example: Performance Report

#!/bin/bash

# File: performance_report.sh

data="$1"

awk '
NR > 1 {
  name = $1
  response = $2
  errors = $3
  uptime = $4

  # Determine status
  if (response < 100 && errors == 0 && uptime > 99.5) {
    status = "✓ GOOD"
  } else if (response < 200 && errors < 5) {
    status = "⚠ FAIR"
  } else {
    status = "✗ POOR"
  }

  printf "%-10s Response: %4dms Errors: %2d Uptime: %.1f%% Status: %s\n", \
    name, response, errors, uptime, status
}
' "$data"

Filtering Without Action

# Just print lines matching condition (no action)
awk '$2 > 100' data.txt

# Without 'print', entire line is printed if condition is true
# Same as: awk '$2 > 100 {print}' data.txt

Negation

# NOT operator
awk '!($2 > 100)' data.txt          # Not greater than 100
awk '!($1 ~ /error/)' file.txt      # Not containing error
awk '!/pattern/' file.txt            # Lines NOT matching pattern

Common Mistakes

  1. Using = instead of == - = assigns a value, == compares
  2. Not quoting strings - string values need quotes: $1 == "value"
  3. Forgetting curly braces - an if or else body with more than one statement needs { }
  4. Regex without ~ - a bare /pat/ tests the whole line; use ~ to test one field: $1 ~ /pat/
  5. Logical operator confusion - use && for AND, || for OR
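
The first mistake is easy to miss because awk reports no error. The sketch below, on hypothetical data, shows what an accidental assignment does compared with the intended comparison.

```shell
# Hypothetical sample rows: name and value.
data='John 150
Jane 200'

# Intended filter: only the line whose first field is John is printed.
correct=$(printf '%s\n' "$data" | awk '$1 == "John"')

# Mistake: = assigns "John" to field 1 on every line, and the non-empty
# result counts as true, so every line is printed with field 1 overwritten.
mistake=$(printf '%s\n' "$data" | awk '$1 = "John"')

echo "with ==:"
echo "$correct"
echo "with =:"
echo "$mistake"
```

The == version prints one line; the = version prints two lines, both starting with John.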

Performance Tips

  • Prefer simple conditions where possible
  • A bare /regex/ pattern is a fast way to match text
  • Numeric comparisons are usually cheaper than string comparisons
  • Avoid deeply nested conditions
  • Use next to skip the remaining rules once a line has been handled
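
As a sketch of the last tip, the program below uses next so that each line is handled by at most one rule. The sample data and the High/Low labels are illustrative assumptions.

```shell
# Hypothetical sample: one comment line followed by two data rows.
data='# comment 999
John 150
Bob 75'

result=$(printf '%s\n' "$data" | awk '
/^#/      { next }                    # skip comment lines before any other rule runs
$2 > 100  { print $1, "High"; next }  # handled, so do not fall through to the last rule
          { print $1, "Low" }
')

echo "$result"
```

Without the two next statements, the John line would be printed twice (once as High, once as Low) and the comment line would be labeled Low.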

Key Points

  • Use $field comparison_operator value
  • Use ~ for regex matching
  • Combine with && (AND) and || (OR)
  • If-else bodies with more than one statement need curly braces
  • Ternary operator useful for simple cases

Summary

Conditional filtering in awk is powerful for data processing. Master comparison operators, logical combinations, and regex matching. Use if-else for complex logic and ternary operators for simple conditions.