How to Capture Groups in Bash

Quick Answer: Capture Groups in Bash

To capture groups, use parentheses in regex: [[ "$string" =~ (group1)(group2) ]]. After matching, access groups via ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]}, etc. The full match is in ${BASH_REMATCH[0]}.

Quick Comparison: Group Extraction Methods

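| Method              | Syntax                                | Best For                       | Complexity |
|---------------------|---------------------------------------|--------------------------------|------------|
| Capture groups      | (pattern) with BASH_REMATCH           | Extracting parts of a string   | Simple     |
| sed back-references | sed 's/\(.*\)/\1/'                    | File processing                | Moderate   |
| awk substr          | awk '{print substr($0, pos, len)}'    | Field extraction               | Moderate   |
| grep -o             | grep -o 'pattern'                     | Printing only the matched text | Simple     |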

Bottom line: For pulling parts out of a string inside a Bash script, use capture groups with BASH_REMATCH; the matching happens in the shell itself, with no external process.
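To make the comparison concrete, here is a sketch that pulls the domain out of one made-up address with each method from the table. It assumes sed supports -E (extended regex syntax), as GNU and BSD sed do.

```bash
#!/bin/bash

email="john.doe@example.com"   # sample input

# 1. Capture groups: matched inside the shell, no subprocess
if [[ $email =~ ^([^@]+)@(.+)$ ]]; then
  via_bash="${BASH_REMATCH[2]}"
fi

# 2. sed -E with a back-reference: \2 is the second group
via_sed=$(printf '%s\n' "$email" | sed -E 's/^([^@]+)@(.+)$/\2/')

# 3. awk substr: everything after the first '@'
via_awk=$(printf '%s\n' "$email" | awk '{ print substr($0, index($0, "@") + 1) }')

# 4. grep -o: prints only the matched text; no groups, so '@' is included
via_grep=$(printf '%s\n' "$email" | grep -oE '@.+')

printf 'bash: %s\nsed:  %s\nawk:  %s\ngrep: %s\n' "$via_bash" "$via_sed" "$via_awk" "$via_grep"
```

The first three print example.com; grep -o prints @example.com because it has no way to return just a group.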


Learn how to extract matched pattern groups.

Overview

Regular expression capture groups allow you to extract specific portions of matched text. When working with complex text processing tasks, capture groups let you isolate and reuse parts of a pattern match. This is essential for parsing logs, extracting data from formatted strings, and validating input formats.

Basic Syntax

Capture groups use parentheses to define what you want to extract:

# Simple capture group
if [[ $string =~ (pattern) ]]; then
  captured="${BASH_REMATCH[1]}"
fi

# Multiple capture groups
if [[ $string =~ (pattern1)(pattern2) ]]; then
  first="${BASH_REMATCH[1]}"
  second="${BASH_REMATCH[2]}"
fi

# BASH_REMATCH array:
# [0] = entire matched string
# [1] = first capture group
# [2] = second capture group, etc.
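Two details of this syntax often cause silent failures: the regex on the right of =~ must be unquoted, and longer patterns are easier to manage in a variable. A minimal sketch, using a hypothetical range string:

```bash
#!/bin/bash

range="10-20"                # hypothetical input
re='^([0-9]+)-([0-9]+)$'     # single quotes stop the shell from touching $ or backslashes

# Unquoted $re on the right of =~ is used as a regex
if [[ $range =~ $re ]]; then
  low="${BASH_REMATCH[1]}"
  high="${BASH_REMATCH[2]}"
  echo "low=$low high=$high"
fi

# Quoted "$re" is compared as a literal string, so this test fails
if [[ $range =~ "$re" ]]; then
  literal="yes"
else
  literal="no"
fi
echo "literal match: $literal"
```

This prints low=10 high=20 followed by literal match: no.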

Example 1: Extract Email Address Parts

#!/bin/bash

email="john.doe@example.com"

# Capture username and domain
if [[ $email =~ ^([^@]+)@(.+)$ ]]; then
  username="${BASH_REMATCH[1]}"
  domain="${BASH_REMATCH[2]}"

  echo "Email: $email"
  echo "Username: $username"
  echo "Domain: $domain"
fi

Output:

Email: john.doe@example.com
Username: john.doe
Domain: example.com

Example 2: Parse Date Format

#!/bin/bash

date_string="2024-12-25"

# Capture year, month, day
if [[ $date_string =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]]; then
  year="${BASH_REMATCH[1]}"
  month="${BASH_REMATCH[2]}"
  day="${BASH_REMATCH[3]}"

  echo "Date: $date_string"
  echo "Year: $year"
  echo "Month: $month"
  echo "Day: $day"
fi

Output:

Date: 2024-12-25
Year: 2024
Month: 12
Day: 25

Example 3: Extract Log Information

#!/bin/bash

# Parse Apache-style log line
log_line='192.168.1.1 - - [25/Dec/2024:10:15:30 +0000] "GET /api/users HTTP/1.1" 200 1234'

# Capture IP, request method, path, status code
if [[ $log_line =~ ([0-9.]+).*\"([A-Z]+)\ ([^\ ]+).*\"\ ([0-9]+) ]]; then
  ip="${BASH_REMATCH[1]}"
  method="${BASH_REMATCH[2]}"
  path="${BASH_REMATCH[3]}"
  status="${BASH_REMATCH[4]}"

  echo "IP Address: $ip"
  echo "Method: $method"
  echo "Path: $path"
  echo "Status Code: $status"
fi

Output:

IP Address: 192.168.1.1
Method: GET
Path: /api/users
Status Code: 200

Example 4: Extract Name and Phone Number

#!/bin/bash

contact="John Smith: (555) 123-4567"

# Capture name and phone number
if [[ $contact =~ ([A-Za-z\ ]+):\ \(([0-9]+)\)\ ([0-9]+-[0-9]+) ]]; then
  name="${BASH_REMATCH[1]}"
  area_code="${BASH_REMATCH[2]}"
  phone_number="${BASH_REMATCH[3]}"

  echo "Name: $name"
  echo "Area Code: $area_code"
  echo "Phone: $phone_number"
fi

Output:

Name: John Smith
Area Code: 555
Phone: 123-4567

Example 5: Extract URL Components

#!/bin/bash

url="https://www.example.com:8080/api/users?id=123"

# Capture protocol, domain, port, path, query (port, path, and query are optional)
if [[ $url =~ ^([a-z]+)://([^:/]+)(:([0-9]+))?(/[^?]*)?(\?(.*))?$ ]]; then
  protocol="${BASH_REMATCH[1]}"
  domain="${BASH_REMATCH[2]}"
  port="${BASH_REMATCH[4]}"
  path="${BASH_REMATCH[5]}"
  query="${BASH_REMATCH[7]}"

  echo "Protocol: $protocol"
  echo "Domain: $domain"
  echo "Port: $port"
  echo "Path: $path"
  echo "Query: $query"
fi

Output:

Protocol: https
Domain: www.example.com
Port: 8080
Path: /api/users
Query: id=123

Example 6: Function to Extract Values

#!/bin/bash

# Reusable function for capture groups
extract_fields() {
  local string="$1"
  local pattern="$2"

  if [[ $string =~ $pattern ]]; then
    # Print each captured group on its own line
    local i
    for ((i=1; i<${#BASH_REMATCH[@]}; i++)); do
      echo "${BASH_REMATCH[$i]}"
    done
  else
    return 1
  fi
}

# Usage example
email="alice@company.org"
extract_fields "$email" '^([^@]+)@(.+)$'

# Output:
# alice
# company.org
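If the groups are needed in an array rather than on standard output, a slice of BASH_REMATCH copies every capture group in one step. A short sketch with the same sample address:

```bash
#!/bin/bash

email="alice@company.org"

if [[ $email =~ ^([^@]+)@(.+)$ ]]; then
  # "${BASH_REMATCH[@]:1}" expands to every element from index 1 on,
  # i.e. only the capture groups, without the full match at index 0
  groups=("${BASH_REMATCH[@]:1}")
fi

echo "count: ${#groups[@]}"
printf '%s\n' "${groups[@]}"
```

This prints count: 2, then alice and company.org.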

Practical Examples

Parse Configuration File

#!/bin/bash

config_file="settings.conf"

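# Parse lines like: key = value
# [^=[:space:]]+ stops the key at whitespace, so "key = value"
# yields "key" rather than "key " with a trailing space
while IFS= read -r line || [[ -n $line ]]; do
  [[ $line =~ ^[[:space:]]*([^=[:space:]]+)[[:space:]]*=[[:space:]]*(.*)$ ]] || continue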

  key="${BASH_REMATCH[1]}"
  value="${BASH_REMATCH[2]}"

  echo "Config: $key = $value"
done < "$config_file"
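To look values up later instead of just printing them, the pairs can go into an associative array (Bash 4+). This sketch reads a made-up config from a heredoc so it runs on its own, skips blank and comment lines, and keeps the patterns in variables.

```bash
#!/bin/bash
# Requires Bash 4+ for associative arrays
declare -A config

skip_re='^[[:space:]]*(#.*)?$'   # blank lines and comment lines
kv_re='^[[:space:]]*([^=[:space:]]+)[[:space:]]*=[[:space:]]*(.*)$'

while IFS= read -r line || [[ -n $line ]]; do
  [[ $line =~ $skip_re ]] && continue
  [[ $line =~ $kv_re ]] || continue
  config["${BASH_REMATCH[1]}"]="${BASH_REMATCH[2]}"
done <<'EOF'
# sample settings (made up for this sketch)
host = localhost
port=8080

name = My App
EOF

for key in host port name; do
  echo "$key -> ${config[$key]}"
done
```

Because the loop reads from a heredoc rather than a pipe, it runs in the current shell and config is still populated after done.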

Validate and Extract IP Address

#!/bin/bash

ip="192.168.1.1"

if [[ $ip =~ ^([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)$ ]]; then
  octet1="${BASH_REMATCH[1]}"
  octet2="${BASH_REMATCH[2]}"
  octet3="${BASH_REMATCH[3]}"
  octet4="${BASH_REMATCH[4]}"

  # Validate octets are 0-255
  for octet in "$octet1" "$octet2" "$octet3" "$octet4"; do
    if [ "$octet" -gt 255 ]; then
      echo "Invalid IP"
      exit 1
    fi
  done

  echo "Valid IP: $ip"
else
  echo "Invalid IP format"
  exit 1
fi
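The script above exits on the first failure, which suits a one-off check. When the check is needed repeatedly, a function that returns a status may fit better. A sketch, where is_valid_ipv4 is a hypothetical name; it also limits each octet to three digits and uses 10# so a value like 08 is read as decimal rather than octal inside (( )).

```bash
#!/bin/bash

# Hypothetical helper: returns 0 for a dotted-quad IPv4 address, 1 otherwise
is_valid_ipv4() {
  local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
  [[ $1 =~ $re ]] || return 1

  local octet
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10 so "08" is not rejected as an invalid octal number
    (( 10#$octet <= 255 )) || return 1
  done
}

for candidate in 192.168.1.1 256.1.1.1 10.0.0 8.8.8.08; do
  if is_valid_ipv4 "$candidate"; then
    echo "$candidate: valid"
  else
    echo "$candidate: invalid"
  fi
done
```

With these inputs, 192.168.1.1 and 8.8.8.08 are reported valid and the other two invalid; add a separate check if your format forbids leading zeros.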

Extract Version Numbers

#!/bin/bash

version_string="Application v2.3.4-beta.1"

if [[ $version_string =~ v([0-9]+)\.([0-9]+)\.([0-9]+)(-(.*))?$ ]]; then
  major="${BASH_REMATCH[1]}"
  minor="${BASH_REMATCH[2]}"
  patch="${BASH_REMATCH[3]}"
  suffix="${BASH_REMATCH[5]}"

  echo "Major: $major"
  echo "Minor: $minor"
  echo "Patch: $patch"
  [ -n "$suffix" ] && echo "Suffix: $suffix"
fi

Common Patterns

Email Address

if [[ $email =~ ^([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$ ]]; then
  username="${BASH_REMATCH[1]}"
  domain="${BASH_REMATCH[2]}"
fi

Phone Number (US Format)

if [[ $phone =~ ^\(?([0-9]{3})\)?[-.]?([0-9]{3})[-.]?([0-9]{4})$ ]]; then
  area="${BASH_REMATCH[1]}"
  prefix="${BASH_REMATCH[2]}"
  line="${BASH_REMATCH[3]}"
fi

File Path and Extension

if [[ $file =~ ^(.*/)?([^/]+)\.([^.]+)$ ]]; then
  directory="${BASH_REMATCH[1]}"
  filename="${BASH_REMATCH[2]}"
  extension="${BASH_REMATCH[3]}"
fi
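Patterns like these are worth running against a few edge cases before use. A sketch for the file path pattern, where split_path is a hypothetical helper that prints directory|filename|extension:

```bash
#!/bin/bash

# Hypothetical helper: prints dir|name|ext, or "no match"
split_path() {
  local re='^(.*/)?([^/]+)\.([^.]+)$'
  if [[ $1 =~ $re ]]; then
    printf '%s|%s|%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    printf 'no match\n'
  fi
}

for path in /var/log/syslog.1 notes.txt archive.tar.gz README; do
  printf '%-20s -> %s\n' "$path" "$(split_path "$path")"
done
```

Two results stand out: archive.tar.gz splits into archive.tar and gz because [^/]+ is greedy, and README does not match at all because the pattern requires an extension. An unmatched optional group, like the directory for notes.txt, comes back as an empty string.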

Important Notes

BASH_REMATCH Array

  • Index 0: The entire matched string
  • Index 1+: Each capture group in order
  • Reset by every =~ test: a successful match fills it, while a failed match leaves it empty

text="Hello123World"
if [[ $text =~ ([A-Za-z]+)([0-9]+)([A-Za-z]+) ]]; then
  echo "Full match: ${BASH_REMATCH[0]}"     # Hello123World
  echo "Group 1: ${BASH_REMATCH[1]}"        # Hello
  echo "Group 2: ${BASH_REMATCH[2]}"        # 123
  echo "Group 3: ${BASH_REMATCH[3]}"        # World
fi
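Because each =~ test replaces BASH_REMATCH, values should be copied into named variables before the next test runs. A short sketch with made-up input strings:

```bash
#!/bin/bash

line="user=alice"

if [[ $line =~ ^([a-z]+)=(.*)$ ]]; then
  # Copy the groups out right away
  key="${BASH_REMATCH[1]}"
  value="${BASH_REMATCH[2]}"
fi

# A later test that fails replaces BASH_REMATCH with an empty array
[[ "no-equals-sign" =~ ^([a-z]+)=(.*)$ ]] || echo "second test did not match"
count="${#BASH_REMATCH[@]}"

echo "elements after failed match: $count"
echo "saved values: $key=$value"
```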

Non-Capturing Groups

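Non-capturing groups such as (?:pattern) are a Perl/PCRE extension. Bash's =~ uses POSIX extended regular expressions (ERE), which do not support them, so use an ordinary capturing group and ignore its index in BASH_REMATCH. Example 5 does this with groups 3 and 6, which exist only to make the port and query optional.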
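A smaller sketch of the same idea: a host with an optional :port. The outer group only makes ":port" optional, so its index (2) is never read. The function name parse_host_port and the default port 80 are assumptions for illustration.

```bash
#!/bin/bash

# Hypothetical helper: prints "host port", defaulting the port to 80
parse_host_port() {
  local re='^([A-Za-z0-9.-]+)(:([0-9]+))?$'
  [[ $1 =~ $re ]] || return 1
  local host="${BASH_REMATCH[1]}"
  local port="${BASH_REMATCH[3]:-80}"   # group 3 is empty when ":port" is absent
  echo "$host $port"
}

parse_host_port "example.com:8080"
parse_host_port "example.com"
```

The two calls print example.com 8080 and example.com 80.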

Performance Considerations

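  • [[ =~ ]] is evaluated by the shell itself, so matching a single string avoids starting a sed, grep, or awk process
  • Keep patterns simple, and anchor them with ^ and $ where possible
  • Avoid excessive backtracking in regex patterns
  • For fixed delimiters, parameter expansion (${var%%@*}) or cut is often clearer; for large files, a single awk or sed pass is usually faster than a Bash read loop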
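For a fixed delimiter, parameter expansion gives the same parts without any regex. A sketch comparing the two on one sample address:

```bash
#!/bin/bash

email="john.doe@example.com"

# Regex with capture groups
if [[ $email =~ ^([^@]+)@(.+)$ ]]; then
  re_user="${BASH_REMATCH[1]}"
  re_domain="${BASH_REMATCH[2]}"
fi

# Parameter expansion: no regex involved
pe_user="${email%%@*}"    # drop the longest suffix that starts with '@'
pe_domain="${email#*@}"   # drop the shortest prefix that ends with '@'

echo "regex:     $re_user / $re_domain"
echo "expansion: $pe_user / $pe_domain"
```

Both lines show john.doe / example.com. The trade-off is validation: the expansions never fail, so an input without '@' comes back unchanged in both variables, while the regex version simply does not match.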

Key Points

  • Use [[ string =~ pattern ]] for regex matching, leaving the pattern unquoted (or storing it in a variable)
  • Access captures via ${BASH_REMATCH[n]} array
  • Index 0 is the entire match, 1+ are capture groups
  • Always check if match succeeded before accessing BASH_REMATCH
  • Parentheses define capture groups
  • Test regex patterns carefully for edge cases

Quick Reference

# Basic capture group
if [[ $string =~ (pattern) ]]; then
  captured="${BASH_REMATCH[1]}"
fi

# Multiple groups
if [[ $string =~ (group1)(group2) ]]; then
  first="${BASH_REMATCH[1]}"
  second="${BASH_REMATCH[2]}"
fi

# Parse simple key=value
if [[ $line =~ ^([^=]+)=(.*)$ ]]; then
  key="${BASH_REMATCH[1]}"
  value="${BASH_REMATCH[2]}"
fi