How to Capture Groups in Bash
Quick Answer: Capture Groups in Bash
To capture groups, use parentheses in regex: [[ "$string" =~ (group1)(group2) ]]. After matching, access groups via ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]}, etc. The full match is in ${BASH_REMATCH[0]}.
Quick Comparison: Group Extraction Methods
| Method | Syntax | Best For | Complexity |
|---|---|---|---|
| Capture groups | (pattern) with BASH_REMATCH | Extracting parts | Simple |
| sed back-references | sed 's/\(.*\)/\1/' | File processing | Moderate |
| awk substr | awk '{print substr($0, pos, len)}' | Field extraction | Moderate |
| grep -o | grep -o 'pattern' | Pattern only | Simple |
Bottom line: Use capture groups with BASH_REMATCH for maximum simplicity in Bash.
Learn how to extract matched pattern groups.
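The four methods in the comparison table can be tried side by side on one string. This is a minimal sketch: the `entry` value and all variable names are made up for illustration, and the `awk` line assumes the username starts at character 6 and is 5 characters long.

```bash
#!/bin/bash
# Hypothetical sample input, used only for illustration.
entry="user=alice id=42"

# 1. Bash capture groups: handled by the shell itself.
if [[ $entry =~ user=([a-z]+)\ id=([0-9]+) ]]; then
    bash_user="${BASH_REMATCH[1]}"
    bash_id="${BASH_REMATCH[2]}"
fi

# 2. sed: \(...\) defines a group in basic regex, \1 refers back to it.
sed_user=$(printf '%s\n' "$entry" | sed -n 's/^user=\([a-z]*\) .*/\1/p')

# 3. grep -o: prints only the matched text; it has no group access.
grep_id=$(printf '%s\n' "$entry" | grep -o '[0-9][0-9]*')

# 4. awk substr: extracts by position and length, not by pattern.
awk_user=$(printf '%s\n' "$entry" | awk '{print substr($0, 6, 5)}')

echo "bash: $bash_user $bash_id"
echo "sed:  $sed_user"
echo "grep: $grep_id"
echo "awk:  $awk_user"
```

Only the first method keeps several extracted parts available at once without running an external command, which is the main reason the table rates it as simple.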
Overview
Regular expression capture groups allow you to extract specific portions of matched text. When working with complex text processing tasks, capture groups let you isolate and reuse parts of a pattern match. This is essential for parsing logs, extracting data from formatted strings, and validating input formats.
Basic Syntax
Capture groups use parentheses to define what you want to extract:
# Simple capture group
if [[ $string =~ (pattern) ]]; then
captured="${BASH_REMATCH[1]}"
fi
# Multiple capture groups
if [[ $string =~ (pattern1)(pattern2) ]]; then
first="${BASH_REMATCH[1]}"
second="${BASH_REMATCH[2]}"
fi
# BASH_REMATCH array:
# [0] = entire matched string
# [1] = first capture group
# [2] = second capture group, etc.
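One detail the syntax above depends on is quoting. In Bash 3.2 and later, a quoted right-hand side of `=~` is matched as literal text, so the parentheses stop acting as groups. The sketch below uses hypothetical variable names and stores the pattern in a variable, a common way to sidestep the problem.

```bash
#!/bin/bash
string="id=42"

# Keeping the pattern in a variable avoids escaping it inside [[ ]].
pattern='^id=([0-9]+)$'

# Unquoted $pattern is interpreted as a regular expression.
if [[ $string =~ $pattern ]]; then
    unquoted_result="${BASH_REMATCH[1]}"
else
    unquoted_result="no match"
fi

# Quoted "$pattern" is treated as literal text, so the parentheses
# no longer define a group and this string does not match.
if [[ $string =~ "$pattern" ]]; then
    quoted_result="${BASH_REMATCH[1]}"
else
    quoted_result="no match"
fi

echo "Unquoted: $unquoted_result"   # 42
echo "Quoted:   $quoted_result"     # no match
```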
Example 1: Extract Email Address Parts
#!/bin/bash
email="john.doe@example.com"
# Capture username and domain
if [[ $email =~ ^([^@]+)@(.+)$ ]]; then
username="${BASH_REMATCH[1]}"
domain="${BASH_REMATCH[2]}"
echo "Email: $email"
echo "Username: $username"
echo "Domain: $domain"
fi
Output:
Email: john.doe@example.com
Username: john.doe
Domain: example.com
Example 2: Parse Date Format
#!/bin/bash
date_string="2024-12-25"
# Capture year, month, day
if [[ $date_string =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]]; then
year="${BASH_REMATCH[1]}"
month="${BASH_REMATCH[2]}"
day="${BASH_REMATCH[3]}"
echo "Date: $date_string"
echo "Year: $year"
echo "Month: $month"
echo "Day: $day"
fi
Output:
Date: 2024-12-25
Year: 2024
Month: 12
Day: 25
Example 3: Extract Log Information
#!/bin/bash
# Parse Apache-style log line
log_line='192.168.1.1 - - [25/Dec/2024:10:15:30 +0000] "GET /api/users HTTP/1.1" 200 1234'
# Capture IP, request method, path, status code
if [[ $log_line =~ ([0-9.]+).*\"([A-Z]+)\ ([^\ ]+).*\"\ ([0-9]+) ]]; then
ip="${BASH_REMATCH[1]}"
method="${BASH_REMATCH[2]}"
path="${BASH_REMATCH[3]}"
status="${BASH_REMATCH[4]}"
echo "IP Address: $ip"
echo "Method: $method"
echo "Path: $path"
echo "Status Code: $status"
fi
Output:
IP Address: 192.168.1.1
Method: GET
Path: /api/users
Status Code: 200
Example 4: Extract Name and Phone Number
#!/bin/bash
contact="John Smith: (555) 123-4567"
# Capture name and phone number
if [[ $contact =~ ([A-Za-z\ ]+):\ \(([0-9]+)\)\ ([0-9]+-[0-9]+) ]]; then
name="${BASH_REMATCH[1]}"
area_code="${BASH_REMATCH[2]}"
phone_number="${BASH_REMATCH[3]}"
echo "Name: $name"
echo "Area Code: $area_code"
echo "Phone: $phone_number"
fi
Output:
Name: John Smith
Area Code: 555
Phone: 123-4567
Example 5: Extract URL Components
#!/bin/bash
url="https://www.example.com:8080/api/users?id=123"
# Capture protocol, domain, port, path
if [[ $url =~ ^([a-z]+)://([^:/]+)(:([0-9]+))?(/[^?]*)(\?(.*))?$ ]]; then
protocol="${BASH_REMATCH[1]}"
domain="${BASH_REMATCH[2]}"
port="${BASH_REMATCH[4]}"
path="${BASH_REMATCH[5]}"
query="${BASH_REMATCH[7]}"
echo "Protocol: $protocol"
echo "Domain: $domain"
echo "Port: $port"
echo "Path: $path"
echo "Query: $query"
fi
Output:
Protocol: https
Domain: www.example.com
Port: 8080
Path: /api/users
Query: id=123
Example 6: Function to Extract Values
#!/bin/bash
# Reusable function for capture groups
extract_fields() {
    local string="$1"
    local pattern="$2"
    local i
    if [[ $string =~ $pattern ]]; then
        # Print each captured group on its own line
        for ((i = 1; i < ${#BASH_REMATCH[@]}; i++)); do
            printf '%s\n' "${BASH_REMATCH[$i]}"
        done
    else
        # No match: signal failure to the caller
        return 1
    fi
}
# Usage example
email="alice@company.org"
extract_fields "$email" '^([^@]+)@(.+)$'
# Output:
# alice
# company.org
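A function like this prints one group per line, so a caller usually wants the lines back in an array. The following is one possible way to do that, assuming Bash 4 or later for `mapfile`. The function is repeated so the sketch runs on its own, and the `fields` and `output` names are illustrative.

```bash
#!/bin/bash
# Same idea as extract_fields above, repeated so this sketch is self-contained.
extract_fields() {
    local string="$1" pattern="$2" i
    [[ $string =~ $pattern ]] || return 1
    for ((i = 1; i < ${#BASH_REMATCH[@]}; i++)); do
        printf '%s\n' "${BASH_REMATCH[i]}"
    done
}

# The exit status of the function tells the caller whether the match succeeded.
if output=$(extract_fields "alice@company.org" '^([^@]+)@(.+)$'); then
    # mapfile -t stores one output line per array element.
    # Caveat: a captured value that contains a newline would be split.
    mapfile -t fields <<< "$output"
    echo "User:   ${fields[0]}"
    echo "Domain: ${fields[1]}"
else
    echo "No match" >&2
fi
```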
Practical Examples
Parse Configuration File
#!/bin/bash
config_file="settings.conf"
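# Parse lines like: key = value
while IFS= read -r line; do
    # Exclude whitespace from the key so "key = value" yields "key", not "key "
    [[ $line =~ ^[[:space:]]*([^=[:space:]]+)[[:space:]]*=[[:space:]]*(.*)$ ]] || continue
    key="${BASH_REMATCH[1]}"
    value="${BASH_REMATCH[2]}"
    echo "Config: $key = $value"
done < "$config_file"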
Validate and Extract IP Address
#!/bin/bash
ip="192.168.1.1"
# Require four dot-separated groups of one to three digits
if [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]]; then
    octet1="${BASH_REMATCH[1]}"
    octet2="${BASH_REMATCH[2]}"
    octet3="${BASH_REMATCH[3]}"
    octet4="${BASH_REMATCH[4]}"
    # Validate octets are 0-255
    for octet in "$octet1" "$octet2" "$octet3" "$octet4"; do
        if [ "$octet" -gt 255 ]; then
            echo "Invalid IP"
            exit 1
        fi
    done
    echo "Valid IP: $ip"
else
    echo "Invalid IP format"
    exit 1
fi
Extract Version Numbers
#!/bin/bash
version_string="Application v2.3.4-beta.1"
if [[ $version_string =~ v([0-9]+)\.([0-9]+)\.([0-9]+)(-(.*))?$ ]]; then
major="${BASH_REMATCH[1]}"
minor="${BASH_REMATCH[2]}"
patch="${BASH_REMATCH[3]}"
suffix="${BASH_REMATCH[5]}"
echo "Major: $major"
echo "Minor: $minor"
echo "Patch: $patch"
[ -n "$suffix" ] && echo "Suffix: $suffix"
fi
Common Patterns
Email Address
if [[ $email =~ ^([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$ ]]; then
username="${BASH_REMATCH[1]}"
domain="${BASH_REMATCH[2]}"
fi
Phone Number (US Format)
if [[ $phone =~ ^\(?([0-9]{3})\)?[-.]?([0-9]{3})[-.]?([0-9]{4})$ ]]; then
area="${BASH_REMATCH[1]}"
prefix="${BASH_REMATCH[2]}"
line="${BASH_REMATCH[3]}"
fi
File Path and Extension
if [[ $file =~ ^(.*/)?([^/]+)\.([^./]+)$ ]]; then
directory="${BASH_REMATCH[1]}"
filename="${BASH_REMATCH[2]}"
extension="${BASH_REMATCH[3]}"
fi
Important Notes
BASH_REMATCH Array
- Index 0: The entire matched string
- Index 1+: Each capture group in order
- Only populated when a match with the =~ operator succeeds
text="Hello123World"
if [[ $text =~ ([A-Za-z]+)([0-9]+)([A-Za-z]+) ]]; then
echo "Full match: ${BASH_REMATCH[0]}" # Hello123World
echo "Group 1: ${BASH_REMATCH[1]}" # Hello
echo "Group 2: ${BASH_REMATCH[2]}" # 123
echo "Group 3: ${BASH_REMATCH[3]}" # World
fi
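Because every `=~` test writes to the same array, values from an earlier match are gone once the next test runs. A small sketch of the usual precaution, copying groups into named variables immediately (all names and inputs here are illustrative):

```bash
#!/bin/bash
first_input="alpha-1"
second_input="beta-2"
pattern='^([a-z]+)-([0-9]+)$'

if [[ $first_input =~ $pattern ]]; then
    # Copy the groups out right away.
    saved_name="${BASH_REMATCH[1]}"
    saved_number="${BASH_REMATCH[2]}"
fi

# A second match replaces the contents of BASH_REMATCH.
if [[ $second_input =~ $pattern ]]; then
    current_name="${BASH_REMATCH[1]}"
fi

echo "Saved:   $saved_name $saved_number"   # alpha 1
echo "Current: $current_name"               # beta
```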
Non-Capturing Groups
Other regex flavors offer (?:pattern) for groups that should not be captured. The Bash =~ operator uses POSIX extended regular expressions, which have no such syntax. Use an ordinary capturing group instead and skip its index when reading BASH_REMATCH.
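In practice this means counting opening parentheses from left to right and skipping the helper group. The sketch below uses a made-up pattern in which group 2 exists only to make the "-number" suffix optional. When the optional part is absent, the inner group is empty.

```bash
#!/bin/bash
# Group 1: name, group 2: helper (hyphen plus digits), group 3: digits only.
pattern='^([a-z]+)(-([0-9]+))?$'
results=()

for input in "build-42" "build"; do
    if [[ $input =~ $pattern ]]; then
        name="${BASH_REMATCH[1]}"
        # Index 2 is skipped on purpose; it only wraps the optional suffix.
        number="${BASH_REMATCH[3]}"
        results+=("name=$name number=${number:-none}")
    fi
done

printf '%s\n' "${results[@]}"
# name=build number=42
# name=build number=none
```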
Performance Considerations
- The =~ operator is built into the shell, so matching starts no external process (BASH_REMATCH has been available since Bash 3.0)
- Use simple patterns for better performance
- Avoid excessive backtracking in regex patterns
- Consider using simpler tools (cut, awk) for simple string operations
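For a split on a single fixed delimiter, a regex is often unnecessary. The sketch below shows two simpler alternatives on an illustrative email address. Parameter expansion is an addition not covered in the list above, while `cut` is one of the tools it names.

```bash
#!/bin/bash
email="alice@company.org"

# Parameter expansion: no regex engine and no external process.
user_expansion="${email%%@*}"     # remove the longest suffix starting at '@'
domain_expansion="${email#*@}"    # remove the shortest prefix ending at '@'

# cut: splits on a single-character delimiter, at the cost of a subprocess.
user_cut=$(printf '%s\n' "$email" | cut -d'@' -f1)
domain_cut=$(printf '%s\n' "$email" | cut -d'@' -f2)

echo "Expansion: $user_expansion / $domain_expansion"
echo "cut:       $user_cut / $domain_cut"
```

Capture groups remain the better fit when the input also needs validating, or when several parts with different shapes are extracted in one pass.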
Key Points
- Use [[ string =~ pattern ]] for regex matching
- Access captures via the ${BASH_REMATCH[n]} array
- Index 0 is the entire match, 1+ are capture groups
- Always check if match succeeded before accessing BASH_REMATCH
- Parentheses define capture groups
- Test regex patterns carefully for edge cases
Quick Reference
# Basic capture group
if [[ $string =~ (pattern) ]]; then
captured="${BASH_REMATCH[1]}"
fi
# Multiple groups
if [[ $string =~ (group1)(group2) ]]; then
first="${BASH_REMATCH[1]}"
second="${BASH_REMATCH[2]}"
fi
# Parse simple key=value
if [[ $line =~ ^([^=]+)=(.*)$ ]]; then
key="${BASH_REMATCH[1]}"
value="${BASH_REMATCH[2]}"
fi