Bash Regular Expressions - Pattern Matching and Validation
Quick Answer: How Do You Use Regular Expressions in Bash?
Use the =~ operator with double brackets to test if a string matches a regex pattern: [[ $text =~ pattern ]]. Common patterns: [0-9]+ (numbers), [a-z]+ (letters), ^ (start), $ (end), .* (any characters).
Quick Comparison: Regex Methods
| Method | Use Case | Power | Complexity |
|---|---|---|---|
| Glob patterns [[ $var == pattern ]] | Simple matching, no regex | Limited | Very simple |
| Regex =~ [[ $var =~ regex ]] | Complex patterns, validation | High | Moderate |
| grep patterns | Finding in files | High | Moderate |
| sed patterns | Text replacement | High | Moderate |
Bottom line: Use =~ for pattern matching in Bash, grep/sed for file operations.
Regular expressions enable powerful pattern matching and validation. This guide covers Bash regex syntax, matching operators, and practical validation examples you’ll use for input validation and data extraction.
Table of Contents
- Regex Basics
- Character Classes
- Quantifiers
- Anchors
- Matching Operators
- Practical Examples
- Validation Patterns
- Best Practices
- Frequently Asked Questions
Regex Basics
The =~ operator tests if a string matches a regex pattern. It only works inside double brackets [[ ]]; single brackets [ ] and the test command do not support it.
Simple Pattern
The simplest regex is just a literal string. This matches if the string is found anywhere in the text.
text="hello world"
if [[ $text =~ world ]]; then
echo "Match found"
fi
This outputs “Match found” because “world” is in the text. For more flexible matching, use special characters such as ., *, and +, which the following sections cover.
Literal Characters
Some characters have special meaning in regex. To match them literally, escape them with a backslash.
# Match exact characters
text="file.txt"
[[ $text =~ \.txt ]] # Match .txt (escaped dot)
The \. matches a literal dot. Without the backslash, . would match any character, not just a dot. This is crucial for matching filenames and paths.
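The difference between an escaped and an unescaped dot is easiest to see side by side. The sketch below is only an illustration; has_txt_extension is a hypothetical helper name, not something defined elsewhere in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the name ends in a literal ".txt".
has_txt_extension() {
    [[ $1 =~ \.txt$ ]]
}

for name in "file.txt" "file_txt" "notes.txt.bak"; do
    if has_txt_extension "$name"; then
        echo "$name: ends in .txt"
    else
        echo "$name: no .txt extension"
    fi
done

# For contrast, an unescaped dot matches any single character,
# so the underscore in "file_txt" is accepted.
[[ "file_txt" =~ .txt$ ]] && echo "unescaped dot also matches file_txt"
```

Only "file.txt" passes the escaped check, while the unescaped pattern also accepts "file_txt".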
Character Classes
Character classes let you match groups of characters without listing each one individually. This is essential for patterns like “any digit” or “any lowercase letter”.
Common Classes
| Class | Matches | Example |
|---|---|---|
| . | Any character | a.c matches “abc”, “adc”, “a c” |
| [abc] | a, b, or c | [abc] matches single ‘a’, ‘b’, or ‘c’ |
| [^abc] | NOT a, b, or c | Matches any single character except those letters |
| [a-z] | a through z | Matches any lowercase letter |
| [0-9] | Any digit | Matches single digit 0-9 |
| \w | Word character (letters, digits, _) | Matches alphanumeric or underscore; not part of POSIX ERE, so portable scripts use [[:alnum:]_] |
| \s | Whitespace | Matches space, tab, newline; not part of POSIX ERE, so portable scripts use [[:space:]] |
Examples
These patterns match specific types of content:
# Match digit
[[ "abc123" =~ [0-9] ]] && echo "Contains digit"
# Match lowercase letters and underscores only (no digits or uppercase)
[[ "hello_world" =~ ^[a-z_]+$ ]] && echo "Valid identifier"
# Match uppercase letter
[[ "Hello" =~ [A-Z] ]] && echo "Contains uppercase"
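POSIX bracket classes such as [[:alpha:]], [[:alnum:]], and [[:space:]] are part of ERE and do not depend on extensions in the system regex library. A minimal sketch, with hypothetical helper names chosen for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on POSIX bracket classes.
is_identifier() {
    # A letter or underscore first, then letters, digits, or underscores.
    [[ $1 =~ ^[[:alpha:]_][[:alnum:]_]*$ ]]
}

has_whitespace() {
    [[ $1 =~ [[:space:]] ]]
}

is_identifier "user_name1" && echo "user_name1: valid identifier"
is_identifier "1st_place"  || echo "1st_place: invalid identifier"
has_whitespace "two words" && echo "'two words' contains whitespace"
```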
Quantifiers
Quantifiers control how many times a pattern repeats. They’re what make regex powerful—without them, you’d have to match each occurrence individually.
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more | a*b matches “b”, “ab”, “aab”, etc. |
| + | 1 or more | a+b matches “ab”, “aab”, but not “b” |
| ? | 0 or 1 | a?b matches “b” or “ab” (optional) |
| {n} | Exactly n | a{3} matches exactly “aaa” |
| {n,} | n or more | a{3,} matches “aaa”, “aaaa”, etc. |
| {n,m} | Between n and m | a{2,4} matches “aa”, “aaa”, or “aaaa” |
Examples
These patterns match repeated sequences:
# One or more digits (entire string must be digits)
[[ "123" =~ ^[0-9]+$ ]] && echo "All digits"
# Optional hyphen (useful for phone number flexibility)
[[ "555-1234" =~ ^[0-9]{3}-?[0-9]{4}$ ]] && echo "Phone format"
# Zero or more spaces (flexible spacing)
[[ "hello world" =~ hello\ *world ]] && echo "Match with spaces"
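Quantifiers also apply to parenthesized groups, which makes a whole sub-pattern optional. The sketch below assumes a US-style ZIP code format purely as an example; is_zip is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accepts exactly 5 digits, optionally followed by
# a hyphen and exactly 4 more digits (ZIP+4 style).
is_zip() {
    [[ $1 =~ ^[0-9]{5}(-[0-9]{4})?$ ]]
}

for zip in "12345" "12345-6789" "1234" "12345-67"; do
    if is_zip "$zip"; then
        echo "$zip: ok"
    else
        echo "$zip: rejected"
    fi
done
```

The first two inputs are accepted. The last two are rejected because {5} and {4} demand exact counts.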
Anchors
Anchors position patterns at specific locations—the start, end, or boundaries of strings. They don’t match characters themselves; they match positions.
| Anchor | Meaning | Example |
|---|---|---|
| ^ | Start of string | ^hello matches “hello” only at the beginning |
| $ | End of string | world$ matches “world” only at the end |
| \b | Word boundary | \bword\b matches “word” as a complete word; not part of POSIX ERE, so support depends on the system regex library |
Examples
These patterns match at specific locations:
# Match at start (ensure string begins with expected text)
[[ "hello world" =~ ^hello ]] && echo "Starts with hello"
# Match at end (check file extensions, endings)
[[ "hello world" =~ world$ ]] && echo "Ends with world"
# Exact match
[[ "hello" =~ ^hello$ ]] && echo "Exact match"
When to Use Anchors
Use anchors when:
- Validating complete input (email, username)
- Checking file extensions
- Ensuring patterns are at boundaries
- Avoiding partial matches
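The risk of leaving anchors out shows up most clearly with untrusted input. In this sketch, contains_token and is_token are hypothetical helpers that differ only in their anchors:

```shell
#!/usr/bin/env bash
# Hypothetical helpers contrasting a partial match with a full-string match.
contains_token() { [[ $1 =~ [a-z0-9]+ ]]; }
is_token()       { [[ $1 =~ ^[a-z0-9]+$ ]]; }

input="abc123; rm -rf /tmp/x"

# Unanchored: succeeds because "abc123" appears somewhere in the input.
contains_token "$input" && echo "unanchored: accepted"

# Anchored: the whole string must consist of lowercase letters and digits.
is_token "$input"       || echo "anchored: rejected"
is_token "abc123"       && echo "anchored: abc123 accepted"
```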
Matching Operators
Basic Matching ([[ =~ ]])
The =~ operator tests if a string matches a regex. It returns 0 (success/true) if it matches, 1 (failure/false) if it doesn’t, and 2 if the pattern is not a valid regular expression.
text="hello123"
if [[ $text =~ [0-9]+ ]]; then
echo "Contains numbers"
fi
This checks if the text contains any numbers. The [0-9]+ pattern means “one or more digits.”
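To inspect the exit status directly, a small wrapper can print it. This is a sketch; regex_status is a hypothetical helper. The third call relies on the Bash manual's statement that a syntactically invalid regex makes the test return 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the exit status of [[ string =~ pattern ]].
regex_status() {
    local status=0
    [[ $1 =~ $2 ]] || status=$?
    echo "$status"
}

regex_status "hello123" '[0-9]+'     # 0: the pattern matches
regex_status "hello123" '^[0-9]+$'   # 1: valid pattern, no match
regex_status "hello123" '('          # 2: unbalanced parenthesis, invalid regex
```

Treating status 2 separately from status 1 helps tell a typo in the pattern apart from input that simply fails validation.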
Capture Groups
Extract parts of matched text using parentheses. The captured groups are stored in the BASH_REMATCH array.
text="Name: John"
if [[ $text =~ Name:\ ([A-Za-z]+) ]]; then
echo "Name is ${BASH_REMATCH[1]}" # Output: Name is John
fi
The parentheses create a capture group. BASH_REMATCH[0] is the entire match, BASH_REMATCH[1] is the first group, etc. This is how you extract data from strings.
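With more than one group, each pair of parentheses fills the next index of BASH_REMATCH. The sketch below assumes a hypothetical "key=value" settings line; kv_regex and line are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical input format: a "key=value" line with a numeric value.
# The pattern lives in a variable so the shell does not have to parse it.
kv_regex='^([a-z_]+)=([0-9]+)$'
line="timeout=30"

if [[ $line =~ $kv_regex ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # timeout=30
    echo "key:         ${BASH_REMATCH[1]}"   # timeout
    echo "value:       ${BASH_REMATCH[2]}"   # 30
fi
```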
Case-Insensitive Matching (Bash 3.1+)
For case-insensitive matching, use the nocasematch option:
shopt -s nocasematch # Enable case-insensitive
if [[ "HELLO" =~ hello ]]; then
echo "Case-insensitive match"
fi
shopt -u nocasematch # Disable case-insensitive
This is useful for user input validation where you don’t want to care about the user’s capitalization.
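Because nocasematch is a shell-wide setting, forgetting to turn it off affects every later [[ ]] and case comparison. One way to limit its scope, sketched here with a hypothetical matches_nocase helper, is to give the function a subshell body so the option never leaks into the caller:

```shell
#!/usr/bin/env bash
# Hypothetical helper: the parentheses make the function body a subshell,
# so shopt changes disappear when the function returns.
matches_nocase() (
    shopt -s nocasematch
    [[ $1 =~ $2 ]]
)

matches_nocase "HELLO" '^hello$' && echo "matched regardless of case"
shopt -q nocasematch || echo "nocasematch is still off in the calling shell"
```

The trade-off is that BASH_REMATCH set inside the subshell is not visible to the caller, so this form suits yes/no checks rather than extraction.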
When to Use Regex Matching
Use =~ with regex when:
- Validating user input (email, phone, URL)
- Pattern matching complex strings
- Extracting data with capture groups
- You need more power than glob patterns
Use glob patterns [[ == *pattern* ]] when:
- Simple substring checking
- Speed is critical (glob is slightly faster)
- The pattern doesn’t need regex features
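The two approaches can be compared on the same input. In this sketch the filename is a made-up example: the glob confirms the general shape, while the regex also enforces four digits and captures them.

```shell
#!/usr/bin/env bash
filename="report-2024.csv"   # hypothetical example input

# Glob: * means "any run of characters"; the pattern must cover the whole string.
[[ $filename == report-*.csv ]] && echo "glob: looks like a report file"

# Regex: additionally requires exactly four digits and captures them.
if [[ $filename =~ ^report-([0-9]{4})\.csv$ ]]; then
    echo "regex: report for year ${BASH_REMATCH[1]}"
fi
```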
Quick Reference
# Basic matching
[[ $text =~ pattern ]] # Returns true if matches
# Character classes
[[ $text =~ [0-9] ]] # Contains digit
[[ $text =~ [a-z]+ ]] # Contains lowercase letters
[[ $text =~ [A-Z] ]] # Contains uppercase letter
# Quantifiers
[[ $text =~ ^[0-9]+$ ]] # Entire string is digits
[[ $text =~ [0-9]{3}-[0-9]{4} ]] # Specific pattern (like 123-4567)
# Anchors
[[ $text =~ ^hello ]] # Starts with hello
[[ $text =~ world$ ]] # Ends with world
[[ $text =~ ^exact$ ]] # Exact match
# Capture groups
if [[ $text =~ ([0-9]+)-([a-z]+) ]]; then
echo "${BASH_REMATCH[1]}" # First captured group
echo "${BASH_REMATCH[2]}" # Second captured group
fi
# Case-insensitive
shopt -s nocasematch
[[ $text =~ pattern ]] # Now case-insensitive
shopt -u nocasematch
Practical Examples
Email Validation
validate_email() {
local email="$1"
if [[ $email =~ ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ ]]; then
echo "Valid email"
else
echo "Invalid email"
fi
}
validate_email "john@example.com"
URL Validation
validate_url() {
local url="$1"
if [[ $url =~ ^https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} ]]; then
echo "Valid URL"
else
echo "Invalid URL"
fi
}
validate_url "https://example.com"
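The same kind of pattern can extract parts of a URL instead of only validating it. This is a simplified sketch: parse_url and url_regex are hypothetical names, and the pattern deliberately ignores ports, credentials, and other parts of real URLs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the scheme and host of a simple http(s) URL.
# Group 1 is the scheme, group 2 the host, group 3 an optional path.
url_regex='^(https?)://([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})(/.*)?$'

parse_url() {
    if [[ $1 =~ $url_regex ]]; then
        echo "scheme=${BASH_REMATCH[1]} host=${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

parse_url "https://example.com/docs/index.html"   # scheme=https host=example.com
parse_url "ftp://example.com" || echo "not an http(s) URL"
```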
Phone Number
validate_phone() {
local phone="$1"
if [[ $phone =~ ^[0-9]{3}-[0-9]{3}-[0-9]{4}$ ]]; then
echo "Valid phone"
else
echo "Invalid phone"
fi
}
validate_phone "555-123-4567"
Validation Patterns
Username (alphanumeric and underscore)
[[ $username =~ ^[a-zA-Z0-9_]{3,20}$ ]]
IP Address (simplified)
[[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]
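The simplified pattern accepts values such as 999.999.999.999 because a regex checks shape, not numeric range. One possible refinement, sketched with a hypothetical is_ipv4 helper, captures each octet and checks the range with shell arithmetic:

```shell
#!/usr/bin/env bash
# Hypothetical helper: the regex checks the shape, arithmetic checks each octet.
is_ipv4() {
    local ip=$1 octet
    [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
    # Elements 1..4 of BASH_REMATCH hold the four captured octets.
    for octet in "${BASH_REMATCH[@]:1}"; do
        # 10# forces base 10 so a leading zero is not read as octal.
        (( 10#$octet <= 255 )) || return 1
    done
}

is_ipv4 "192.168.1.10" && echo "192.168.1.10: valid"
is_ipv4 "999.1.1.1"    || echo "999.1.1.1: invalid"
```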
Date (YYYY-MM-DD)
[[ $date =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]
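The pattern above checks only the digit layout, so 2024-99-99 passes. A somewhat stricter variant can use alternation to limit the month and day ranges. This is a sketch with hypothetical names (date_regex, is_date), and it still does not know how many days each month has.

```shell
#!/usr/bin/env bash
# Alternation limits the month to 01-12 and the day to 01-31.
# Impossible dates such as 2024-02-31 are still accepted.
date_regex='^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'

is_date() {
    [[ $1 =~ $date_regex ]]
}

is_date "2024-01-15" && echo "2024-01-15: plausible date"
is_date "2024-13-01" || echo "2024-13-01: rejected (month out of range)"
is_date "2024-12-32" || echo "2024-12-32: rejected (day out of range)"
```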
Hex Color (#RRGGBB)
[[ $color =~ ^#[0-9a-fA-F]{6}$ ]]
Best Practices
1. Quote Regex Carefully
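# Good: keep the pattern in a variable and expand it unquoted
pattern="^[0-9]+$"
[[ $text =~ $pattern ]]
# Patterns with spaces: a variable avoids backslash-escaping each space.
# Do not quote $pattern; a quoted right-hand side is matched as a literal string.
pattern="hello world"
[[ "hello world" =~ $pattern ]]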
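The effect of quoting the right-hand side is easy to miss, so here is a small demonstration. It assumes Bash 3.2 or later with default compatibility settings, where a quoted pattern is matched as a literal string.

```shell
#!/usr/bin/env bash
pattern='^[0-9]+$'

# Unquoted expansion: the value is interpreted as a regex.
[[ "12345" =~ $pattern ]] && echo "unquoted: 12345 matches as a regex"

# Quoted expansion: the value is matched as literal text, so "12345"
# no longer matches, but a string containing the literal characters does.
[[ "12345" =~ "$pattern" ]] || echo "quoted: 12345 does not match"
[[ 'x^[0-9]+$y' =~ "$pattern" ]] && echo "quoted: literal text matches"
```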
2. Use Proper Anchors
# Good (exact match)
[[ $text =~ ^pattern$ ]]
# Less strict (pattern anywhere)
[[ $text =~ pattern ]]
3. Test Patterns
# Test various inputs
for input in "valid" "VALID" "invalid123"; do
if [[ $input =~ ^[a-z]+$ ]]; then
echo "$input matches"
fi
done
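A loop like this becomes more useful when each input carries its expected result, so regressions show up as failures instead of silent omissions. The following is a minimal sketch; the "input:expected" table format and the variable names are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Use the C locale so [a-z] means ASCII lowercase letters only.
LC_ALL=C

# Hypothetical test table: each entry is "input:expected".
pattern='^[a-z]+$'
cases=("valid:match" "VALID:nomatch" "invalid123:nomatch" ":nomatch")
failures=0

for entry in "${cases[@]}"; do
    input=${entry%%:*}      # text before the first colon
    expected=${entry##*:}   # text after the last colon
    if [[ $input =~ $pattern ]]; then actual=match; else actual=nomatch; fi
    if [[ $actual == "$expected" ]]; then
        echo "PASS '$input' -> $actual"
    else
        echo "FAIL '$input' -> $actual (expected $expected)"
        failures=$((failures + 1))
    fi
done

echo "$failures failure(s)"
```

Including an empty string and a mixed-case input in the table covers two cases that are easy to forget.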
4. Document Complex Patterns
# Email validation regex
# - Local part: letters, numbers, dots, hyphens
# - @ symbol
# - Domain: letters, numbers, hyphens, dots
# - TLD: 2+ letters
email_regex="^[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
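Another way to keep a long pattern readable is to assemble it from named parts. The variable names below (local_part, domain, tld) are hypothetical; the resulting pattern is the same as email_regex above.

```shell
#!/usr/bin/env bash
# Each part is named after what it matches, then joined into one pattern.
local_part='[a-zA-Z0-9.-]+'
domain='[a-zA-Z0-9.-]+'
tld='[a-zA-Z]{2,}'
email_regex="^${local_part}@${domain}\.${tld}$"

[[ "john@example.com" =~ $email_regex ]] && echo "john@example.com: valid"
[[ "john@example" =~ $email_regex ]]     || echo "john@example: invalid"
```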
Frequently Asked Questions
Q: What’s the difference between grep and [[ =~ ]]?
A: grep searches the lines of files or standard input and runs as a separate process. [[ =~ ]] tests a single string, usually a variable, inside the shell itself.
Q: Are Bash regexes PCRE?
A: No. Bash uses POSIX ERE (Extended Regular Expressions), so PCRE features such as lookahead, lazy quantifiers, and \d are not available.
Q: How do I debug regex patterns?
A: Test incrementally: start simple, add complexity gradually.
Q: Can I use case-insensitive matching?
A: Yes: shopt -s nocasematch before matching.
Next Steps
Explore related topics:
- Bash Text Processing - grep, sed, awk in context
- Bash String Manipulation - Pattern-based string operations