
How to Find Common Lines Between Files

• 1 min read
Tags: bash, file comparison, common lines, comm, intersection, data processing

Quick Answer: Find Common Lines Between Files

To find lines appearing in both files, use comm -12 <(sort file1) <(sort file2). This sorts both files and suppresses columns 1 and 2 (lines unique to each file), printing only column 3, the lines common to both. For a simpler approach, grep -xF -f file1 file2 prints the lines of file2 that exactly match a line in file1.

Quick Comparison: Find Common Lines Methods

Method       Syntax                                      Files Needed   Performance
comm -12     comm -12 <(sort f1) <(sort f2)              2 sorted       Fast
grep -F -f   grep -F -f file1 file2                      2 any          Medium
awk array    awk 'NR==FNR {a[$0]; next} $0 in a' f1 f2   2 any          Medium

Bottom line: Use comm -12 for speed; use grep -F -f for simplicity.


Overview

Finding common lines between two files is a fundamental text processing task in Bash. Whether you’re comparing logs, analyzing differences, finding duplicates, or merging datasets, knowing how to identify matching lines is essential for system administration and data processing. This tutorial covers multiple methods to find lines that appear in both files efficiently.

Method 1: Using comm

The comm command is designed specifically for finding common lines; however, both files must be sorted first.

# Sort both files, then find common lines
comm -12 <(sort file1.txt) <(sort file2.txt)

# Explanation of comm options:
# -1: suppress lines unique to file1
# -2: suppress lines unique to file2
# -12: show only lines common to both files

Example with actual files:

# file1.txt
apple
banana
cherry
date

# file2.txt
banana
cherry
elderberry
fig

# Command:
comm -12 <(sort file1.txt) <(sort file2.txt)

# Output:
# banana
# cherry

Method 2: Using grep

The grep command can treat each line of one file as a pattern and search for those patterns in another file.

# Print the lines of file1 that match a pattern from file2
grep -f file2.txt file1.txt

# Safer: exact, literal whole-line matching
# (-x matches whole lines; -F disables regex interpretation)
grep -xF -f file2.txt file1.txt

Example:

# file1.txt
apple
banana
cherry

# file2.txt
banana
cherry
dragonfruit

# Command:
grep -f file2.txt file1.txt

# Output:
# banana
# cherry
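One pitfall worth a quick demo: without -x, grep matches substrings, so a short pattern line can "hit" longer lines it does not fully equal. A small sketch (the file names are illustrative):

```shell
# A pattern file with one short line, and a data file containing
# both that exact line and a longer line that merely contains it
printf 'berry\n' > /tmp/patterns.txt
printf 'berry\nelderberry\n' > /tmp/data.txt

# Substring matching: both lines match
grep -F -f /tmp/patterns.txt /tmp/data.txt

# Whole-line matching with -x: only the exact line matches
grep -xF -f /tmp/patterns.txt /tmp/data.txt
```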

Method 3: Using awk

awk builds a hash of the first file's lines and checks each line of the second file against it; it needs no sorting, preserves the second file's line order, and is often faster than the other methods.

# Compare two files and show common lines
awk 'NR==FNR {a[$0]; next} $0 in a' file1.txt file2.txt

# Explanation:
# NR==FNR: true only while reading the first file
# a[$0]: store each line of file1 as an array key
# next: skip the test below while still reading file1
# $0 in a: print lines of file2 that were seen in file1

Example:

# Processing the same files as before
awk 'NR==FNR {a[$0]; next} $0 in a' file1.txt file2.txt

# Output:
# banana
# cherry
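Note that this prints a matching line once per occurrence in file2. If that matters, the common !seen[$0]++ awk idiom deduplicates the output; a small sketch (file names are illustrative):

```shell
printf 'a\nb\n' > /tmp/m1.txt
printf 'b\nb\na\n' > /tmp/m2.txt

# Without dedup: 'b' is printed once per occurrence in the second file
awk 'NR==FNR {a[$0]; next} $0 in a' /tmp/m1.txt /tmp/m2.txt

# With dedup: each common line is printed once, in second-file order
awk 'NR==FNR {a[$0]; next} ($0 in a) && !seen[$0]++' /tmp/m1.txt /tmp/m2.txt
```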

Method 4: Using sort and uniq

This method concatenates the two files and uses sort/uniq to surface lines that occur in both.

# Combine files and show duplicated lines
sort file1.txt file2.txt | uniq -d

# -d prints one copy of each line that appears more than once.
# Caveat: this is only correct when neither file repeats a line
# internally; a line duplicated within one file alone will be
# reported as "common". Deduplicate with sort -u first if unsure.

Example:

sort file1.txt file2.txt | uniq -d

# Output:
# banana
# cherry
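The weakness of this method is internal duplicates: a line repeated within one file alone still shows up as a "common" line. De-duplicating each file with sort -u first fixes it; a sketch with illustrative file names:

```shell
# 'dup' appears twice in file_a and never in file_b
printf 'dup\ndup\nunique\n' > /tmp/file_a.txt
printf 'other\n' > /tmp/file_b.txt

# Naive combination wrongly reports 'dup' as common
sort /tmp/file_a.txt /tmp/file_b.txt | uniq -d

# De-duplicate each file first; now nothing is reported
sort -u /tmp/file_a.txt > /tmp/file_a_u.txt
sort -u /tmp/file_b.txt > /tmp/file_b_u.txt
sort /tmp/file_a_u.txt /tmp/file_b_u.txt | uniq -d
```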

Method 5: Using diff

While diff is typically used to show differences, GNU diff's group-format options can flip it around to print only the lines the two files share (in matching order). Note that diff compares files positionally, so this works best on sorted or similarly ordered input.

# GNU diff only: emit unchanged (common) groups verbatim, drop the rest
diff --unchanged-group-format='%=' --changed-group-format='' file1.txt file2.txt
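On the sample fruit files from earlier, GNU diff's group-format options (a GNU diffutils feature; BSD/macOS diff lacks them) behave like this:

```shell
printf 'apple\nbanana\ncherry\ndate\n' > /tmp/file1.txt
printf 'banana\ncherry\nelderberry\nfig\n' > /tmp/file2.txt

# '%=' emits unchanged groups verbatim; '' suppresses changed groups.
# The '|| true' is only because diff still exits 1 when files differ.
diff --unchanged-group-format='%=' --changed-group-format='' \
    /tmp/file1.txt /tmp/file2.txt || true
# prints: banana, cherry
```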

Practical Examples

Example 1: Find Users in Both Systems

#!/bin/bash
# Compare users from two systems

system1_users="system1_users.txt"
system2_users="system2_users.txt"

echo "Users in both systems:"
comm -12 <(sort "$system1_users") <(sort "$system2_users")

echo -e "\nUsers only in system1:"
comm -23 <(sort "$system1_users") <(sort "$system2_users")

echo -e "\nUsers only in system2:"
comm -13 <(sort "$system1_users") <(sort "$system2_users")

Example 2: Find Common Log Entries

#!/bin/bash
# Find IP addresses that appear in multiple log files

log1="/var/log/access1.log"
log2="/var/log/access2.log"

echo "IP addresses in both logs:"
grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' "$log1" | sort -u > /tmp/ips1.txt
grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' "$log2" | sort -u > /tmp/ips2.txt

comm -12 /tmp/ips1.txt /tmp/ips2.txt

# Cleanup
rm /tmp/ips1.txt /tmp/ips2.txt

Example 3: Find Common Configuration Lines

#!/bin/bash
# Find lines that appear in both configuration files

config1="config_old.txt"
config2="config_new.txt"

echo "Lines in both versions:"
awk 'NR==FNR {a[$0]; next} $0 in a' "$config1" "$config2"

echo -e "\nNew lines in latest config:"
awk 'NR==FNR {a[$0]; next} !($0 in a)' "$config1" "$config2"

Performance Comparison

For finding common lines between large files:

Method            Speed       Memory   Notes
comm              Very fast   Low      Requires sorted input
grep -f           Fast        Low      Simple, readable
awk               Very fast   Medium   Highest performance
sort | uniq -d    Fast        Medium   Fails if a file repeats a line internally
diff              Slower      Low      Not ideal for this task

Best choice: Use awk 'NR==FNR {a[$0]; next} $0 in a' for best performance on large files.
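As a sanity check, the three main methods should agree on the same input once their outputs are sorted; a small sketch (paths are illustrative, and the files are pre-sorted to plain files so no Bash-only process substitution is needed):

```shell
printf 'x\ny\nz\n' > /tmp/f1.txt
printf 'y\nz\nw\n' > /tmp/f2.txt

# Pre-sort once so comm can read plain files
sort /tmp/f1.txt > /tmp/f1.sorted
sort /tmp/f2.txt > /tmp/f2.sorted

a=$(comm -12 /tmp/f1.sorted /tmp/f2.sorted)
b=$(grep -xF -f /tmp/f1.txt /tmp/f2.txt | sort)
c=$(awk 'NR==FNR {s[$0]; next} $0 in s' /tmp/f1.txt /tmp/f2.txt | sort)

[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three methods agree"
```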

Important Considerations

Case Sensitivity

By default, matching is case-sensitive. For case-insensitive matching:

# Using grep (-i alone makes the match case-insensitive)
grep -ixF -f file2.txt file1.txt

# Using awk
awk 'NR==FNR {a[tolower($0)]; next} tolower($0) in a' file1.txt file2.txt

Handling Empty Lines and Whitespace

Be aware of trailing whitespace or empty lines that might affect matching:

# Strip trailing whitespace and drop empty lines first
sed -e 's/[[:space:]]*$//' -e '/^$/d' file1.txt > /tmp/file1_clean.txt
sed -e 's/[[:space:]]*$//' -e '/^$/d' file2.txt > /tmp/file2_clean.txt

# Then find common lines
awk 'NR==FNR {a[$0]; next} $0 in a' /tmp/file1_clean.txt /tmp/file2_clean.txt

With Special Characters

When files contain special regex characters, use grep with -F flag:

# -F treats pattern lines as literal strings, not regexes
# -x requires the whole line to match, not just a substring
grep -xF -f file2.txt file1.txt
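A quick demonstration of why -F matters when pattern lines contain regex metacharacters (file names are illustrative): without it, a dot in a pattern matches any character.

```shell
# Pattern '1.2' contains a regex metacharacter (the dot)
printf '1.2\n' > /tmp/pat.txt
printf '1.2\n1x2\n' > /tmp/dat.txt

# Interpreted as a regex, '1.2' also matches '1x2'
grep -f /tmp/pat.txt /tmp/dat.txt

# As a literal whole-line string, only '1.2' matches
grep -xF -f /tmp/pat.txt /tmp/dat.txt
```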

Key Points

  • Use comm -12 for sorted files (fastest if already sorted)
  • Use awk 'NR==FNR {a[$0]; next} $0 in a' for maximum performance
  • Use grep -f for simple, readable code
  • Always consider sorting files first for consistent results
  • Watch out for whitespace and case sensitivity issues
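The key points above can be rolled into a small helper function, sketched here (the name common_lines is ours, and the <() process substitution requires Bash rather than plain sh):

```shell
#!/bin/bash
# common_lines FILE1 FILE2: print lines present in both files,
# sorting each input on the fly so unsorted files are handled.
common_lines() {
    comm -12 <(sort -- "$1") <(sort -- "$2")
}

# Usage:
printf 'apple\nbanana\ncherry\n' > /tmp/a.txt
printf 'banana\ncherry\ndate\n'  > /tmp/b.txt
common_lines /tmp/a.txt /tmp/b.txt   # banana, cherry
```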

Quick Reference

# Quick way to find common lines:
awk 'NR==FNR {a[$0]; next} $0 in a' file1.txt file2.txt

# With sorting (handles unsorted files):
comm -12 <(sort file1.txt) <(sort file2.txt)

# With grep:
grep -xF -f file2.txt file1.txt