
How to Find Common Lines Between Files

• 1 min read
Tags: bash, file comparison, common lines, comm, intersection, data processing

Quick Answer: Find Common Lines Between Files

To find lines appearing in both files, use comm -12 <(sort file1) <(sort file2). This sorts both files and suppresses columns 1 and 2 (lines unique to each file), printing only column 3, the lines common to both. For a simpler approach, grep -xF -f file1 file2 prints the lines of file2 that exactly match a line in file1.

Quick Comparison: Find Common Lines Methods

Method       Syntax                                      Files Needed   Performance
comm -12     comm -12 <(sort f1) <(sort f2)              2 sorted       Fast
grep -F -f   grep -F -f file1 file2                      2 any          Medium
awk array    awk 'NR==FNR {a[$0]; next} $0 in a' f1 f2   2 any          Medium

Bottom line: Use comm -12 for speed; use grep -F -f for simplicity.


Overview

Finding common lines between two files is a fundamental text processing task in Bash. Whether you’re comparing logs, analyzing differences, finding duplicates, or merging datasets, knowing how to identify matching lines is essential for system administration and data processing. This tutorial covers multiple methods to find lines that appear in both files efficiently.

Method 1: Using comm

The comm command is designed specifically for finding common lines; however, both files must be sorted first.

# Sort both files, then find common lines
comm -12 <(sort file1.txt) <(sort file2.txt)

# Explanation of comm options:
# -1: suppress lines unique to file1
# -2: suppress lines unique to file2
# -12: show only lines common to both files

Example with actual files:

# file1.txt
apple
banana
cherry
date

# file2.txt
banana
cherry
elderberry
fig

# Command:
comm -12 <(sort file1.txt) <(sort file2.txt)

# Output:
# banana
# cherry

Method 2: Using grep

The grep command can treat each line of one file as a pattern and search for those patterns in another file.

# Print the lines of file1 that match a pattern from file2
grep -f file2.txt file1.txt

# Safer: exact, literal whole-line matching
# (-x matches whole lines; -F disables regex interpretation)
grep -xF -f file2.txt file1.txt

Example:

# file1.txt
apple
banana
cherry

# file2.txt
banana
cherry
dragonfruit

# Command:
grep -f file2.txt file1.txt

# Output:
# banana
# cherry
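One pitfall worth a quick demo: without -x, grep matches substrings, so a short pattern line can "hit" longer lines it does not fully equal. A small sketch (the file names are illustrative):

```shell
# A pattern file with one short line, and a data file containing
# both that exact line and a longer line that merely contains it
printf 'berry\n' > /tmp/patterns.txt
printf 'berry\nelderberry\n' > /tmp/data.txt

# Substring matching: both lines match
grep -F -f /tmp/patterns.txt /tmp/data.txt

# Whole-line matching with -x: only the exact line matches
grep -xF -f /tmp/patterns.txt /tmp/data.txt
```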

Method 3: Using awk

awk builds a hash of the first file's lines and checks each line of the second file against it; it needs no sorting, preserves the second file's line order, and is often faster than the other methods.

# Compare two files and show common lines
awk 'NR==FNR {a[$0]; next} $0 in a' file1.txt file2.txt

# Explanation:
# NR==FNR: true only while reading the first file
# a[$0]: store each line of file1 as an array key
# next: skip the test below while still reading file1
# $0 in a: print lines of file2 that were seen in file1

Example:

# Processing the same files as before
awk 'NR==FNR {a[$0]; next} $0 in a' file1.txt file2.txt

# Output:
# banana
# cherry
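Note that this prints a matching line once per occurrence in file2. If that matters, the common !seen[$0]++ awk idiom deduplicates the output; a small sketch (file names are illustrative):

```shell
printf 'a\nb\n' > /tmp/m1.txt
printf 'b\nb\na\n' > /tmp/m2.txt

# Without dedup: 'b' is printed once per occurrence in the second file
awk 'NR==FNR {a[$0]; next} $0 in a' /tmp/m1.txt /tmp/m2.txt

# With dedup: each common line is printed once, in second-file order
awk 'NR==FNR {a[$0]; next} ($0 in a) && !seen[$0]++' /tmp/m1.txt /tmp/m2.txt
```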

Method 4: Using sort and uniq

This method concatenates the two files and uses sort/uniq to surface lines that occur in both.

# Combine files and show duplicated lines
sort file1.txt file2.txt | uniq -d

# -d prints one copy of each line that appears more than once.
# Caveat: this is only correct when neither file repeats a line
# internally; a line duplicated within one file alone will be
# reported as "common". Deduplicate with sort -u first if unsure.

Example:

sort file1.txt file2.txt | uniq -d

# Output:
# banana
# cherry
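The weakness of this method is internal duplicates: a line repeated within one file alone still shows up as a "common" line. De-duplicating each file with sort -u first fixes it; a sketch with illustrative file names:

```shell
# 'dup' appears twice in file_a and never in file_b
printf 'dup\ndup\nunique\n' > /tmp/file_a.txt
printf 'other\n' > /tmp/file_b.txt

# Naive combination wrongly reports 'dup' as common
sort /tmp/file_a.txt /tmp/file_b.txt | uniq -d

# De-duplicate each file first; now nothing is reported
sort -u /tmp/file_a.txt > /tmp/file_a_u.txt
sort -u /tmp/file_b.txt > /tmp/file_b_u.txt
sort /tmp/file_a_u.txt /tmp/file_b_u.txt | uniq -d
```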

Method 5: Using diff

While diff is typically used to show differences, GNU diff's group-format options can flip it around to print only the lines the two files share (in matching order). Note that diff compares files positionally, so this works best on sorted or similarly ordered input.

# GNU diff only: emit unchanged (common) groups verbatim, drop the rest
diff --unchanged-group-format='%=' --changed-group-format='' file1.txt file2.txt
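On the sample fruit files from earlier, GNU diff's group-format options (a GNU diffutils feature; BSD/macOS diff lacks them) behave like this:

```shell
printf 'apple\nbanana\ncherry\ndate\n' > /tmp/file1.txt
printf 'banana\ncherry\nelderberry\nfig\n' > /tmp/file2.txt

# '%=' emits unchanged groups verbatim; '' suppresses changed groups.
# The '|| true' is only because diff still exits 1 when files differ.
diff --unchanged-group-format='%=' --changed-group-format='' \
    /tmp/file1.txt /tmp/file2.txt || true
# prints: banana, cherry
```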

Practical Examples

Example 1: Find Users in Both Systems

#!/bin/bash
# Compare users from two systems

system1_users="system1_users.txt"
system2_users="system2_users.txt"

echo "Users in both systems:"
comm -12 <(sort "$system1_users") <(sort "$system2_users")

echo -e "\nUsers only in system1:"
comm -23 <(sort "$system1_users") <(sort "$system2_users")

echo -e "\nUsers only in system2:"
comm -13 <(sort "$system1_users") <(sort "$system2_users")

Example 2: Find Common Log Entries

#!/bin/bash
# Find IP addresses that appear in multiple log files

log1="/var/log/access1.log"
log2="/var/log/access2.log"

echo "IP addresses in both logs:"
grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' "$log1" | sort -u > /tmp/ips1.txt
grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' "$log2" | sort -u > /tmp/ips2.txt

comm -12 /tmp/ips1.txt /tmp/ips2.txt

# Cleanup
rm /tmp/ips1.txt /tmp/ips2.txt

Example 3: Find Common Configuration Lines

#!/bin/bash
# Find lines that appear in both configuration files

config1="config_old.txt"
config2="config_new.txt"

echo "Lines in both versions:"
awk 'NR==FNR {a[$0]; next} $0 in a' "$config1" "$config2"

echo -e "\nNew lines in latest config:"
awk 'NR==FNR {a[$0]; next} !($0 in a)' "$config1" "$config2"

Performance Comparison

For finding common lines between large files:

Method            Speed       Memory   Notes
comm              Very fast   Low      Requires sorted input
grep -f           Fast        Low      Simple, readable
awk               Very fast   Medium   Highest performance
sort | uniq -d    Fast        Medium   Fails if a file repeats a line internally
diff              Slower      Low      Not ideal for this task

Best choice: Use awk 'NR==FNR {a[$0]; next} $0 in a' for best performance on large files.
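As a sanity check, the three main methods should agree on the same input once their outputs are sorted; a small sketch (paths are illustrative, and the files are pre-sorted to plain files so no Bash-only process substitution is needed):

```shell
printf 'x\ny\nz\n' > /tmp/f1.txt
printf 'y\nz\nw\n' > /tmp/f2.txt

# Pre-sort once so comm can read plain files
sort /tmp/f1.txt > /tmp/f1.sorted
sort /tmp/f2.txt > /tmp/f2.sorted

a=$(comm -12 /tmp/f1.sorted /tmp/f2.sorted)
b=$(grep -xF -f /tmp/f1.txt /tmp/f2.txt | sort)
c=$(awk 'NR==FNR {s[$0]; next} $0 in s' /tmp/f1.txt /tmp/f2.txt | sort)

[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three methods agree"
```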

Important Considerations

Case Sensitivity

By default, matching is case-sensitive. For case-insensitive matching:

# Using grep (-i alone makes the match case-insensitive)
grep -ixF -f file2.txt file1.txt

# Using awk
awk 'NR==FNR {a[tolower($0)]; next} tolower($0) in a' file1.txt file2.txt

Handling Empty Lines and Whitespace

Be aware of trailing whitespace or empty lines that might affect matching:

# Strip trailing whitespace and drop empty lines first
sed -e 's/[[:space:]]*$//' -e '/^$/d' file1.txt > /tmp/file1_clean.txt
sed -e 's/[[:space:]]*$//' -e '/^$/d' file2.txt > /tmp/file2_clean.txt

# Then find common lines
awk 'NR==FNR {a[$0]; next} $0 in a' /tmp/file1_clean.txt /tmp/file2_clean.txt

With Special Characters

When files contain special regex characters, use grep with -F flag:

# -F treats pattern lines as literal strings, not regexes
# -x requires the whole line to match, not just a substring
grep -xF -f file2.txt file1.txt
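A quick demonstration of why -F matters when pattern lines contain regex metacharacters (file names are illustrative): without it, a dot in a pattern matches any character.

```shell
# Pattern '1.2' contains a regex metacharacter (the dot)
printf '1.2\n' > /tmp/pat.txt
printf '1.2\n1x2\n' > /tmp/dat.txt

# Interpreted as a regex, '1.2' also matches '1x2'
grep -f /tmp/pat.txt /tmp/dat.txt

# As a literal whole-line string, only '1.2' matches
grep -xF -f /tmp/pat.txt /tmp/dat.txt
```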

Key Points

  • Use comm -12 for sorted files (fastest if already sorted)
  • Use awk 'NR==FNR {a[$0]; next} $0 in a' for maximum performance
  • Use grep -f for simple, readable code
  • Always consider sorting files first for consistent results
  • Watch out for whitespace and case sensitivity issues
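The key points above can be rolled into a small helper function, sketched here (the name common_lines is ours, and the <() process substitution requires Bash rather than plain sh):

```shell
#!/bin/bash
# common_lines FILE1 FILE2: print lines present in both files,
# sorting each input on the fly so unsorted files are handled.
common_lines() {
    comm -12 <(sort -- "$1") <(sort -- "$2")
}

# Usage:
printf 'apple\nbanana\ncherry\n' > /tmp/a.txt
printf 'banana\ncherry\ndate\n'  > /tmp/b.txt
common_lines /tmp/a.txt /tmp/b.txt   # banana, cherry
```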

Quick Reference

# Quick way to find common lines:
awk 'NR==FNR {a[$0]; next} $0 in a' file1.txt file2.txt

# With sorting (handles unsorted files):
comm -12 <(sort file1.txt) <(sort file2.txt)

# With grep:
grep -xF -f file2.txt file1.txt