How to Find Duplicate Files
Quick Answer: Find Duplicate Files
To find duplicate files in Bash, use checksums to identify identical content: md5sum * | sort | uniq -d -w 32 (the -w 32 compares only the 32-character MD5 hash). For faster preliminary filtering, list file sizes that occur more than once: find . -type f -printf '%s\n' | sort -n | uniq -d.
Quick Comparison: Duplicate File Detection Methods
| Method | Speed | Accuracy | Best For |
|---|---|---|---|
| Size check | Very fast | False positives | Preliminary |
| md5sum | Fast | Perfect | Exact matches |
| sha256sum | Slower | Perfect | Security check |
| cmp | Medium | Perfect | File-by-file |
| fdupes tool | Fast | Perfect | Bulk finding |
Bottom line: Use size check first, then md5sum to confirm duplicates.
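The two-pass approach can be sketched end to end; the temp directory and sample files below are illustrative only (any tree works):

```shell
# Two-pass sketch: filter by size, then confirm with md5sum.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'hello\n' > "$workdir/a.txt"
printf 'hello\n' > "$workdir/b.txt"   # true duplicate of a.txt
printf 'world\n' > "$workdir/c.txt"   # same size, different content

# Pass 1: sizes that occur more than once
dup_sizes=$(find "$workdir" -type f -printf '%s\n' | sort -n | uniq -d)

# Pass 2: hash only the same-size candidates; -D (GNU --all-repeated)
# keeps every member of a duplicate group, -w 32 compares just the hash
for size in $dup_sizes; do
    find "$workdir" -type f -size "${size}c" -exec md5sum {} + |
        sort | uniq -D -w 32
done
```

Only a.txt and b.txt survive pass 2: c.txt shares their size but not their hash.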
Find duplicate files using checksums, size, or file comparison.
Find by Size First
# Sizes that occur more than once (candidate duplicates);
# note: uniq compares whole lines, so strip the path with '%s\n'
# and feed each reported size to find -size "${size}c" to list the files
find . -type f -printf '%s\n' | sort -n | uniq -d
Using md5sum
# Generate checksums (batching with + is faster than \; per file)
find . -type f -exec md5sum {} + | sort
# Find duplicates: -w 32 compares only the 32-character hash;
# -D (GNU --all-repeated) prints every member of each duplicate group,
# while -d would print only the first
find . -type f -exec md5sum {} + | sort | uniq -D -w 32
Practical Example: Find Duplicates
#!/bin/bash
# File: find_duplicates.sh
directory="${1:-.}"
echo "=== Finding Duplicate Files ==="
echo ""
# Create temp file with checksums
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
find "$directory" -type f -exec md5sum {} + | sort > "$tmpfile"
# Report each checksum that appears 2+ times, with all of its files
awk '
{
    hash = $1
    # Recover the full path; $2 would truncate paths containing spaces
    path = $0
    sub(/^[0-9a-f]+ +\*?/, "", path)
    if (hash in prev) {
        if (!(hash in printed)) {
            print "Duplicate checksum:", hash
            print "  " prev[hash]
            printed[hash] = 1
        }
        print "  " path
    }
    prev[hash] = path
}' "$tmpfile"
Find by Filename
# Files with the same basename; -printf '%f\n' strips the directory
# (full paths are always unique, so uniq -d on them finds nothing)
find . -type f -name "*.txt" -printf '%f\n' | sort | uniq -d
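To see why the basename matters here, a self-contained run on a throwaway tree (paths are illustrative):

```shell
# Same filename in two directories: only a basename comparison finds it.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/a" "$dir/b"
touch "$dir/a/notes.txt" "$dir/b/notes.txt" "$dir/a/only.txt"

# %f prints the basename, so identical names in different
# directories collapse onto the same line
find "$dir" -type f -name '*.txt' -printf '%f\n' | sort | uniq -d
# prints: notes.txt
```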
Compare Files Directly
#!/bin/bash
# Count identical files per checksum
find . -type f -print0 | xargs -0 md5sum | \
    sort | \
    awk '{print $1}' | \
    uniq -c -d | \
    while read -r count hash; do
        echo "Found $count identical files (hash: $hash)"
    done
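The comparison table above lists cmp for byte-by-byte verification, which is a good final check after a hash match. A minimal sketch, with sample files created on the fly:

```shell
# cmp -s is silent and reports identity via exit status (0 = identical).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'same\n' > "$dir/one"
printf 'same\n' > "$dir/two"
printf 'diff\n' > "$dir/three"

if cmp -s "$dir/one" "$dir/two"; then
    echo "one and two are identical"
fi
if ! cmp -s "$dir/one" "$dir/three"; then
    echo "one and three differ"
fi
```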
Remove Duplicates
#!/bin/bash
# Keep the first copy seen, remove later duplicates.
# Requires bash 4+ for the associative array; -print0 with
# read -d '' is safe for filenames containing spaces or newlines.
declare -A seen
find . -type f -print0 | while IFS= read -r -d '' file; do
    hash=$(md5sum "$file" | cut -d' ' -f1)
    if [ -n "${seen[$hash]}" ]; then
        echo "Removing duplicate: $file"
        rm "$file"
    else
        seen[$hash]=1
    fi
done
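The table also mentions the dedicated fdupes tool; where it is installed, it replaces the scripts above in one command. A hedged sketch with a pure-shell fallback (the sample tree is illustrative):

```shell
# Build a tiny tree with one duplicate pair, then look for duplicates.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'x\n' > "$dir/a"
printf 'x\n' > "$dir/b"

if command -v fdupes >/dev/null 2>&1; then
    fdupes -r "$dir"          # lists each duplicate set, one file per line
    # fdupes -rdN "$dir"      # would DELETE all but the first of each set
else
    # Fallback: the md5sum pipeline shown earlier in this article
    md5sum "$dir"/* | sort | uniq -D -w 32
fi
```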
Summary
Use md5sum for reliable duplicate detection, filtering by size first on large trees. Always verify matches (for example with cmp) before deleting anything.