How to Count Unique Lines in Bash
Quick Answer: Count Unique Lines in Bash
To count occurrences of each unique line, use sort file.txt | uniq -c: sort groups identical lines together so uniq can count them. For a quick count of distinct lines without sorting, use awk '!seen[$0]++ {n++} END {print n}' file.txt. The sort | uniq method is the most portable and reliable.
Quick Comparison: Unique Line Counting Methods
| Method | Syntax | Preserves Order | Speed |
|---|---|---|---|
| sort \| uniq -c | sort file.txt \| uniq -c | No | Fast |
| awk !seen | awk '!seen[$0]++' | Yes | Medium |
| sort -u | sort -u | No | Fast |
Bottom line: Use sort | uniq -c for counting with duplicates; use sort -u to just remove duplicates.
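A quick demonstration of the difference, using inline sample data (a sketch; the exact count padding printed by uniq -c varies by platform):

```shell
# Three lines, two distinct values.
printf 'apple\nbanana\napple\n' | sort | uniq -c
# Counts each line: "2 apple" and "1 banana" (count is left-padded by uniq)

printf 'apple\nbanana\napple\n' | sort -u
# Just the distinct lines, no counts: apple, banana
```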
Count occurrences of unique lines in files. Learn different methods using uniq, sort, awk, and bash.
Method 1: Count Unique with uniq -c (Standard)
The most straightforward method for counting unique occurrences:
# Sort and count duplicates
sort file.txt | uniq -c
# Output:
# 3 apple
# 2 banana
# 1 orange
The -c flag counts occurrences of each line.
Detailed Example
Test file (fruits.txt):
apple
banana
apple
orange
banana
apple
# Sort and count
sort fruits.txt | uniq -c
# Output:
# 3 apple
# 2 banana
# 1 orange
Sort by Frequency
# Count and sort by frequency (highest first)
sort fruits.txt | uniq -c | sort -rn
# Count and sort by frequency (lowest first)
sort fruits.txt | uniq -c | sort -n
# Count and sort alphabetically
sort fruits.txt | uniq -c | sort -k2
Output (sorted by frequency):
3 apple
2 banana
1 orange
Count Unique Lines
# Just count how many unique lines exist
sort file.txt | uniq | wc -l
# Example: 3 unique fruits in the file
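The same total can be computed in a single pass with awk, which avoids the sort entirely (handy for large streams, though it keeps one array entry per distinct line in memory):

```shell
# n is incremented only the first time each line is seen.
printf 'apple\nbanana\napple\norange\n' | awk '!seen[$0]++ {n++} END {print n}'
# 3
```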
Using awk for Counting
# Count occurrences using awk
awk '{count[$0]++} END {for (line in count) print count[line], line}' file.txt
# Output (order varies):
# 3 apple
# 2 banana
# 1 orange
Awk with Sorting
# Count with awk, then sort
awk '{count[$0]++} END {
for (line in count)
print count[line], line
}' file.txt | sort -rn
Case-Insensitive Counting
# Count ignoring case
awk '{count[tolower($0)]++} END {
for (line in count)
print count[line], line
}' file.txt | sort -rn
# Or with sort/uniq
tr '[:upper:]' '[:lower:]' < file.txt | sort | uniq -c | sort -rn
Practical Example: Log Analysis
#!/bin/bash
# File: analyze_errors.sh
logfile="$1"
if [ ! -f "$logfile" ]; then
echo "Usage: $0 <logfile>"
exit 1
fi
echo "=== Error Frequency Analysis ==="
echo ""
# Count unique error messages, sorted by frequency
grep "ERROR" "$logfile" | \
cut -d: -f2- | \
sort | uniq -c | \
sort -rn | \
head -10
echo ""
echo "=== Top 5 Error IPs ==="
# Count IP addresses
grep "ERROR" "$logfile" | \
grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | \
sort | uniq -c | sort -rn | head -5
Test log file (errors.log):
2026-02-21 ERROR:Database connection failed from 192.168.1.1
2026-02-21 ERROR:Timeout error from 192.168.1.2
2026-02-21 ERROR:Database connection failed from 192.168.1.1
2026-02-21 ERROR:Auth failed from 192.168.1.3
2026-02-21 ERROR:Database connection failed from 192.168.1.1
Output:
=== Error Frequency Analysis ===
3 Database connection failed from 192.168.1.1
1 Timeout error from 192.168.1.2
1 Auth failed from 192.168.1.3
=== Top 5 Error IPs ===
3 192.168.1.1
1 192.168.1.2
1 192.168.1.3
Count Unique in Specific Column
#!/bin/bash
# Count unique values in a specific column (CSV)
file="$1"
column="${2:-1}"
if [ ! -f "$file" ]; then
echo "Usage: $0 <file> [column]"
exit 1
fi
echo "Unique values in column $column:"
echo ""
awk -F"," -v col="$column" '{print $col}' "$file" | \
sort | uniq -c | sort -rn
Test file (users.csv):
John,USA
Jane,USA
Bob,UK
Alice,USA
Charlie,Canada
Usage:
$ ./count_unique.sh users.csv 2
Output:
Unique values in column 2:
3 USA
1 UK
1 Canada
Count Total Unique
#!/bin/bash
file="$1"
# Total unique lines
total_unique=$(sort "$file" | uniq | wc -l)
# Total lines
total_lines=$(wc -l < "$file")
echo "Total lines: $total_lines"
echo "Unique lines: $total_unique"
echo "Duplicate lines: $((total_lines - total_unique))"
Most Common N Items
#!/bin/bash
file="$1"
count="${2:-10}"
# Show most frequent items
sort "$file" | uniq -c | sort -rn | head -"$count" | \
awk '{c=$1; $1=""; sub(/^ +/, ""); printf "%s: %d\n", $0, c}'
Exclude Duplicates
# Keep only unique lines (remove duplicates)
sort file.txt | uniq
# Keep only lines that appear exactly once
sort file.txt | uniq -u
# Keep only duplicated lines
sort file.txt | uniq -d
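The distinction between -u and -d is easy to verify on a small sample:

```shell
data='apple
banana
apple
orange'

# Lines that occur exactly once:
echo "$data" | sort | uniq -u
# banana
# orange

# Lines that occur more than once (each printed a single time):
echo "$data" | sort | uniq -d
# apple
```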
Count by Pattern
#!/bin/bash
# Count occurrences of patterns matching regex
pattern="$1"
file="$2"
if [ -z "$pattern" ] || [ ! -f "$file" ]; then
echo "Usage: $0 <pattern> <file>"
exit 1
fi
# Extract matching lines, get unique, count
grep "$pattern" "$file" | sort | uniq -c | sort -rn
Unique Count Report
#!/bin/bash
# Generate detailed unique count report
file="$1"
if [ ! -f "$file" ]; then
echo "Usage: $0 <file>"
exit 1
fi
echo "=== Unique Line Report ==="
echo ""
# Total stats
total=$(wc -l < "$file")
unique=$(sort "$file" | uniq | wc -l)
echo "Total lines: $total"
echo "Unique lines: $unique"
echo "Redundancy: $(awk -v t="$total" -v u="$unique" 'BEGIN {printf "%.1f", (t - u) * 100 / t}')%"
echo ""
# Top 10 most common
echo "Top 10 Most Frequent:"
sort "$file" | uniq -c | sort -rn | head -10 | \
awk '{c=$1; $1=""; sub(/^ +/, ""); printf "%4d : %s\n", c, $0}'
echo ""
# Items appearing only once
unique_only=$(sort "$file" | uniq -u | wc -l)
echo "Appearing only once: $unique_only"
Performance Comparison
| Method | Speed | Memory | Best For |
|---|---|---|---|
| sort \| uniq | Fast | Medium | Sorted counting |
| awk | Medium | Low | In-memory counting |
| sort \| uniq \| sort | Medium | Medium | Frequency sorting |
Common Mistakes
- Forgetting to sort first - uniq requires sorted input
- Case sensitivity - use tr or awk's tolower() for case-insensitive counting
- Whitespace differences - " apple" and "apple" are treated as different lines
- Large files in awk - storing every distinct line in memory can exhaust RAM
- Misreading uniq -u - it shows lines that appear exactly once, not all distinct lines
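The whitespace pitfall above can be sidestepped by trimming each line before counting; a minimal sketch using sed:

```shell
# " apple" and "apple" are distinct to uniq; trim leading/trailing blanks first.
printf ' apple\napple\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | sort | uniq -c
# Both lines are now identical, so uniq -c reports a count of 2 for "apple".
```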
Key Options
| Option | Purpose |
|---|---|
| -c | Count occurrences |
| -d | Show only duplicated lines |
| -u | Show only lines that appear once |
| -i | Case-insensitive comparison |
Key Points
- Always sort before uniq
- Use -c to count occurrences
- Use sort -rn to sort by frequency
- Use awk for more complex counting logic
- Test with case variations
Summary
Counting unique lines is essential for log analysis and data quality. Use uniq for simple cases, awk for more control, and always sort your data first. Understanding the difference between “unique” (appears once) and “distinct” (different values) is crucial.