How to Count Unique Lines in Bash
Quick Answer: Count Unique Lines in Bash
To count occurrences of each unique line, use sort file.txt | uniq -c: sort groups identical lines together so uniq can count them. For a quick count of distinct lines without sorting, use awk '!seen[$0]++ {n++} END {print n}' file.txt. The sort | uniq method is the most portable and reliable.
Quick Comparison: Unique Line Counting Methods
| Method | Syntax | Preserves Order | Speed |
|---|---|---|---|
| sort \| uniq -c | sort file.txt \| uniq -c | No | Fast |
| awk !seen | awk '!seen[$0]++' | Yes | Medium |
| sort -u | sort -u | No | Fast |
Bottom line: Use sort | uniq -c for counting with duplicates; use sort -u to just remove duplicates.
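A quick demonstration of the difference, using inline sample data (a sketch; the exact count padding printed by uniq -c varies by platform):

```shell
# Three lines, two distinct values.
printf 'apple\nbanana\napple\n' | sort | uniq -c
# Counts each line: "2 apple" and "1 banana" (count is left-padded by uniq)

printf 'apple\nbanana\napple\n' | sort -u
# Just the distinct lines, no counts: apple, banana
```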
Count occurrences of unique lines in files. Learn different methods using uniq, sort, awk, and bash.
Method 1: Count Unique with uniq -c (Standard)
The most straightforward method for counting unique occurrences:
# Sort and count duplicates
sort file.txt | uniq -c
# Output:
# 3 apple
# 2 banana
# 1 orange
The -c flag counts occurrences of each line.
Detailed Example
Test file (fruits.txt):
apple
banana
apple
orange
banana
apple
# Sort and count
sort fruits.txt | uniq -c
# Output:
# 3 apple
# 2 banana
# 1 orange
Sort by Frequency
# Count and sort by frequency (highest first)
sort fruits.txt | uniq -c | sort -rn
# Count and sort by frequency (lowest first)
sort fruits.txt | uniq -c | sort -n
# Count and sort alphabetically
sort fruits.txt | uniq -c | sort -k2
Output (sorted by frequency):
3 apple
2 banana
1 orange
Count Unique Lines
# Just count how many unique lines exist
sort file.txt | uniq | wc -l
# Example: 3 unique fruits in the file
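The same total can be computed in a single pass with awk, which avoids the sort entirely (handy for large streams, though it keeps one array entry per distinct line in memory):

```shell
# n is incremented only the first time each line is seen.
printf 'apple\nbanana\napple\norange\n' | awk '!seen[$0]++ {n++} END {print n}'
# 3
```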
Using awk for Counting
# Count occurrences using awk
awk '{count[$0]++} END {for (line in count) print count[line], line}' file.txt
# Output (order varies):
# 3 apple
# 2 banana
# 1 orange
Awk with Sorting
# Count with awk, then sort
awk '{count[$0]++} END {
for (line in count)
print count[line], line
}' file.txt | sort -rn
Case-Insensitive Counting
# Count ignoring case
awk '{count[tolower($0)]++} END {
for (line in count)
print count[line], line
}' file.txt | sort -rn
# Or with sort/uniq
tr '[:upper:]' '[:lower:]' < file.txt | sort | uniq -c | sort -rn
Practical Example: Log Analysis
#!/bin/bash
# File: analyze_errors.sh
logfile="$1"
if [ ! -f "$logfile" ]; then
echo "Usage: $0 <logfile>"
exit 1
fi
echo "=== Error Frequency Analysis ==="
echo ""
# Count unique error messages, sorted by frequency
grep "ERROR" "$logfile" | \
cut -d: -f2- | \
sort | uniq -c | \
sort -rn | \
head -10
echo ""
echo "=== Top 5 Error IPs ==="
# Count IP addresses
grep "ERROR" "$logfile" | \
grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | \
sort | uniq -c | sort -rn | head -5
Test log file (errors.log):
2026-02-21 ERROR:Database connection failed from 192.168.1.1
2026-02-21 ERROR:Timeout error from 192.168.1.2
2026-02-21 ERROR:Database connection failed from 192.168.1.1
2026-02-21 ERROR:Auth failed from 192.168.1.3
2026-02-21 ERROR:Database connection failed from 192.168.1.1
Output:
=== Error Frequency Analysis ===
3 Database connection failed from 192.168.1.1
1 Timeout error from 192.168.1.2
1 Auth failed from 192.168.1.3
=== Top 5 Error IPs ===
3 192.168.1.1
1 192.168.1.2
1 192.168.1.3
Count Unique in Specific Column
#!/bin/bash
# Count unique values in a specific column (CSV)
file="$1"
column="${2:-1}"
if [ ! -f "$file" ]; then
echo "Usage: $0 <file> [column]"
exit 1
fi
echo "Unique values in column $column:"
echo ""
awk -F"," -v col="$column" '{print $col}' "$file" | \
sort | uniq -c | sort -rn
Test file (users.csv):
John,USA
Jane,USA
Bob,UK
Alice,USA
Charlie,Canada
Usage:
$ ./count_unique.sh users.csv 2
Output:
Unique values in column 2:
3 USA
1 UK
1 Canada
Count Total Unique
#!/bin/bash
file="$1"
# Total unique lines
total_unique=$(sort "$file" | uniq | wc -l)
# Total lines
total_lines=$(wc -l < "$file")
echo "Total lines: $total_lines"
echo "Unique lines: $total_unique"
echo "Duplicate lines: $((total_lines - total_unique))"
Most Common N Items
#!/bin/bash
file="$1"
count="${2:-10}"
# Show most frequent items
sort "$file" | uniq -c | sort -rn | head -"$count" | \
awk '{c=$1; $1=""; sub(/^ +/, ""); printf "%s: %d\n", $0, c}'
Exclude Duplicates
# Keep only unique lines (remove duplicates)
sort file.txt | uniq
# Keep only lines that appear exactly once
sort file.txt | uniq -u
# Keep only duplicated lines
sort file.txt | uniq -d
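The distinction between -u and -d is easy to verify on a small sample:

```shell
data='apple
banana
apple
orange'

# Lines that occur exactly once:
echo "$data" | sort | uniq -u
# banana
# orange

# Lines that occur more than once (each printed a single time):
echo "$data" | sort | uniq -d
# apple
```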
Count by Pattern
#!/bin/bash
# Count occurrences of patterns matching regex
pattern="$1"
file="$2"
if [ -z "$pattern" ] || [ ! -f "$file" ]; then
echo "Usage: $0 <pattern> <file>"
exit 1
fi
# Extract matching lines, get unique, count
grep "$pattern" "$file" | sort | uniq -c | sort -rn
Unique Count Report
#!/bin/bash
# Generate detailed unique count report
file="$1"
if [ ! -f "$file" ]; then
echo "Usage: $0 <file>"
exit 1
fi
echo "=== Unique Line Report ==="
echo ""
# Total stats
total=$(wc -l < "$file")
unique=$(sort "$file" | uniq | wc -l)
echo "Total lines: $total"
echo "Unique lines: $unique"
echo "Redundancy: $(awk -v t="$total" -v u="$unique" 'BEGIN {printf "%.1f", (t - u) * 100 / t}')%"
echo ""
# Top 10 most common
echo "Top 10 Most Frequent:"
sort "$file" | uniq -c | sort -rn | head -10 | \
awk '{c=$1; $1=""; sub(/^ +/, ""); printf "%4d : %s\n", c, $0}'
echo ""
# Items appearing only once
unique_only=$(sort "$file" | uniq -u | wc -l)
echo "Appearing only once: $unique_only"
Performance Comparison
| Method | Speed | Memory | Best For |
|---|---|---|---|
| sort \| uniq | Fast | Medium | Sorted counting |
| awk | Medium | Low | In-memory counting |
| sort \| uniq \| sort | Medium | Medium | Frequency sorting |
Common Mistakes
- Forgetting to sort first - uniq requires sorted input
- Case sensitivity - use tr or awk's tolower() for case-insensitive counting
- Whitespace differences - " apple" and "apple" are treated as different lines
- Large files in awk - storing every distinct line in memory can exhaust RAM
- Misreading uniq -u - it shows lines that appear exactly once, not all distinct lines
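The whitespace pitfall above can be sidestepped by trimming each line before counting; a minimal sketch using sed:

```shell
# " apple" and "apple" are distinct to uniq; trim leading/trailing blanks first.
printf ' apple\napple\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | sort | uniq -c
# Both lines are now identical, so uniq -c reports a count of 2 for "apple".
```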
Key Options
| Option | Purpose |
|---|---|
| -c | Count occurrences |
| -d | Show only duplicated lines |
| -u | Show only lines that appear once |
| -i | Case-insensitive comparison |
Key Points
- Always sort before uniq
- Use -c to count occurrences
- Use sort -rn to sort by frequency
- Use awk for more complex counting logic
- Test with case variations
Summary
Counting unique lines is essential for log analysis and data quality. Use uniq for simple cases, awk for more control, and always sort your data first. Understanding the difference between “unique” (appears once) and “distinct” (different values) is crucial.