
How to Count Unique Lines in Bash


Quick Answer: Count Unique Lines in Bash

To count occurrences of each line, use sort file.txt | uniq -c: sorting groups duplicate lines together so uniq can count them. To count how many distinct lines a file contains without sorting, use awk '!seen[$0]++ {n++} END {print n}' file.txt. The sort | uniq combination is the most portable and reliable method.

Quick Comparison: Unique Line Counting Methods

Method           Syntax                        Preserves Order   Speed
sort | uniq -c   sort file.txt | uniq -c       No                Fast
awk !seen        awk '!seen[$0]++' file.txt    Yes               Medium
sort -u          sort -u file.txt              No                Fast

Bottom line: Use sort | uniq -c for counting with duplicates; use sort -u to just remove duplicates.


This guide covers counting occurrences of unique lines in files, using several methods built on uniq, sort, and awk.

Method 1: Count Unique with uniq -c (Standard)

The most straightforward method for counting unique occurrences:

# Sort and count duplicates
sort file.txt | uniq -c

# Output:
#       3 apple
#       2 banana
#       1 orange

The -c flag counts occurrences of each line.
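A quick sketch of why the sort step matters: uniq only collapses adjacent duplicate lines, so unsorted input produces misleading counts. The sample data below matches the fruits.txt example used later in this article.

```shell
# Create the sample file (same data as the fruits.txt example below)
printf 'apple\nbanana\napple\norange\nbanana\napple\n' > fruits.txt

# Without sorting, uniq only merges *adjacent* duplicates:
uniq -c fruits.txt
# Every line gets a count of 1 here, because no duplicates sit next to each other

# Sorting first groups duplicates together, so the counts are correct:
sort fruits.txt | uniq -c
#       3 apple
#       2 banana
#       1 orange
```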

Detailed Example

Test file (fruits.txt):

apple
banana
apple
orange
banana
apple

# Sort and count
sort fruits.txt | uniq -c

# Output:
#       3 apple
#       2 banana
#       1 orange

Sort by Frequency

# Count and sort by frequency (highest first)
sort fruits.txt | uniq -c | sort -rn

# Count and sort by frequency (lowest first)
sort fruits.txt | uniq -c | sort -n

# Count and sort alphabetically
sort fruits.txt | uniq -c | sort -k2

Output (sorted by frequency):

      3 apple
      2 banana
      1 orange

Count Unique Lines

# Just count how many unique lines exist
sort file.txt | uniq | wc -l

# Example: 3 unique fruits in the file
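Two equivalent one-liners give the same distinct-line count; the awk version needs no sort at all (file.txt is a placeholder for your input):

```shell
# sort -u deduplicates while sorting, so the separate uniq step is unnecessary
sort -u file.txt | wc -l

# awk counts only first occurrences; works on unsorted input
awk '!seen[$0]++ {n++} END {print n}' file.txt
```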

Using awk for Counting

# Count occurrences using awk
awk '{count[$0]++} END {for (line in count) print count[line], line}' file.txt

# Output (order varies):
# 3 apple
# 2 banana
# 1 orange

Awk with Sorting

# Count with awk, then sort
awk '{count[$0]++} END {
  for (line in count)
    print count[line], line
}' file.txt | sort -rn

Case-Insensitive Counting

# Count ignoring case
awk '{count[tolower($0)]++} END {
  for (line in count)
    print count[line], line
}' file.txt | sort -rn

# Or with sort/uniq
tr '[:upper:]' '[:lower:]' < file.txt | sort | uniq -c | sort -rn
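A quick demonstration of the tr pipeline on mixed-case input:

```shell
# Three case variants of "apple" collapse into one counted entry
printf 'Apple\napple\nAPPLE\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c
#       3 apple
```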

Practical Example: Log Analysis

#!/bin/bash

# File: analyze_errors.sh

logfile="$1"

if [ ! -f "$logfile" ]; then
  echo "Usage: $0 <logfile>"
  exit 1
fi

echo "=== Error Frequency Analysis ==="
echo ""

# Count unique error messages, sorted by frequency
grep "ERROR" "$logfile" | \
  cut -d: -f2- | \
  sort | uniq -c | \
  sort -rn | \
  head -10

echo ""
echo "=== Top 5 Error IPs ==="

# Count IP addresses
grep "ERROR" "$logfile" | \
  grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | \
  sort | uniq -c | sort -rn | head -5

Test log file (errors.log):

2026-02-21 ERROR:Database connection failed from 192.168.1.1
2026-02-21 ERROR:Timeout error from 192.168.1.2
2026-02-21 ERROR:Database connection failed from 192.168.1.1
2026-02-21 ERROR:Auth failed from 192.168.1.3
2026-02-21 ERROR:Database connection failed from 192.168.1.1

Output:

=== Error Frequency Analysis ===

      3 Database connection failed
      1 Timeout error
      1 Auth failed

=== Top 5 Error IPs ===

      3 192.168.1.1
      1 192.168.1.2
      1 192.168.1.3

Count Unique in Specific Column

#!/bin/bash

# Count unique values in a specific column (CSV)

file="$1"
column="${2:-1}"

if [ ! -f "$file" ]; then
  echo "Usage: $0 <file> [column]"
  exit 1
fi

echo "Unique values in column $column:"
echo ""

awk -F"," -v col="$column" '{print $col}' "$file" | \
  sort | uniq -c | sort -rn

Test file (users.csv):

John,USA
Jane,USA
Bob,UK
Alice,USA
Charlie,Canada

Usage:

$ ./count_unique.sh users.csv 2

Output:

Unique values in column 2:

      3 USA
      1 UK
      1 Canada

Count Total Unique

#!/bin/bash

file="$1"

# Total unique lines
total_unique=$(sort "$file" | uniq | wc -l)

# Total lines
total_lines=$(wc -l < "$file")

echo "Total lines: $total_lines"
echo "Unique lines: $total_unique"
echo "Duplicate lines: $((total_lines - total_unique))"

Most Common N Items

#!/bin/bash

file="$1"
count="${2:-10}"

# Show most frequent items
sort "$file" | uniq -c | sort -rn | head -"$count" | \
  awk '{n=$1; $1=""; sub(/^ +/, ""); printf "%s: %d\n", $0, n}'
# Reassigning $1 and trimming keeps multi-word lines intact
# (printing only $2 would truncate them to the first word)

Exclude Duplicates

# Keep only unique lines (remove duplicates)
sort file.txt | uniq

# Keep only lines that appear exactly once
sort file.txt | uniq -u

# Keep only duplicated lines
sort file.txt | uniq -d
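Run against the fruits.txt sample from earlier (3x apple, 2x banana, 1x orange), the three variants behave quite differently:

```shell
printf 'apple\nbanana\napple\norange\nbanana\napple\n' > fruits.txt

sort fruits.txt | uniq      # apple, banana, orange -> each value once
sort fruits.txt | uniq -u   # orange                -> appeared exactly once
sort fruits.txt | uniq -d   # apple, banana         -> appeared more than once
```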

Count by Pattern

#!/bin/bash

# Count occurrences of patterns matching regex

pattern="$1"
file="$2"

if [ -z "$pattern" ] || [ ! -f "$file" ]; then
  echo "Usage: $0 <pattern> <file>"
  exit 1
fi

# Extract matching lines, get unique, count
grep "$pattern" "$file" | sort | uniq -c | sort -rn
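For example, running the script's pipeline inline against the errors.log sample from earlier in this article:

```shell
# Count each distinct line that mentions "Database"
grep "Database" errors.log | sort | uniq -c | sort -rn
#       3 2026-02-21 ERROR:Database connection failed from 192.168.1.1
```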

Unique Count Report

#!/bin/bash

# Generate detailed unique count report

file="$1"

if [ ! -f "$file" ]; then
  echo "Usage: $0 <file>"
  exit 1
fi

echo "=== Unique Line Report ==="
echo ""

# Total stats
total=$(wc -l < "$file")
unique=$(sort "$file" | uniq | wc -l)

echo "Total lines: $total"
echo "Unique lines: $unique"
echo "Redundancy: $(awk -v t="$total" -v u="$unique" 'BEGIN {printf "%.1f", (t - u) * 100 / t}')%"
echo ""

# Top 10 most common
echo "Top 10 Most Frequent:"
sort "$file" | uniq -c | sort -rn | head -10 | \
  awk '{n=$1; $1=""; sub(/^ +/, ""); printf "%4d : %s\n", n, $0}'
# Reassigning $1 keeps multi-word lines intact instead of printing only the first word
echo ""

# Items appearing only once
unique_only=$(sort "$file" | uniq -u | wc -l)
echo "Appearing only once: $unique_only"

Performance Comparison

Method               Speed    Memory   Best For
sort | uniq          Fast     Medium   Sorted counting
awk                  Medium   Low      In-memory counting
sort | uniq | sort   Medium   Medium   Frequency sorting

Common Mistakes

  1. Forgetting to sort first - uniq requires sorted input
  2. Case sensitivity - use tr or awk tolower for case-insensitive
  3. Whitespace differences - may treat " apple" and "apple" as different
  4. Large files in awk - can run out of memory storing counts
  5. Not understanding uniq -u - it shows lines that appear exactly once, not the deduplicated list (that is plain uniq)
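One way to sidestep the whitespace pitfall is to normalize lines before counting (a sketch; adjust the trimming to your data):

```shell
# Trim leading/trailing whitespace so " apple" and "apple" count as one line
sed 's/^[[:space:]]*//; s/[[:space:]]*$//' file.txt | sort | uniq -c
```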

Key Options

Option   Purpose
-c       Count occurrences of each line
-d       Show only duplicated lines
-u       Show only lines that appear exactly once
-i       Case-insensitive comparison
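Note that -i only groups correctly when the input is also sorted case-insensitively (sort -f), and that -i is a GNU/BSD extension to uniq rather than POSIX:

```shell
# sort -f folds case so variants of the same word end up adjacent,
# then uniq -c -i counts them as one group
printf 'Apple\napple\nAPPLE\nbanana\n' | sort -f | uniq -c -i
# Two groups: count 3 for the apple variants, count 1 for banana
# (the displayed case of the counted line depends on the sort order)
```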

Key Points

  • Always sort before uniq
  • Use -c to count occurrences
  • Use sort -rn to sort by frequency
  • Use awk for more complex counting logic
  • Test with case variations

Summary

Counting unique lines is essential for log analysis and data quality. Use uniq for simple cases, awk for more control, and always sort your data first. Understanding the difference between “unique” (appears once) and “distinct” (different values) is crucial.