How Diff Algorithms Work: Myers, Patience & More

Understanding the algorithms behind text comparison tools like Git diff.

algorithmstechnicalgit

Diff Algorithms

Diff tools find the differences between two texts. Here's how they work.

The Problem

Given two sequences, find the minimum edit operations (insert, delete) to transform one into the other.

Longest Common Subsequence (LCS)

Most diff algorithms are based on finding the LCS.

A: "ABCDEF"
B: "ABXDEF"
LCS: "ABDEF" (length 5)

Difference = chars not in LCS
A has: C (deleted)
B has: X (inserted)

Myers' Algorithm

The standard algorithm (used by Git). Finds the shortest edit script.

Time: O((N+M)×D) where D = edit distance
Space: O((N+M)×D) or O(N+M) with optimization

For similar files (small D): Nearly linear
For very different files: Up to O(N×M)

Patience Diff

Alternative algorithm that produces more readable diffs.

1. Find unique lines that appear once in both files
  • Use these as "anchors"
  • Recursively diff between anchors

Better for:
  • Code refactoring
  • Moved blocks of code
  • Structural changes

Git Diff Options

# Default (Myers)
git diff
# Patience algorithm
git diff --patience
# Histogram (faster patience)
git diff --histogram

Frequently Asked Questions

Common questions about this topic

Git's default Myers algorithm optimizes for minimal edits, not human readability. When code is moved or refactored, it may match wrong lines. Try 'git diff --patience' or '--histogram' for clearer results on structural changes. Context matters for good diffs.

Myers algorithm: O((N+M)×D) where N,M are file lengths and D is edit distance. For similar files (small D), it's nearly linear. For completely different files, it approaches O(N×M). Patience diff has similar complexity but better average cases for code.

Text diff shows binary files as 'binary files differ'. For meaningful binary diffs: use specialized tools (image diff, hex diff), convert to text representation first, or use binary diff tools that output patches. Git can diff some binary formats with proper configuration.