Diff Algorithms
Diff tools find the differences between two texts. Here's how they work.
The Problem
Given two sequences, find the minimum edit operations (insert, delete) to transform one into the other.
Longest Common Subsequence (LCS)
Most diff algorithms are based on finding the LCS.
A: "ABCDEF"
B: "ABXDEF"
LCS: "ABDEF" (length 5)
Difference = chars not in LCS
A has: C (deleted)
B has: X (inserted)
Myers' Algorithm
The standard algorithm (used by Git). Finds the shortest edit script.
Time: O((N+M)×D) where D = edit distance
Space: O((N+M)×D) or O(N+M) with optimization
For similar files (small D): Nearly linear
For very different files: Up to O(N×M)
Patience Diff
Alternative algorithm that produces more readable diffs.
1. Find unique lines that appear once in both files
- Use these as "anchors"
- Recursively diff between anchors
Better for:
- Code refactoring
- Moved blocks of code
- Structural changes
Git Diff Options
# Default (Myers)
git diff
# Patience algorithm
git diff --patience
# Histogram (faster patience)
git diff --histogram