patternDSA/Union Find
Core Pattern → Pattern 28

Union Find (DSU)

Answer "are these two nodes connected?" in near-O(1). Merge components dynamically as edges arrive. Detect cycles for free. Two arrays, two functions — one of the most practical data structures in competitive programming.

Frequency:★★★★☆
·~25 min·Intermediate·4 problems
The Core Shock

BFS/DFS connectivity check is O(V+E) per query. DSU answers the same question in O(α(n)) ≈ O(1) — and handles dynamic edge additions.

BFS/DFS connectivity
O(V+E) per query
Re-traverse the entire graph for each connectivity query. For n queries on a graph with E edges: O(n·(V+E)) total.
DSU (Union Find)
O(α(n)) per query
Two arrays, two functions. find() follows parent pointers to root. Path compression flattens the tree. Near-constant for any practical n.

α(n) is the inverse Ackermann function — it grows so slowly it's less than 5 for any n you'll ever encounter. The magic comes from two optimisations: path compression (flatten on find) and union by rank (merge shorter tree under taller).

DSU: parent[] + rank[]

Two arrays. Two functions. Everything else is application.

parent[i]

Stores the parent of node i. Initially parent[i]=i (each node is its own root). After unions, parent[i] points toward the component root. With path compression, most nodes point directly to the root.

parent = list(range(n))
# parent[i]=i → i is a root
# parent[i]=j → j is parent of i
rank[i]

Estimates the height of the subtree rooted at i. Used in union() to decide which root to attach under which. Only increases when merging two trees of equal rank. Prevents degenerate O(n) chains.

rank = [0] * n
# rank[i]=0 → leaf or single node
# rank grows slowly: ≤ log(n)
Path compression:
self.parent[x] = self.find(self.parent[x])
— flattens the entire path to root on every find()

Union by rank: attach root with lower rank under root with higher rank — prevents O(n) degenerate chains

Together: O(α(n)) ≈ O(1) amortised — far better than either optimisation alone

The 3-Step Framework

Build DSU → union edges → query with find() or count.

01
Initialise DSU for n nodes

parent = list(range(n)) sets every node as its own root. rank = [0]*n. count = n components. For 1-indexed problems (like Redundant Connection), use n+1. For grid problems (m×n), flatten to 1D: node = r*cols + c.

dsu = DSU(n)           # 0-indexed
dsu = DSU(n+1)         # 1-indexed nodes

# Grid flattening:
node = r * cols + c
dsu = DSU(rows * cols)
Always match the indexing scheme of the problem. If nodes are 1-indexed in the input, initialise DSU(n+1) and ignore index 0.
02
Union edges as they arrive

For each edge (u,v): call dsu.union(u,v). It returns True if merged (new connection), False if already connected (cycle for undirected graphs). For Kruskal-style problems: sort edges by weight first, then union in order.

for u, v in edges:
    dsu.union(u, v)

# Cycle detection:
for u, v in edges:
    if not dsu.union(u, v):
        return [u, v]  # redundant

# Kruskal (Path Min Effort):
edges.sort()
for w, u, v in edges:
    dsu.union(u, v)
    if dsu.connected(src, dst):
        return w
union() returning False is the cycle detector. In undirected graphs, an edge whose endpoints already share a root would create a cycle.
03
Query: connected() or count

connected(u,v) = find(u)==find(v). Use for: 'is there a path between u and v?', 'does adding this edge create a cycle?'. dsu.count gives the number of disjoint components at any point.

# Check connectivity:
if dsu.connected(src, dst):
    return answer

# Count components:
return dsu.count

# Or manually:
return len({dsu.find(i)
            for i in range(n)})
dsu.count decrements on each successful union(). Start at n (one component per node) and it accurately tracks the number of disjoint sets at all times.

Pattern Recognition

'connected components', 'cycle in undirected graph', 'union sets' → DSU.

Use DSU When
"Find redundant connection" (cycle in undirected graph)
"Number of provinces / islands / connected components"
"Is graph bipartite?" (BFS coloring OR DSU with offset)
Path With Min Effort — Kruskal-style minimum bottleneck
Dynamic connectivity: edges added over time
Kruskal's MST — union edges by ascending weight
"Accounts merged" — union by shared info
Grid problems: union adjacent cells to track regions
Common Pitfalls
Using DSU for directed graph cycle detection (use DFS instead)
Off-by-one: 1-indexed nodes but DSU(n) — should be DSU(n+1)
path compression: updating parent[x] but not recursing correctly
Union by rank: growing rank when ranks differ (only grow when equal)
Not calling find() before comparing — comparing raw parent values
Grid DSU: forgetting to flatten 2D index to 1D node id
Bipartite: using DSU incorrectly — BFS coloring is simpler here
Kruskal: not sorting edges before processing — wrong bottleneck