patternDSA/Two Heaps
Core Pattern → Pattern 19

Two Heaps

Split data into two halves. Max-heap for the lower half, min-heap for the upper half. Two invariants. Median in O(1). Every "median of a stream" problem solved.

Frequency:★★★★☆
·~35 min·Intermediate–Hard·4 problems
The Core Shock

Re-sorting for each query = O(n² log n). Two heaps = O(log n) insert, O(1) median.

Sort on every insert
O(n log n) per query
Re-sort the entire array every time. O(n² log n) total for n insertions.
Two heaps
O(log n) · O(1)
Insert O(log n). Find median O(1). Two invariants guarantee correctness.

The insight: you don't need the full sorted order — just the middle. Keep all numbers smaller than the median in a max-heap and all numbers larger in a min-heap. The median is always at the tops of these two heaps.

The Two-Heap Structure

Visualize two heaps meeting in the middle.

max_heap — lower half
Contains all numbers ≤ median. Top = largest number in lower half = left boundary of median.
heapq with negation: heappush(h, -x)
min_heap — upper half
Contains all numbers ≥ median. Top = smallest number in upper half = right boundary of median.
standard heapq: heappush(h, x)
Two Invariants (must hold after every insert)
Invariant 1: Value invariant
max_heap.top ≤ min_heap.top
Every element in lower half ≤ every element in upper half
Invariant 2: Size invariant
|max_size - min_size| ≤ 1
Heaps stay within 1 of each other. max_heap can be 1 larger (holds the median for odd total).

The 3-Step Insert

Three steps on every insert — always in this order.

01
Push to max_heap (always start here)

Every new number goes to max_heap first, regardless of its value. This may temporarily violate the value invariant — that's expected and handled in step 2.

heapq.heappush(max_heap, -num)
Why max_heap first? Because max_heap represents the lower half and holds the median for odd-total cases. Starting here makes the size-balancing logic simpler.
02
Restore value invariant

If max_heap.top > min_heap.top, the largest 'small' number is actually larger than the smallest 'large' number — violation. Move max_heap.top to min_heap.

if min_heap and (-max_heap[0]) > min_heap[0]:
    val = -heapq.heappop(max_heap)
    heapq.heappush(min_heap, val)
This check is only needed if min_heap is non-empty. If min_heap is empty, there's no upper half to compare against — all numbers are in the lower half.
03
Restore size invariant

After value-balancing, check sizes. If max_heap has 2+ more elements than min_heap, or min_heap has more, rebalance by moving the top element between heaps.

if len(max_heap) > len(min_heap) + 1:
    val = -heapq.heappop(max_heap)
    heapq.heappush(min_heap, val)
elif len(min_heap) > len(max_heap):
    val = heapq.heappop(min_heap)
    heapq.heappush(max_heap, -val)
After these 3 steps: max_heap.top ≤ min_heap.top AND sizes differ by at most 1. Median = max_heap.top if sizes unequal, else average of both tops.

Pattern Recognition

Two heaps = any problem needing the 'middle' of a dynamic dataset.

Use Two Heaps When
"Median of a stream" or "running median"
Find median in a sliding window
Need O(1) access to both max of lower half AND min of upper half
"Maximize" with a capital/resource constraint (two criteria)
Scheduling: tasks split into two priority groups
Next interval (start times vs end times as two separate heaps)
Any problem where data splits naturally into two ordered halves
k-th largest from two sorted arrays (modified two-heap)
Common Pitfalls
Python heapq is min-heap — negate values for max-heap
Checking value invariant AFTER size balancing (wrong order)
Not initializing {0: 1} for median with empty heap check
Sliding window median: forgetting lazy deletion → stale tops
Sliding window: tracking balance after removal, not just after insertion
Maximize capital: using one heap instead of two → wrong projects selected
find_median when both heaps empty → unguarded index error
Forgetting max_heap can be 1 larger (it holds the median for odd n)