The Heap That Wouldn't Sit Still
Top gainers and losers, live, over 20,000 symbols and the tax I didn't price in

A while back at work I picked up a ticket that sounded almost insultingly simple when I first read it: "show the top 25 gainers and top 25 losers today". Live, off the tick firehose, for the whole market, no nightly batch job. I figured an afternoon.
This was the first real streaming system I shipped, and it's stuck with me less for the algorithm than for a cost I underweighted going in. The data structure I picked, a heap, the textbook answer for "top K," was a perfectly defensible choice. What I hadn't reckoned with was what it costs to keep a heap correct when the values underneath it never stop moving and the thing has to stay live on every tick. That bill came due in load testing.
So here's the story, the regime that made it interesting, and the structure I'd reach for now.
The problem
A live feed of (symbol, price) ticks. Maintain, in real time, the top 25 gainers and top 25 losers for the day, where a stock's gain is its move off the previous close:
gain = (last_price - prev_close) / prev_close
prev_close is reference data, fixed for the session and refreshed each morning. Two boards, both ends of the same ranking, both current.
The numbers are what give this its shape: ~20,000 symbols, each ticking once a second, so roughly 20k updates per second, and the boards had to reflect every tick. That last requirement matters. This wasn't a screen someone refreshed on a timer; it was a live ranking that had to move the instant a price did.
Why the textbook answer doesn't quite fit
Going in, the central fact was obvious and I designed around it: a stock's gain changes on every tick. That's the whole difference from the tidy version of top-K. "Top 25 numbers in a stream" assumes each element arrives once with a final value, so a size-25 heap is enough: keep the best 25, evict the root when something better shows up. The instant values mutate, that structure is wrong, and not subtly. A stock that's in your top 25 on a +9% morning print is still sitting there at +9% after it fades to +1%, because a binary heap only re-examines an element when it happens to bubble toward the root. It has no notion of "this element's key just changed."
A binary heap has no update-in-place, no decrease-key without extra machinery. So the standard way to use a heap under mutating values is lazy deletion: never update an entry, just push a fresh (gain, symbol) on each tick and carry a side structure (a generation counter or a current-value dict) to recognize and skip stale entries when they surface. That's correct, and it's what I built.
def ingest(self, t):
g = (t.price - self._prev[t.symbol]) / self._prev[t.symbol]
self._gain[t.symbol] = g
gen = self._gen[t.symbol] = self._gen.get(t.symbol, 0) + 1
heapq.heappush(self._max, (-g, gen, t.symbol)) # gainers
heapq.heappush(self._min, (g, gen, t.symbol)) # losers
Reading a board pops entries, skips any whose generation is out of date, collects 25 live ones, and pushes them back. Correct on every tick. I verified it against a brute-force oracle and it held.
The tax I didn't price in
Here's what I hadn't accounted for. Lazy deletion mints a dead entry on every update, and in this problem every tick is an update. At 20k ticks a second, each heap grows by 20k entries a second, almost all of them tombstones within moments. The heaps don't trend toward n = 20,000; they trend toward the number of ticks in the whole session, which is millions.
None of that showed up in the correctness tests. The oracle was green and the logic was right. It showed up when I replayed a full day's tape at production rate and watched two things drift:
Memory climbed steadily all session, tracking tick count rather than symbol count.
Read latency crept up with it. Pulling the top 25 meant popping past a thickening crust of stale entries before reaching 25 live ones.
My first patch was a rebuild: sweep each heap from the current-gain dict whenever the stale fraction crossed a threshold, an O(n) pass that existed purely to undo the tombstones. It bounded the growth, but it was plainly a band-aid over a structural mismatch. And I still had the other half of the problem. I needed both ends, so two heaps, and a heap isn't sorted, so pulling 25 in order meant popping and restoring on every read.
None of this was a bug. It was friction the structure created, rebuild logic and two heaps and destructive reads, all of it working around the heap's real limitation here: it can't hold a moving element in place, and it only cleanly exposes one end.
The clarifying realization was about cost, not correctness. At 20k ticks/sec, log₂(20,000) ≈ 14, so an O(log n) operation per tick is nothing; CPU was never the constraint. The constraint was the bookkeeping needed to keep a heap honest. So the question flipped from "how do I make the heap efficient" to "what structure makes the work I actually do, update one element and read both ends, just disappear?"
The Solution: an Ordered Multiset
An ordered multiset, sortedcontainers.SortedList, keyed by (gain, symbol), with a dict of each symbol's current gain. Each tick does a real removal of the old entry and a real insertion of the new one. The structure holds exactly n = 20,000 entries forever; nothing accumulates.
from __future__ import annotations
from dataclasses import dataclass
from sortedcontainers import SortedList
@dataclass(frozen=True)
class Tick:
symbol: str
price: float
day: int # session date as an integer
class GainBoard:
"""Live top-K gainers and losers over a mutating stream."""
def __init__(self, prev_close: dict[str, float], k: int = 25) -> None:
self._prev = prev_close
self._k = k
self._gain: dict[str, float] = {}
self._ranked: SortedList = SortedList() # ordered by (gain, symbol)
self._day: int | None = None
def _roll(self, day: int) -> None:
if day != self._day: # new session: clear the day
self._day = day
self._gain.clear()
self._ranked.clear()
def ingest(self, t: Tick) -> None:
self._roll(t.day)
g = (t.price - self._prev[t.symbol]) / self._prev[t.symbol]
old = self._gain.get(t.symbol)
if old is not None:
self._ranked.remove((old, t.symbol)) # real removal, not a tombstone
self._gain[t.symbol] = g
self._ranked.add((g, t.symbol))
def top_gainers(self) -> list[str]:
return [s for _, s in reversed(self._ranked[-self._k:])]
def top_losers(self) -> list[str]:
return [s for _, s in self._ranked[: self._k]]
This erases all three pain points at once. Updates are real, so there are no tombstones and no rebuild job. Memory is pinned at n. And because the multiset is ordered, both boards fall out as direct slices: gainers off the back, losers off the front, O(k) each, no popping, no second structure, no destructive reads. One structure, both ends, always current. The per-tick cost is the same O(log n) the lazy heap paid on its push, except now it buys an always-clean structure instead of a growing pile I had to periodically burn down.
(One detail worth pinning: fold the symbol into the key so equal gains are ordered deterministically. Otherwise two stocks at the same gain can swap between reads and your tests flake.)
Other options I weighed
Indexed binary heap (a heap plus a symbol → position map, supporting update-in-place via sift-up/down). This kills the tombstones, since updates become real O(log n) moves, so it's a genuine fix to the memory problem. But a heap still only exposes one end cleanly, so you're back to two heaps and to reads that can't pull an ordered 25 without popping. It solves the smaller half of the problem and leaves the read-both-ends half.
Per-second batch recompute. Since every symbol ticks once a second, you could skip incremental maintenance entirely: keep latest prices in a dict (O(1) per tick) and run heapq.nlargest(25, …) once a second over current values, O(n log k), trivial at 20k. The board would be at most a second stale, which, when each stock only moves once a second anyway, is arguably no staleness at all. I'd genuinely weigh this against the live requirement; the only thing that rules it out is needing the board to reorder within a second as individual ticks land. For us it did, so we stayed incremental, but it's the first question I'd ask before building anything.
How they compare
For n symbols, constant k = 25:
| Approach | Per tick | Read both ends | Catch |
|---|---|---|---|
| Size-K heap | O(log k) |
— | Incorrect under mutating values |
| Full heap + lazy deletion | O(log n) push |
O(k log n) + restore |
Tombstones grow with tick count; needs O(n) rebuilds |
| Indexed heap (update-in-place) | O(log n) |
awkward, two heaps | No tombstones, but reads still fight the heap |
| Ordered multiset | O(log n) |
O(k) slice |
None of note, real removals, both ends free |
Dict + nlargest per second |
O(1) |
O(n log k) on refresh |
Board only moves once per refresh |
Did I believe it? The oracle
Same discipline that caught my tie-break ordering early: a dumb, obviously-correct oracle (recompute every gain, sort, take 25) run against both the lazy-heap version and the multiset version across thousands of randomized tapes. Two thousand runs, both implementations agree with the oracle on gainers and losers, live, after every tick. The oracle is slow on purpose, which is exactly why it's trustworthy enough to judge the fast ones.
What I actually took away
I picked the heap correctly off the name of the problem, "top K." What I should have weighed harder, up front, were the two properties that actually drove the cost: every value mutates every tick, and I need both ends, live. Those two facts make "real removals and an ordered read" worth far more than "fast pushes," and they point straight at an ordered multiset rather than a heap. The heap wasn't wrong. It just made me pay in tombstones and rebuilds for the one thing it can't do, hold a moving element in place and show me both ends. That's a tax worth seeing before you sign up for it, not after.
If you've run a live gainers/losers board at scale, I'd love to hear where you landed: incremental multiset, indexed heap, or batch. And what your tick-to-refresh ratio actually justified.
