engineering · May 15, 2026 · 3 min read

Eviction with a ledger: why LRU is the wrong instinct for an LLM cache

LRU evicts by recency and nothing else. But cache entries have wildly different replacement costs, and forgetting a $0.40 answer to keep a $0.0004 one is just bad accounting.

LRU is a beautiful default for caches whose entries cost the same to rebuild. LLM cache entries violate that premise spectacularly: a one-line factual answer costs a fraction of a cent; a long chain-of-thought analysis costs half a dollar and four seconds. Evicting them by recency alone treats a banknote and a receipt as the same paper.
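To see the failure concretely, here is a minimal Python sketch of plain LRU. The `LRUCache` class, the cache keys, and the dollar figures are illustrative assumptions rather than Crowkis code. Each entry records only its regeneration cost, so we can see what the policy throws away.

```python
from __future__ import annotations

from collections import OrderedDict


class LRUCache:
    """Plain LRU (illustrative): evicts the least recently used entry, ignoring its cost."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # key -> regeneration cost in USD; order tracks recency, oldest first
        self.entries: OrderedDict[str, float] = OrderedDict()

    def get(self, key: str) -> bool:
        if key not in self.entries:
            return False
        self.entries.move_to_end(key)  # mark as most recently used
        return True

    def put(self, key: str, cost_usd: float) -> str | None:
        """Insert an entry; return the key evicted to make room, if any."""
        self.entries[key] = cost_usd
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            evicted, _ = self.entries.popitem(last=False)  # drop the oldest
            return evicted
        return None


cache = LRUCache(capacity=2)
cache.put("long-analysis", 0.40)         # expensive chain-of-thought answer
cache.put("capital-of-france", 0.0004)   # cheap factual answer
cache.get("capital-of-france")           # the cheap answer is hit again
evicted = cache.put("what-is-2+2", 0.0004)
print(evicted)  # long-analysis
```

The cheap answer survives because it was touched last; the $0.40 analysis goes, even though re-buying it costs a thousand times more than the entry that pushed it out.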

Crowkis scores eviction candidates on four equally weighted axes: recency, frequency, isolation (does anything else depend on it?), and one axis specific to LLMs: compute cost, or what you'd pay the provider to regenerate the entry. An expensive, occasionally hit reasoning answer outranks a cheap, recently hit triviality, because the cache's job is to maximize saved spend, not recency.
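The post does not publish the exact scoring formula, so the following is a minimal sketch under stated assumptions: each axis is normalized against the current set of eviction candidates, the four are averaged with equal weight, and the candidate with the lowest score is evicted. `Entry`, `keep_scores`, `eviction_victim`, and the normalization scheme are hypothetical, not Crowkis's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    seconds_since_hit: float  # recency: time since the last cache hit
    hits: int                 # frequency: total hits so far
    dependents: int           # isolation: how many entries depend on this one
    cost_usd: float           # compute cost: provider price to regenerate


def keep_scores(entries: list[Entry]) -> dict[str, float]:
    """Score each candidate in [0, 1]; higher means more worth keeping.

    Assumes a non-empty candidate list. Each axis is divided by its maximum
    across the candidates (falling back to 1 when that maximum is 0), and
    the four axes are averaged with equal weight.
    """
    max_age = max(e.seconds_since_hit for e in entries) or 1.0
    max_hits = max(e.hits for e in entries) or 1
    max_deps = max(e.dependents for e in entries) or 1
    max_cost = max(e.cost_usd for e in entries) or 1.0
    scores = {}
    for e in entries:
        recency = 1.0 - e.seconds_since_hit / max_age  # 0 for the oldest candidate
        frequency = e.hits / max_hits
        connectedness = e.dependents / max_deps        # 0 for a fully isolated entry
        cost = e.cost_usd / max_cost
        scores[e.key] = (recency + frequency + connectedness + cost) / 4
    return scores


def eviction_victim(entries: list[Entry]) -> str:
    """Return the key of the candidate with the lowest keep score."""
    scores = keep_scores(entries)
    return min(entries, key=lambda e: scores[e.key]).key


analysis = Entry("long-analysis", seconds_since_hit=3600, hits=3, dependents=0, cost_usd=0.40)
trivia = Entry("capital-of-france", seconds_since_hit=5, hits=1, dependents=0, cost_usd=0.0004)
print(eviction_victim([analysis, trivia]))  # capital-of-france
```

Dividing by the maximum caps the cost axis at 1.0 however wide the price gap is; a log scale would let a 1000× difference weigh more. That is a tuning choice the sketch leaves open.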

[Figure: the budget wall, enforced locally. 1) runaway agent loop; 2) virtual API key with budget + TPM/RPM limits; 3) crowkis → provider; 4) blocked, alert fired.]

The wall is enforced before the invoice, not discovered on it.
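The flow in the figure amounts to a pre-flight check on every request. Here is a minimal sketch, assuming a 60-second sliding window for the RPM/TPM limits and a cost estimate supplied by the caller before the request is sent. `VirtualKey`, `authorize`, and the `on_block` alert hook are hypothetical names, not Crowkis's interface.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VirtualKey:
    """Hypothetical per-key guard, checked before a request reaches the provider."""
    budget_usd: float
    rpm_limit: int                                # max requests per 60 s window
    tpm_limit: int                                # max tokens per 60 s window
    on_block: Callable[[str], None] = print       # alert hook (assumed)
    spent_usd: float = 0.0
    window: deque = field(default_factory=deque)  # (timestamp, tokens) of recent calls

    def authorize(self, now: float, tokens: int, cost_usd: float) -> bool:
        """Admit and record the request, or block it and fire the alert."""
        # Forget calls that have aged out of the sliding window.
        while self.window and now - self.window[0][0] >= 60:
            self.window.popleft()
        reason = None
        if self.spent_usd + cost_usd > self.budget_usd:
            reason = "budget exhausted"
        elif len(self.window) + 1 > self.rpm_limit:
            reason = "RPM limit"
        elif sum(t for _, t in self.window) + tokens > self.tpm_limit:
            reason = "TPM limit"
        if reason is not None:
            self.on_block(reason)
            return False
        self.window.append((now, tokens))
        self.spent_usd += cost_usd
        return True


# A runaway agent loop: one $0.40 call per second against a $1.00 budget.
alerts: list[str] = []
key = VirtualKey(budget_usd=1.00, rpm_limit=60, tpm_limit=100_000, on_block=alerts.append)
allowed = 0
for second in range(100):
    if not key.authorize(now=float(second), tokens=2_000, cost_usd=0.40):
        break
    allowed += 1
print(allowed, alerts)  # 2 ['budget exhausted']
```

The third $0.40 call would push spend to $1.20, so it is refused before any tokens are bought: the loop hits a local limit rather than a line on the invoice.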

The economics are intuitive once stated: under memory pressure, the engine sheds the entries that are cheapest to re-buy, holding the portfolio of answers whose regeneration would hurt. Your cache literally optimizes for the shape of your provider bill.
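Shedding under memory pressure can be sketched with a min-heap keyed on regeneration cost. This minimal Python version assumes each entry carries a cost in USD and a size in bytes, and for brevity ranks on the cost axis alone; in the four-axis scheme the heap key would be the combined score. `shed_until_fits` and the sample entries are hypothetical.

```python
from __future__ import annotations

import heapq


def shed_until_fits(entries: dict[str, tuple[float, int]], max_bytes: int) -> list[str]:
    """Evict the cheapest-to-regenerate entries until the cache fits in max_bytes.

    `entries` maps key -> (regeneration cost in USD, size in bytes) and is
    mutated in place. Returns the evicted keys in eviction order.
    """
    used = sum(size for _, size in entries.values())
    heap = [(cost, key) for key, (cost, _) in entries.items()]
    heapq.heapify(heap)  # cheapest entry sits at heap[0]
    evicted = []
    while used > max_bytes and heap:
        _, key = heapq.heappop(heap)
        _, size = entries.pop(key)
        used -= size
        evicted.append(key)
    return evicted


cache = {
    "long-analysis": (0.40, 12_000),
    "code-review": (0.12, 6_000),
    "capital-of-france": (0.0004, 200),
    "what-is-2+2": (0.0003, 150),
}
shed = shed_until_fits(cache, max_bytes=18_100)
print(shed)  # ['what-is-2+2', 'capital-of-france']
```

A variant worth considering divides cost by size, so the engine evicts whatever frees the most bytes per dollar of future regeneration, in the spirit of the GreedyDual-Size family of policies.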

The bottom line

It's accounting applied to memory management, and like most good accounting, it stays invisible until you compare end-of-month numbers against the naive policy. The dashboard spares you that exercise: the saved-spend counter has already picked the winner.