Eviction with a ledger: why LRU is the wrong instinct for an LLM cache
LRU evicts by recency and nothing else. But cache entries have wildly different replacement costs — and forgetting a $0.40 answer to keep a $0.0004 one is just bad accounting.
LRU is a beautiful default for caches whose entries cost the same to rebuild. LLM cache entries violate that premise spectacularly: a one-line factual answer cost a fraction of a cent; a long chain-of-thought analysis cost half a dollar and four seconds. Evicting them by recency alone treats a banknote and a receipt as the same paper.
Crowkis scores eviction candidates on four equally weighted axes: recency, frequency, isolation (does anything else depend on it?), and compute cost, the LLM-specific axis, meaning what you'd pay the provider to regenerate the entry. An expensive, occasionally hit reasoning answer therefore scores above a cheap, recently hit triviality and outlives it under eviction, because the cache's job is maximizing saved spend, not recency.
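One way to picture a four-axis score is the sketch below. It assumes each axis is squashed into [0, 1] and that "equal" means a plain average. The `Entry` fields, the exponential recency decay with a hypothetical `half_life_s`, the log-scaled frequency, and the binary isolation term are illustrative assumptions, not Crowkis's actual formula.

```python
import math
from dataclasses import dataclass

# Hypothetical entry record; field names are illustrative, not Crowkis's schema.
@dataclass
class Entry:
    key: str
    last_hit: float   # timestamp (seconds) of the most recent hit
    hits: int         # lifetime hit count
    dependents: int   # how many other entries depend on this one
    cost_usd: float   # what the provider would charge to regenerate it

def retention_score(e: Entry, now: float, max_hits: int, max_cost: float,
                    half_life_s: float = 3600.0) -> float:
    """Assumed scoring: four axes in [0, 1], equally weighted.
    Higher means more worth keeping; the lowest scorer is evicted first."""
    # Recency: 1.0 if just hit, halving every half_life_s seconds (assumed decay).
    recency = 0.5 ** ((now - e.last_hit) / half_life_s)
    # Frequency: log-scaled relative to the busiest entry, so a few hits still count.
    frequency = math.log1p(e.hits) / math.log1p(max_hits) if max_hits > 0 else 0.0
    # Isolation: nothing depends on it -> 0 (safe to drop); depended on -> 1.
    isolation = 1.0 if e.dependents > 0 else 0.0
    # Compute cost: regeneration price relative to the most expensive entry.
    cost = min(e.cost_usd / max_cost, 1.0) if max_cost > 0 else 0.0
    return (recency + frequency + isolation + cost) / 4

now = 10_000.0
reasoning = Entry("why-is-x", last_hit=now - 7_200, hits=3, dependents=0, cost_usd=0.40)
trivia = Entry("capital-of-y", last_hit=now, hits=3, dependents=0, cost_usd=0.0004)

print(round(retention_score(reasoning, now, max_hits=10, max_cost=0.40), 3))  # 0.457
print(round(retention_score(trivia, now, max_hits=10, max_cost=0.40), 3))     # 0.395
```

Pure LRU would keep the trivia here. Under this score, the reasoning answer ranks higher despite being two hours staler. Normalizing cost linearly lets the priciest entry set the scale; a log scale would be an equally defensible choice.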
The economics are intuitive once stated: under memory pressure, the engine sheds the entries that are cheapest to re-buy and holds on to the portfolio of answers whose regeneration would hurt. In effect, your cache optimizes for the shape of your provider bill.
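The shedding step can be sketched as follows. The sketch assumes each entry records its size and the provider price paid to produce it. For clarity it orders victims by regeneration cost alone and breaks ties by age. In the policy described above, the sort key would be the combined four-axis score. `Entry`, `shed`, and `budget_bytes` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical entry record for this sketch.
@dataclass
class Entry:
    key: str
    size_bytes: int   # memory the cached answer occupies
    cost_usd: float   # provider price to regenerate it (assumed known at write time)
    last_hit: float   # timestamp, used only to break ties

def shed(entries: dict[str, Entry], budget_bytes: int) -> list[str]:
    """Evict the cheapest-to-rebuy entries until total size fits the budget.
    Mutates `entries` and returns the evicted keys in eviction order."""
    used = sum(e.size_bytes for e in entries.values())
    evicted: list[str] = []
    # Cheapest regeneration cost first; among equal costs, the least recently hit goes first.
    for e in sorted(entries.values(), key=lambda e: (e.cost_usd, e.last_hit)):
        if used <= budget_bytes:
            break
        del entries[e.key]
        used -= e.size_bytes
        evicted.append(e.key)
    return evicted

cache = {
    "capital-of-y": Entry("capital-of-y", 200, 0.0004, last_hit=100.0),
    "weather-tip": Entry("weather-tip", 300, 0.0010, last_hit=90.0),
    "why-is-x": Entry("why-is-x", 4_000, 0.40, last_hit=10.0),
}
print(shed(cache, budget_bytes=4_200))  # ['capital-of-y', 'weather-tip']
print(list(cache))                      # ['why-is-x']
```

The expensive answer survives even though it is the least recently hit. Ordering by cost alone has a known failure mode: an expensive entry that is never requested again would never leave. Keeping recency and frequency in the score guards against that. A variant in the spirit of GreedyDual-Size would rank by cost per byte, preferring to evict large, cheap entries because they free the most memory per dollar put at risk.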
The bottom line
It's accounting applied to memory management, and like most good accounting it stays invisible until you compare end-of-month numbers against the naive policy. The dashboard spares you that comparison: the saved-spend counter already reports the outcome.
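To make the comparison against the naive policy concrete, here is a toy replay. It counts saved spend, meaning the regeneration cost of every request served as a hit, under LRU and under a cost-only eviction rule. The trace and prices are invented to illustrate the mechanism and are not measurements. `saved_spend` is a hypothetical helper, and the cost-only rule is a simplification of the four-axis score.

```python
from __future__ import annotations

from collections import OrderedDict

def saved_spend(trace: list[str], costs: dict[str, float], capacity: int, policy: str) -> float:
    """Replay a request trace against a cache holding at most `capacity` entries.
    policy is "lru" or "cost" (evict the cheapest-to-rebuy entry).
    Returns the total regeneration cost avoided by cache hits."""
    cache = OrderedDict()  # keys only; order tracks recency, least recent first
    saved = 0.0
    for key in trace:
        if key in cache:
            saved += costs[key]          # hit: no second provider call
            cache.move_to_end(key)       # mark as most recently used
            continue
        if len(cache) >= capacity:       # miss on a full cache: pick a victim
            if policy == "lru":
                victim = next(iter(cache))  # least recently used
            else:
                # Cheapest to re-buy; min() returns the first (least recent) among ties.
                victim = min(cache, key=lambda k: costs[k])
            del cache[victim]
        cache[key] = None                # miss: pay to generate, then cache it
    return saved

# Toy trace, invented to illustrate the mechanism (not measured data).
costs = {"essay": 0.40, "a": 0.0004, "b": 0.0004, "c": 0.0004}
trace = ["essay", "a", "b", "essay", "c", "a", "essay"]
print(saved_spend(trace, costs, capacity=2, policy="lru"))   # 0.0
print(saved_spend(trace, costs, capacity=2, policy="cost"))  # 0.8
```

On this trace LRU evicts the essay just before each repeat request, so it saves nothing. The cost-aware rule serves both repeats from cache. Real end-of-month differences depend entirely on your traffic.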