Before you downgrade the model, cache the good one
Cost pressure pushes teams toward cheaper, dumber models. Caching offers the opposite trade: keep frontier quality, pay small-model prices on the traffic that repeats.
The standard cost-reduction playbook reaches for a smaller model: accept a quality haircut everywhere, save per token. It's a real lever, but examine what it trades — every answer gets worse, including the novel ones where quality is the product. You've made the bill smaller by making the product smaller.
Caching inverts the trade. The repeated head of traffic — where the frontier model's answer is already banked — serves at effectively zero cost, full quality. Spend concentrates on the novel tail, where the good model earns its price. Blended cost falls toward small-model territory while quality stays at the frontier where users can tell the difference.
The upgrade is a workflow, not a leap of faith.
The math compounds with hit rate: at a 50% hit rate, your effective per-query cost halves with zero quality loss — equivalent to a 50%-cheaper model that's somehow exactly as good. No distillation project, no eval suite, no regression risk. One container.
The bottom line
Enterprise's arbitrage router lets you do both deliberately: easy intents route to cheap models, hard ones to the frontier, all behind the cache. But the order of operations matters — cache first, downgrade only what the cache and the router prove can survive it.