The hidden invoice of a cold cache: what model migrations really cost
Swap models with a normal cache and you re-purchase your entire corpus at the new model's prices. Migration leasing is the line item that prevents the line item.
Model upgrades carry a cost nobody budgets: the cache wipe. Your corpus of answered questions — weeks of accumulated hits — assumes the old model, so the standard move is flush and rebuild. Every previously-free hit becomes a fresh model call at the new model's prices, concentrated into the weeks after launch. Teams notice the bill spike and blame the new model; the real culprit is the cold start.
Crowkis treats the upgrade as a first-class workflow instead of a wipe: canary the new model on a traffic slice, compare quality against cached baselines, then migrate entries with leasing — old answers keep serving until their replacements are verified, so hit rate never cliffs and recomputation spreads across natural traffic instead of arriving as a spike.
The upgrade is a workflow, not a leap of faith.
The strategic effect is bigger than the saved spike: cold-start dread is why teams postpone upgrades, running last year's model out of cache inertia. Make migrations cheap and you upgrade the moment a better model ships — which, lately, is quarterly.
The bottom line
Count the cold start in every model decision, or better, make it stop existing. Your cache should be an asset that survives your choices, not a hostage to them.