The ROI timeline: hour one, week one, quarter one
Caching ROI isn't a hockey stick — it's a staircase that starts the first hour. Here's the honest schedule of when each saving shows up.
Hour one: deployment and first hits. The container is up in five minutes; the first semantic hit lands as soon as any question repeats — in support and docs workloads, usually within the hour. The dashboard's saved counter starts moving the same afternoon. This is the demo-to-yourself phase, and it costs nothing but the afternoon.
Week one: the head of your distribution warms. Hit rate climbs as the corpus accumulates your canonical questions; latency percentiles visibly split into instant-hits and normal-misses. This is when the top-misses view starts directing pre-warming and when someone screenshots the savings number into Slack.
Every paraphrase is a fresh bill — unless the cache understands meaning.
Quarter one: the structural effects arrive. Budgets and keys turn AI spend into a governed system; a model upgrade happens without a cold start; the per-query blended cost in your unit economics quietly drops a tier and stays there. The cache stops being a project and becomes a number everyone assumes.
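To make "blended cost" concrete, here is a minimal sketch of the arithmetic, assuming a simple model where every query is either a cache hit or a miss. The function name, the per-query prices, and the hit rates are illustrative placeholders, not measurements from any real deployment.

```python
def blended_cost_per_query(hit_rate: float, miss_cost: float, hit_cost: float = 0.0) -> float:
    """Average cost per query when a fraction `hit_rate` is served from cache.

    Assumes a two-outcome model: a hit costs `hit_cost`, a miss costs `miss_cost`.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


if __name__ == "__main__":
    # Hypothetical price of one uncached model call, in dollars.
    MISS_COST = 0.01
    # Hypothetical hit rates, chosen only to show the shape of the curve.
    for hit_rate in (0.0, 0.25, 0.5):
        cost = blended_cost_per_query(hit_rate, MISS_COST)
        print(f"hit rate {hit_rate:.0%}: blended cost ${cost:.4f} per query")
```

Under this model the blended cost falls linearly with hit rate, so a hit rate that holds steady shows up as a lasting step down in unit cost rather than a one-time saving.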
The bottom line
The honest caveat: novel-heavy workloads climb the staircase more slowly, and Replay will tell you that in advance, for free. For everyone else, the schedule above is boringly reliable, which is the best thing a cost curve can be.