Crowkis vs OpenAI prompt caching: a discount is not a cache
Provider prompt caching discounts your repeated prefixes. You still call the model, still wait, and still pay — just slightly less. There's a bigger idea available.
OpenAI's prompt caching is a good deal worth taking: repeat the same long prefix and the repeated tokens bill at a discount. But notice what it actually is — a pricing tier, not a cache. Every request still travels to the provider, still runs inference on the new tokens, still takes seconds, and still costs real money. The discount applies to inputs only; the expensive output tokens are regenerated at full price, every time.
It also only triggers on exact prefix repetition. Your system prompt qualifies; your users' questions don't, because users paraphrase. The traffic that actually dominates your bill — the same question asked fifty ways — gets no discount at all, because no two phrasings share a prefix.
Every paraphrase is a fresh bill — unless the cache understands meaning.
Crowkis operates one level up: when the question means the same thing, the answer doesn't get regenerated at all. No round-trip, no inference, no output tokens — a sub-millisecond local hit, gated by confidence and trust. And it works identically across providers, so your savings don't evaporate the day you switch models.
The bottom line
Stack them, by all means: prefix discounts for the calls that must happen, Crowkis to eliminate the calls that needn't. Just be clear about which one changes the shape of the bill and which one shaves its edges.