vs the field · May 1, 2026 · 3 min read

Crowkis vs OpenAI prompt caching: a discount is not a cache

Provider prompt caching discounts your repeated prefixes. You still call the model, still wait, and still pay — just slightly less. There's a bigger idea available.

OpenAI's prompt caching is a good deal worth taking: repeat the same long prefix and the repeated tokens bill at a discount. But notice what it actually is — a pricing tier, not a cache. Every request still travels to the provider, still runs inference on the new tokens, still takes seconds, and still costs real money. The discount applies to inputs only; the expensive output tokens are regenerated at full price, every time.
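To make the "inputs only" point concrete, here is a small arithmetic sketch. The token counts, per-million-token prices, and the 50% cached-input discount are illustrative assumptions, not quoted rates; check the provider's current pricing before relying on any of them.

```python
def request_cost(input_tokens, cached_tokens, output_tokens,
                 input_price, output_price, cached_discount):
    """Dollar cost of one request; prices are per million tokens."""
    uncached = input_tokens - cached_tokens
    # Only the cached part of the input is billed at the reduced rate.
    billable_input = uncached + cached_tokens * (1 - cached_discount)
    input_cost = billable_input * input_price / 1e6
    # Output tokens are always billed at the full output rate.
    output_cost = output_tokens * output_price / 1e6
    return input_cost + output_cost


# Assumed workload: 2,000 input tokens (1,500 of them a repeated prefix)
# and 500 output tokens, at hypothetical prices of $2 / $8 per million.
full = request_cost(2000, 0, 500, 2.00, 8.00, 0.5)
discounted = request_cost(2000, 1500, 500, 2.00, 8.00, 0.5)
saving = (full - discounted) / full

print(f"no prefix hit: ${full:.4f}")
print(f"prefix hit:    ${discounted:.4f}")
print(f"saving:        {saving:.1%}")
```

Under these assumed numbers the prefix hit trims the request by a bit under a fifth, and the output share of the cost does not move at all. A request that is never sent has no provider cost to trim.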

In plain words: Provider caching makes repeat calls cheaper. Crowkis makes repeat calls disappear. Cheaper is a coupon; disappear is a strategy.

It also triggers only on exact prefix repetition. Your system prompt qualifies; your users' questions don't, because users paraphrase. The traffic that actually dominates your bill, the same question asked fifty ways, gets no discount beyond that shared system prompt, because paraphrases diverge at their first differing word and nothing after that point matches.
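A rough illustration of why paraphrases miss: the sketch below measures how many leading tokens two prompts share. Whitespace splitting stands in for a real tokenizer, and the system prompt and questions are made up for the example; real providers match on model tokens and apply their own minimum prefix lengths.

```python
def shared_prefix_len(a, b):
    """Count the leading tokens two token lists have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tokens(text):
    # Assumption: whitespace split as a stand-in for a model tokenizer.
    return text.split()


SYSTEM = "You are a support assistant."  # hypothetical system prompt

prompt_a = tokens(SYSTEM + " How do I reset my password?")
prompt_b = tokens(SYSTEM + " What's the way to change my password?")

shared = shared_prefix_len(prompt_a, prompt_b)
print(shared, "shared tokens out of", len(prompt_a), "and", len(prompt_b))
```

The two prompts agree only on the system prompt. Both questions ask the same thing, yet the match ends at the first word of the question.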

[Figure: what repeated traffic costs without Crowkis]

Every paraphrase is a fresh bill — unless the cache understands meaning.

Crowkis operates one level up: when the question means the same thing, the answer doesn't get regenerated at all. No round-trip, no inference, no output tokens — a sub-millisecond local hit, gated by confidence and trust. And it works identically across providers, so your savings don't evaporate the day you switch models.
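The general shape of a meaning-level cache can be sketched in a few lines. This is not Crowkis's implementation: `SemanticCache`, `embed`, `answer`, and the 0.8 threshold are hypothetical names and values chosen for illustration, and the bag-of-words vector is a toy stand-in for a learned embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: a real system uses a learned model)."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity to count as a hit
        self.entries = []           # list of (vector, answer) pairs

    def get(self, question):
        vec = embed(question)
        best_score, best_answer = 0.0, None
        for stored_vec, stored_answer in self.entries:
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_answer = score, stored_answer
        # Confidence gate: below the threshold, treat it as a miss.
        return best_answer if best_score >= self.threshold else None

    def put(self, question, answer_text):
        self.entries.append((embed(question), answer_text))


def answer(cache, question, call_model):
    cached = cache.get(question)
    if cached is not None:
        return cached               # hit: no provider round-trip
    fresh = call_model(question)    # miss: pay for one real call
    cache.put(question, fresh)
    return fresh


provider_calls = []


def fake_model(question):
    # Stand-in for a provider call; records that it was invoked.
    provider_calls.append(question)
    return "Use the reset link on the login page."


cache = SemanticCache(threshold=0.8)
first = answer(cache, "How do I reset my password?", fake_model)
second = answer(cache, "how do I reset my password please", fake_model)
third = answer(cache, "What is your refund policy?", fake_model)
print(len(provider_calls), "provider calls for 3 questions")
```

The second question is served from the cache without touching the stand-in provider, while the unrelated third question falls below the threshold and goes through. A word-count vector only catches near-identical wording; recognizing that "change my password" and "reset my password" mean the same thing is the job of a real embedding model, and the threshold is where the trade-off between hit rate and wrong answers gets decided.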

The bottom line

Stack them, by all means: prefix discounts for the calls that must happen, Crowkis to eliminate the calls that needn't. Just be clear about which one changes the shape of the bill and which one shaves its edges.