Crowkis vs stuffing the context window: memory is not a prompt
Million-token contexts tempt teams to ship the whole knowledge base with every call. That's not memory — that's paying to re-read the library daily.
Huge context windows created a seductive anti-pattern: skip retrieval, skip caching, just send everything every time and let the model sort it out. It works, in the sense that a limousine works as a grocery cart. Input tokens bill linearly, latency grows with context, and you're purchasing the same comprehension of the same documents on every single request.
Note what's being recomputed: not just the answer, but the reading. The model re-ingests your unchanged handbook ten thousand times a day. Provider prefix-caching discounts soften this; they don't change its nature. The architecture is 'no memory, infinite re-reading' — the most expensive possible implementation of remembering.
Every paraphrase is a fresh bill — unless the cache understands meaning.
Crowkis gives the system actual memory: the answer to a question, once computed from however much context, is stored, gated, and served in under a millisecond to every future paraphrase. Reasoning reuse goes further, recycling the thought-structure for inputs that share its shape. The context window goes back to its real job — novel synthesis — instead of impersonating a database.
The bottom line
Big contexts are a wonderful capability and a terrible default. Memory belongs in a memory layer. Rent the model's attention for new problems only.