securityJune 6, 2026· 3 min read

Prompt injection meets your cache: the attack nobody threat-modeled

Injected instructions in one response become served truth for every similar query, unless the cache can smell an answer that doesn't answer.

Prompt injection analysis usually ends at the model: attacker smuggles instructions, model misbehaves once, incident closed. Add a naive cache and the blast radius changes category, the injected output gets stored, and the cache faithfully serves it to every future query in the semantic neighbourhood. One successful injection becomes standing infrastructure for the attacker.

In plain words: If an attacker tricks the model once and you cache it, the trick replays forever. Crowkis checks whether an answer actually answers before storing it, and remembers who keeps sending garbage.

Crowkis's coherence stage exists for precisely this shape of attack: injected content characteristically fails to actually answer the question it's stored under, and coherence, weighted 0.30, the joint-heaviest stage, scores that mismatch before storage. Content heuristics add a second look, and neighbourhood agreement flags answers that contradict their semantic peers.

the write-trust pipeline

1
candidate write
2
coherence · 0.30
3
content · 0.10
4
source trust · 0.30
5
isolation · 0.15
6
neighbourhood · 0.15
7
composite ≥ 0.75?
8
accepted
9
refused + ledger entry

Five stages score every write before it can ever be served.

Source trust closes the loop on persistence: a writer whose outputs keep getting refused accumulates ledger history and faces an ever-higher bar, so injection campaigns burn their own access. Every refusal is logged with its stage, your security team sees the attempt pattern, not just the absence of damage.

The bottom line

The principle generalizes: any system that stores model output and re-serves it needs an immune response, because the model will eventually be made to say something hostile. Ours is five stages deep and keeps receipts.