Where the milliseconds go: an honest latency profile
A semantic cache hit isn't free — it has to embed your query first. We measured every operation's percentiles so you know exactly what you're paying for, and where the cache engine itself is microsecond-fast.
Marketing says 'sub-millisecond cache.' The truth has an asterisk worth understanding, because it tells you when Crowkis is fast and when it isn't. A plain key-value GET, with no embedding and no semantics, returns in about a quarter of a millisecond, which makes it a true Redis-class drop-in. A semantic CGET is a different animal: before it can search, it has to turn your query into a vector.
The ~115 ms on every semantic op is the ONNX embedding cost — not the cache engine, which resolves in microseconds.
Once you see that the embedding is the cost, the optimization writes itself: don't embed what you've already embedded. Crowkis added a micro-cache that remembers recent embeddings, and for exact-repeat queries — which dominate real agent and chatbot traffic — the embedding step disappears entirely.
122× faster on the path that real workloads hit most: the same question, again.
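To make the idea concrete, here is a minimal sketch of an embedding micro-cache in Python. It is not Crowkis's implementation: the class name `EmbeddingMicroCache`, the `embed_fn` parameter, the LRU eviction policy, and the default capacity are all assumptions chosen for illustration. The sketch keys on the exact query string, so only byte-for-byte repeats skip the embedder.

```python
from collections import OrderedDict
from typing import Callable, Sequence, Tuple


class EmbeddingMicroCache:
    """Hypothetical LRU cache of recent embeddings, keyed on the exact query text."""

    def __init__(self, embed_fn: Callable[[str], Sequence[float]], capacity: int = 1024):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._embed_fn = embed_fn      # the expensive call, e.g. an ONNX model
        self._capacity = capacity      # assumed bound; tune to your traffic
        self._entries: "OrderedDict[str, Tuple[float, ...]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def embed(self, query: str) -> Tuple[float, ...]:
        if query in self._entries:
            # Exact repeat: skip the embedder and mark the entry as recently used.
            self._entries.move_to_end(query)
            self.hits += 1
            return self._entries[query]

        # First sighting: pay the embedding cost once and remember the result.
        self.misses += 1
        vector = tuple(self._embed_fn(query))
        self._entries[query] = vector
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return vector


if __name__ == "__main__":
    def stand_in_embedder(text: str) -> Sequence[float]:
        # Placeholder for a real model; returns a toy two-dimensional vector.
        return [float(len(text)), float(sum(map(ord, text)) % 97)]

    cache = EmbeddingMicroCache(stand_in_embedder)
    for q in ["What is the refund policy?",
              "What is the refund policy?",
              "How do I reset my password?"]:
        cache.embed(q)
    print(f"hits={cache.hits} misses={cache.misses}")  # hits=1 misses=2
```

Because the key is the raw string, "What is the refund policy?" and "what is the refund policy" would be embedded separately in this sketch. Whether to normalize case or whitespace before lookup is a design choice the source does not describe.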
The tail tells the same story. CGET-hit p99 sits at 136 ms, which is tight because hits do predictable work. CGET-miss p99 stretches to 347 ms, because a miss occasionally does more work to confirm there is nothing to serve. Neither number is a mystery pause; both are the embedding model under load. That is why the micro-cache and, on faster hardware, a GPU embedder move them so dramatically.
We publish the p99, not just the p50. A latency claim that only quotes the median is hiding the number that pages you at 3 a.m.
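If you want to check tail latency on your own traffic, a percentile is simple to compute from raw samples. The sketch below uses the nearest-rank definition, which is one common convention and not necessarily the one behind the figures above. The function name `percentile` and the sample values are illustrative only; the numbers are synthetic, not Crowkis measurements.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


if __name__ == "__main__":
    # Synthetic latencies in milliseconds: 98 ordinary requests and 2 slow ones.
    latencies_ms = [100.0] * 98 + [300.0] * 2
    print("p50:", percentile(latencies_ms, 50))  # p50: 100.0
    print("p99:", percentile(latencies_ms, 99))  # p99: 300.0
```

In this synthetic run the median never sees the two slow requests, while the p99 lands squarely on them. That gap is the reason to quote both.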
The practical takeaway: Crowkis is microsecond-fast at being a cache and millisecond-fast at being a semantic one, with the embedding as the dial you can turn — micro-cache for repeats, a stronger embedder host for throughput, or plain KV when you don't need meaning at all. Know which operation you're calling and the latency stops being a surprise.