benchmarks · June 16, 2026 · 7 min read

Where the milliseconds go: an honest latency profile

A semantic cache hit isn't free — it has to embed your query first. We measured every operation's percentiles so you know exactly what you're paying for, and where the cache engine itself is microsecond-fast.

Marketing says 'sub-millisecond cache.' The truth has an asterisk worth understanding, because it tells you exactly when Crowkis is fast and when it isn't. A plain key-value GET — no embedding, no semantics — returns in about a quarter of a millisecond, a true Redis-class drop-in. A semantic CGET is a different animal: before it can search, it has to turn your query into a vector.

Operation latency, p50 in ms (v0.2.2, CPU-only laptop)

Plain GET (no embed): 0.26 ms

CGET hit: 114.5 ms

CGET miss: 114.8 ms

CSET (store): 118.8 ms

The ~115 ms on every semantic op is the ONNX embedding cost — not the cache engine, which resolves in microseconds.

In plain words: p50 is the median — half of requests are faster. The embedding step dominates every semantic operation, so a hit and a miss cost almost the same: the work is in understanding the query, not in the lookup.
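To see why a hit and a miss cost about the same, it can help to time the two stages separately. The sketch below is a minimal illustration, not Crowkis code: `fake_embed`, `store`, and `profile_semantic_get` are hypothetical stand-ins, and the 20 ms sleep only simulates a slow embedding model.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0

def profile_semantic_get(query, embed, lookup):
    """Time the embedding step and the lookup step separately."""
    vector, embed_ms = timed(embed, query)
    value, lookup_ms = timed(lookup, vector)
    return {"value": value, "embed_ms": embed_ms, "lookup_ms": lookup_ms}

# Stand-ins (assumptions, not Crowkis internals): a slow "embedder"
# and a plain dict lookup playing the role of the cache engine.
def fake_embed(query):
    time.sleep(0.02)  # pretend the model takes about 20 ms
    return (len(query),)

store = {(5,): "cached answer"}

hit = profile_semantic_get("hello", fake_embed, store.get)
miss = profile_semantic_get("something else", fake_embed, store.get)

for name, profile in (("hit", hit), ("miss", miss)):
    print(f"{name}: embed {profile['embed_ms']:.2f} ms, "
          f"lookup {profile['lookup_ms']:.4f} ms")
```

In this toy setup both the hit and the miss pay the same embedding time, and the lookup is a rounding error next to it, which mirrors the shape of the numbers above.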

Once you see that the embedding is the cost, the optimization writes itself: don't embed what you've already embedded. Crowkis added a micro-cache that remembers recent embeddings, and for exact-repeat queries — which dominate real agent and chatbot traffic — the embedding step disappears entirely.
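The idea can be sketched as a small least-recently-used cache keyed on the exact query text. This is an illustrative guess at the shape of such a micro-cache, not Crowkis's actual implementation: the class name `EmbeddingMicroCache`, the `capacity` parameter, and the `counting_embed` stub are all assumptions made for the example.

```python
from collections import OrderedDict

class EmbeddingMicroCache:
    """Small LRU cache mapping exact query text to its embedding."""

    def __init__(self, embed, capacity=1024):
        self._embed = embed          # the expensive embedding function
        self._capacity = capacity    # hypothetical size limit
        self._entries = OrderedDict()

    def get(self, query):
        if query in self._entries:
            self._entries.move_to_end(query)  # mark as most recently used
            return self._entries[query]
        vector = self._embed(query)  # paid once per distinct query
        self._entries[query] = vector
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return vector

# Stub embedder that records how often it is actually called.
calls = []

def counting_embed(query):
    calls.append(query)
    return [float(len(query))]

cache = EmbeddingMicroCache(counting_embed, capacity=2)
cache.get("what is my order status?")
cache.get("what is my order status?")  # exact repeat: no second embed call
print(len(calls))  # 1
```

Only byte-identical queries hit this cache; a paraphrase still has to be embedded, so the sketch helps the exact-repeat path and nothing else.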

Exact-repeat CGET, before and after the embedding micro-cache (ms)

Before (re-embed every time): 19.5 ms

After (micro-cache hit): 0.16 ms

122× faster on the path that real workloads hit most: the same question, again.

The tail tells the same story. CGET-hit p99 sits at 136 ms, which is tight because hits do predictable work. CGET-miss p99 stretches to 347 ms, because a miss occasionally does more work confirming there's nothing to serve. Neither number is a mystery pause; both are the embedding model under load, which is why the micro-cache, or a GPU embedder on faster hardware, moves them so dramatically.

We publish the p99, not just the p50. A latency claim that only quotes the median is hiding the number that pages you at 3 a.m.
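A short sketch shows how a median can hide the tail. It uses the nearest-rank percentile definition, which is one common convention and not necessarily the one used for the figures in this post; the sample latencies are synthetic and invented purely for illustration.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least
    p percent of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Synthetic latencies in ms (not measured data): mostly steady,
# with two slow outliers out of one hundred requests.
latencies_ms = [100.0] * 98 + [350.0] * 2

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(p50, p99)  # 100.0 350.0
```

Two slow requests in a hundred leave the median untouched while the p99 more than triples, which is the gap a median-only claim would never show.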

The practical takeaway: Crowkis is microsecond-fast at being a cache and millisecond-fast at being a semantic one, with the embedding as the dial you can turn — micro-cache for repeats, a stronger embedder host for throughput, or plain KV when you don't need meaning at all. Know which operation you're calling and the latency stops being a surprise.