Benchmarks · June 12, 2026 · 6 min read

The throughput ceiling we won't hide — and the fix

On v0.2.2, throwing 16 threads at Crowkis got the same throughput as one. That's a real ceiling. We found it in our own harness, and here's why it happened and how the embedding-deferral work lifts it.

Here's a benchmark result most vendors would quietly drop: on v0.2.2, single-threaded semantic throughput was about 9 operations per second, and sixteen threads delivered… about 9 operations per second. A scaling factor of 1.0. We're publishing it because the reason is instructive and the fix is real.

[Chart: Semantic CGET throughput vs. thread count, v0.2.2 (ops/sec). 1 thread: 9; 16 threads: 9.]

Two bottlenecks stacked: a single-writer actor and a synchronous embed on the hot path.

In plain words: throughput is how many requests per second the server handles. Good scaling means that adding threads adds throughput. A flat line means something is serializing the work, so only one thing happens at a time.

Two things stacked up. First, Crowkis funnels cache decisions through a single-writer actor — a deliberate design that makes correctness and crash-recovery provable, but means writes don't run in parallel. Second, and dominant here, every semantic op ran the ONNX embedding inline, so each thread spent ~115 ms in the same synchronous model call. Sixteen threads waiting on the same serial embed is sixteen threads in a line.
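The flat line follows from a simple bottleneck bound: if every operation must pass through one serial stage taking D seconds, throughput can never exceed 1/D, however many threads are waiting. With D ≈ 0.115 s that bound is about 8.7 ops/sec, which lines up with the ~9 measured at both thread counts. Below is a minimal Python sketch of that arithmetic, assuming the inline embed is essentially the whole per-op cost and is fully serialized; `throughput_ceiling` and its parameters are illustrative names, not Crowkis code.

```python
def throughput_ceiling(threads: int, op_seconds: float, serial_seconds: float) -> float:
    """Upper bound on ops/sec for `threads` closed-loop workers.

    Each worker finishes at most 1/op_seconds ops per second, but every op
    also passes through one serial stage, which caps the whole system at
    1/serial_seconds ops per second no matter how many workers are waiting.
    """
    if threads < 1 or op_seconds <= 0 or serial_seconds <= 0:
        raise ValueError("threads must be >= 1 and times must be positive")
    if serial_seconds > op_seconds:
        raise ValueError("serial stage cannot exceed total op time")
    per_worker_limit = threads / op_seconds  # what parallel workers could do alone
    serial_limit = 1.0 / serial_seconds      # what the serial stage allows
    return min(per_worker_limit, serial_limit)


# Assumption: the ~115 ms inline embed is nearly the entire op and is serialized.
EMBED_S = 0.115
for n in (1, 16):
    # prints 8.7 ops/sec for both thread counts
    print(f"threads={n:>2}: {throughput_ceiling(n, EMBED_S, EMBED_S):.1f} ops/sec")
```

Shrinking `serial_seconds` raises the 16-thread bound toward `threads / op_seconds`, and whatever serial stage is left, such as the single-writer actor, becomes the new cap.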

The fix is to get embedding off the hot path: cache embeddings for repeats, and defer the rest so the actor isn't blocked on the model. With that work, single-threaded exact-repeat throughput jumps from ~51 ops/sec to ~3,550 ops/sec, roughly a 70× improvement, and the multi-thread number finally rises above its old flat line.
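One way to picture "cache embeddings for repeats and defer the rest" is a hot path that only ever does a dictionary lookup and hands any miss to a background pool instead of calling the model inline. This is a minimal Python sketch, not Crowkis's implementation: `DeferredEmbedder` and `embed_fn` are hypothetical, keying the cache on the exact text is an assumption, and returning `None` on a miss stands in for whatever the real system does while an embedding is pending.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Vector = List[float]


class DeferredEmbedder:
    """Hypothetical sketch: cached embeddings on the hot path, misses deferred."""

    def __init__(self, embed_fn: Callable[[str], Vector], workers: int = 2) -> None:
        self._embed_fn = embed_fn             # the slow model call (e.g. an ONNX session)
        self._cache: Dict[str, Vector] = {}   # exact text -> embedding (assumed key)
        self._pending: Dict[str, Future] = {}  # texts currently being embedded
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def lookup(self, text: str) -> Optional[Vector]:
        """Never blocks on the model: returns a cached vector or None."""
        with self._lock:
            vec = self._cache.get(text)
            if vec is not None:
                return vec                    # repeat: no model call at all
            if text not in self._pending:     # first miss: schedule one embed
                self._pending[text] = self._pool.submit(self._compute, text)
            return None

    def _compute(self, text: str) -> Vector:
        try:
            vec = self._embed_fn(text)        # runs on a worker thread, off the hot path
            with self._lock:
                self._cache[text] = vec
            return vec
        finally:
            with self._lock:
                self._pending.pop(text, None)  # on failure, the next lookup retries

    def wait_idle(self) -> None:
        """Block until every deferred embed scheduled so far has finished."""
        with self._lock:
            futures = list(self._pending.values())
        for fut in futures:
            fut.result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

In this sketch a repeated question costs one locked dictionary read, and the first occurrence still pays for a single embed, just on a worker thread rather than on the caller's.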

[Chart: Single-thread exact-repeat throughput, before and after embedding deferral (ops/sec). v0.2.2 (inline embed): 51; after deferral: 3,550.]

70× on the repeat path. The single-writer actor remains the next ceiling to lift.

A benchmark you only run to win isn't a benchmark, it's an ad. We run ours to find the ceiling — then we go raise it.

Concurrent reads against the single-writer actor are the next infrastructure step, and we're honest that distributed throughput is not where Crowkis competes today. For the workload it's built for — agent fan-out and chatbot traffic, where the same handful of questions repeat constantly — the repeat path is the one that matters, and that's the one the deferral work transforms.
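Concurrent reads next to a single writer are often done by letting readers skip the writer's queue altogether. Below is a minimal Python sketch of one common pattern, copy-on-write snapshots, offered as an illustration rather than a description of Crowkis's plans; `SingleWriterStore` and its methods are hypothetical.

```python
import queue
import threading
from types import MappingProxyType
from typing import Mapping, Optional


class SingleWriterStore:
    """Hypothetical sketch: one thread applies all writes; reads use a published snapshot."""

    def __init__(self) -> None:
        self._inbox: "queue.Queue[Optional[tuple]]" = queue.Queue()
        self._snapshot: Mapping[str, str] = MappingProxyType({})
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def put(self, key: str, value: str) -> None:
        """Enqueue a write; only the writer thread ever mutates state."""
        self._inbox.put((key, value))

    def get(self, key: str) -> Optional[str]:
        """Concurrent read: no queue, no lock, just the latest published snapshot."""
        return self._snapshot.get(key)

    def flush(self) -> None:
        """Block until every write queued so far has been applied and published."""
        self._inbox.join()

    def close(self) -> None:
        self._inbox.put(None)  # sentinel tells the writer to stop
        self._writer.join()

    def _run(self) -> None:
        state: dict = {}
        while True:
            item = self._inbox.get()
            try:
                if item is None:
                    return
                key, value = item
                state[key] = value
                # Copy-on-write: publish a fresh read-only view so readers never see
                # a half-applied update. Relies on reference assignment being atomic
                # in CPython, and costs an O(n) copy per write in this sketch.
                self._snapshot = MappingProxyType(dict(state))
            finally:
                self._inbox.task_done()
```

Writes stay serialized and keep the ordering guarantees of the single-writer design, while reads scale with threads. The price here is an O(n) copy per write, which a production design would likely replace with a persistent or versioned structure.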