engineeringJune 1, 2026· 3 min read

HNSW without the network hop: why the vector index lives inside the engine

Most semantic caches call out to a vector database. Crowkis embeds the HNSW graph in-process, and that placement decision is worth more than any algorithm tweak.

HNSW, hierarchical navigable small worlds, is the workhorse of approximate nearest-neighbour search: a layered graph you descend from sparse highways to dense streets, finding semantic neighbours in logarithmic hops. Everyone uses some variant. The differentiating decision isn't the algorithm; it's where it runs.

The standard architecture puts vectors in a separate service, which taxes every lookup with serialization, a network round trip, and a second system's tail latency, then taxes the design again, because confidence gating needs the candidates' metadata, which means more round trips or denormalized copies drifting out of sync.

the crowkis read path, five gates, every one can veto

1
incoming query
2
intent classifier
3
template match
4
HNSW neighbours
5
confidence gate
6
trust + freshness
7
answer · <1ms
8
(nil) → your model

Reuse only when meaning, structure, confidence, and trust all agree.

Crowkis embeds the graph in the same process as the LSM store and the scoring engines: a lookup walks the graph, reads neighbour metadata, and runs all five gates without crossing a process boundary. That's how the entire read path, embed, search, template-check, score, trust-check, fits inside a millisecond. The graph persists alongside the store and recovers with it, covered by the same crash tests.

The bottom line

Distributed vector databases earn their complexity at billions of vectors. A cache's working set is millions, which fits in one process's memory with room to spare. Choosing the boring placement bought us the headline number.