A million vectors on a laptop: the honest vector-search numbers
Crowkis is a cache with a vector index, not a vector database, but it should still hold up at scale. We indexed 100K and 1M vectors, measured build time, search latency, and recall, and noted where dedicated vector DBs still win.
Crowkis embeds an HNSW vector index in-process, which raises a fair question: how far does that scale before you'd want a real vector database? We measured it honestly at two sizes — 100,000 and 1,000,000 vectors — on the same CPU-only laptop everything else here ran on.
Recall@10 was 100% at both sizes — the neighbours it should find, it finds.
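The post doesn't spell out how recall was computed, so here is the usual definition as a minimal Python sketch. Recall@10 is the fraction of the true 10 nearest neighbours, found by exact brute-force search, that the approximate index returns, averaged over a set of queries. The Euclidean metric and the helper names `exact_top_k`, `recall_at_k`, and `mean_recall` are assumptions for illustration, not Crowkis's benchmark harness.

```python
import math
import random


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k vectors closest to query.

    Assumes Euclidean distance; the real benchmark's metric is not stated.
    """
    dists = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in dists[:k]]


def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbours present in the approximate top-k."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def mean_recall(results, queries, vectors, k=10):
    """Average recall@k; results[j] holds the index's returned ids for queries[j]."""
    total = 0.0
    for approx_ids, q in zip(results, queries):
        total += recall_at_k(approx_ids, exact_top_k(q, vectors, k), k)
    return total / len(queries)


if __name__ == "__main__":
    random.seed(0)
    vectors = [[random.random() for _ in range(8)] for _ in range(1000)]
    queries = [[random.random() for _ in range(8)] for _ in range(5)]
    # Stand-in for the index under test: reusing exact search here,
    # so recall is 1.0 by construction.
    results = [exact_top_k(q, vectors, 10) for q in queries]
    print(mean_recall(results, queries, vectors))  # 1.0
```

In a real run, `results` would come from the HNSW index, and a recall of 1.0 means every query's true top 10 came back.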
At 100K vectors, comfortably larger than most caches' working sets, search averages 9.2 ms with perfect recall in our test. At 1M, latency rises to 94 ms, still with full recall. Index build throughput holds at around 3,000 to 3,800 vectors per second, so a million vectors indexes in roughly four to six minutes.
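As a quick check on the build-time figure, here is the arithmetic in Python, assuming the quoted throughput stays constant for the whole build. `build_seconds` is a hypothetical helper, not part of Crowkis.

```python
def build_seconds(n_vectors, vectors_per_second):
    """Estimated index build time, assuming constant insert throughput."""
    return n_vectors / vectors_per_second


# The two ends of the measured throughput range, applied to 1M vectors.
for rate in (3_000, 3_800):
    secs = build_seconds(1_000_000, rate)
    print(f"{rate} vec/s -> {secs:.0f} s ({secs / 60:.1f} min)")
# 3000 vec/s -> 333 s (5.6 min)
# 3800 vec/s -> 263 s (4.4 min)
```

Insert cost in a graph index can shift as the graph grows, so treat this as a rough bound rather than a prediction.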
Now the honest part. These are beta-quality proof points, not a challenge to Qdrant, Pinecone, or Weaviate. Dedicated vector databases lead decisively on raw scale, on operational maturity, and on the billion-vector workloads they're built for. Crowkis's in-process index earns its place by removing a network hop for cache-sized working sets (millions of vectors, not billions), where keeping vectors beside the cache engine keeps that hop off the cache's sub-millisecond read path.
Use the right tool. Crowkis is a semantic cache that happens to search vectors well at cache scale — not a vector DB pretending to be a cache.
If your working set is a few million entries and you want them next to the cache that uses them, the in-process index is a feature. If you're indexing a billion documents for retrieval, that's a vector database's job, and Crowkis will happily sit in front of it as the cache. Knowing which problem you have is the whole decision.
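To make the "cache in front of a vector database" arrangement concrete, here is a minimal read-through semantic cache sketched in Python. Everything in it is hypothetical: `SemanticCache`, the `backend` callable standing in for a vector database query, and the 0.95 cosine-similarity threshold are illustrative assumptions, not Crowkis's API. A linear scan stands in for the in-process HNSW index, since the point is the hit/miss path rather than the search.

```python
import math
from typing import Callable, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Hypothetical read-through semantic cache in front of a slower backend.

    A linear scan stands in for an in-process ANN index such as HNSW.
    """

    def __init__(self, backend: Callable[[List[float]], str], threshold: float = 0.95):
        self.backend = backend      # hypothetical: e.g. a vector database query
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries: List[Tuple[List[float], str]] = []
        self.hits = 0
        self.misses = 0

    def _nearest(self, query: List[float]):
        """Return (value, similarity) of the most similar cached entry."""
        best, best_sim = None, -1.0
        for vec, value in self.entries:
            sim = cosine(query, vec)
            if sim > best_sim:
                best, best_sim = value, sim
        return best, best_sim

    def get(self, query: List[float]) -> str:
        value, sim = self._nearest(query)
        if value is not None and sim >= self.threshold:
            self.hits += 1
            return value              # served in process, no network hop
        self.misses += 1
        value = self.backend(query)   # only misses pay the backend round trip
        self.entries.append((query, value))
        return value


if __name__ == "__main__":
    calls = []

    def fake_vector_db(q):
        calls.append(q)
        return f"result for {q}"

    cache = SemanticCache(fake_vector_db, threshold=0.95)
    cache.get([1.0, 0.0])    # miss: goes to the backend
    cache.get([0.99, 0.05])  # near-duplicate (cosine ~0.9987): cache hit
    cache.get([0.0, 1.0])    # orthogonal: miss
    print(cache.hits, cache.misses, len(calls))  # 1 2 2
```

Only misses reach the backend; a hit is answered from vectors held in process, which is the network hop an in-process index exists to avoid.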