Notes from the nest · 980 posts

The Roost

Engineering notes written by the people building Crowkis. Comparisons, use cases, economics, internals, security, operations, and nothing written just to rank.

vs the field · features · economics · security · benchmarks · engineering · use cases · guides · operations · reference

Latest · vs the field · Jul 20, 2026 · 5 min

Semantic cache vs vector database: they solve different problems

A vector database is built for large-scale retrieval. A semantic cache is built for safe answer reuse. Using one for the other's job is where teams get burned.


features · Jul 20, 2026 · 6 min

What is a semantic cache for LLMs? (and why exact-match caching fails)

A plain key-value cache misses the moment a prompt is reworded, and a raw vector cache can serve the wrong answer. A semantic cache understands meaning and structure, and reuses an answer only when it's safe to.

features · Jul 19, 2026 · 6 min

Reasoning reuse: cache the chain of thought, not just the answer

The expensive part of a hard answer is the thinking. Crowkis stores reasoning as a reusable step graph and replays it for the next question that shares its shape, at a fraction of the token cost.

economics · Jul 19, 2026 · 5 min

How to cut LLM API costs with semantic caching

The cheapest token is the one you never spend twice. Here's the simple math behind semantic caching, and where the savings actually come from.

features · Jul 18, 2026 · 5 min

Self-hosted RAG with CDOC: chunking, metadata filters, reranking

If your corpus fits in a cache, you don't need a separate vector database for retrieval. CDOC adds documents with auto-chunking, filtered search, and reranking, all running locally.

security · Jul 18, 2026 · 5 min

Prompt-injection and jailbreak detection at the cache layer

Attackers disguise injections with odd spacing and character swaps. CGUARD normalizes the disguise first, then scans, so the trick that beats a naive filter doesn't beat this.

benchmarks · Jul 18, 2026 · 7 min

We put our tiny embedding model up against OpenAI and NVIDIA. It didn't blink.

crowsight is a small, offline embedding model that ships inside Crowkis. We didn't trust it on faith, so we made it compete with the biggest embedding APIs on the one job a semantic cache actually needs. Here's what happened.

features · Jul 17, 2026 · 4 min

Give Claude and your agents a cache via MCP

The Crowkis binary doubles as an MCP server, so Claude Desktop, Claude Code, and any MCP-capable agent can check the cache before calling the model and store what they compute.

features · Jul 17, 2026 · 5 min

A drop-in OpenAI-compatible AI gateway with a semantic cache in front

Point your existing OpenAI client at Crowkis and change nothing else. Repeated questions are served from cache with no upstream call, and you get retries and routing for free.

economics · Jul 17, 2026 · 6 min

Stop paying twice for the same answer

Most LLM bills are quietly full of duplicates: the same question, reworded, billed at full price every time. Semantic caching is how you stop paying for an answer you already have.

features · Jul 16, 2026 · 4 min

Local, offline embeddings with CEMBED (no external API)

Embeddings usually mean an API key and a per-token bill. CEMBED turns text into vectors using the bundled local model, for free, with nothing leaving your machine.

engineering · Jul 16, 2026 · 5 min

A Redis drop-in for AI: RESP3 compatibility, semantic brain

Crowkis speaks RESP3, so redis-py, ioredis, and Lettuce connect unmodified. Adoption is a port change, not a rewrite, and the semantic commands sit right beside the familiar ones.

engineering · Jul 16, 2026 · 6 min

Redis is the fastest cache alive. It also has no idea what your users are asking.

Redis is a masterpiece at exact-match lookups. But nobody asks your app exact-match questions. Here's why we kept its wire protocol and taught the cache to understand meaning.

use cases · Jul 15, 2026 · 6 min

Long-term memory for AI agents, explained

Most agents forget the moment a session ends. Real memory consolidates contradictions, blends relevance with recency, and can even tell you what it believed at a past point in time.

vs the field · Jul 15, 2026 · 5 min

Looking for a GPTCache alternative? What to compare

If you're evaluating semantic caches, similarity is the easy part. The differences that matter in production are safety, isolation, confidence, and cost control.

engineering · Jul 15, 2026 · 7 min

A million vectors, still instant: the search engine we wrote in Rust

Finding the nearest meaning among a million cached answers, in under a millisecond, without a single external dependency. A look at the pure-Rust HNSW engine underneath Crowkis.

guides · Jul 14, 2026 · 4 min

Cache Dify in your healthcare Q&A assistant with Crowkis

Building healthcare Q&A assistants on Dify? Add a semantic cache so recurring policy and triage questions stop costing full price.

guides · Jul 14, 2026 · 4 min

Cache the Gemini SDK in your code review bot with Crowkis

Building code review bots on the Gemini SDK? Add a semantic cache so the same review patterns across pull requests stop costing full price.

guides · Jul 14, 2026 · 4 min

Cache the Vercel AI SDK in your research assistant with Crowkis

Building research assistants on the Vercel AI SDK? Add a semantic cache so overlapping literature and summary questions stop costing full price.

guides · Jul 14, 2026 · 4 min

Cache AutoGen in your API documentation bot with Crowkis

Building API documentation bots on AutoGen? Add a semantic cache so the same endpoint questions from every developer stop costing full price.