One signed Docker image. Every feature compiled in. Free to run: docker pull crowkis/crowkis:latest

Notes from the nest · 100 posts

The Roost

Engineering notes written by the people building Crowkis. Comparisons with everything else, use cases, economics, internals, security, operations — and nothing written to rank on a search engine.

Latest · June 9, 2026 · 3 min · engineering

Why Crowkis is Rust all the way down

A cache lives in the hot path of every request. The language choice isn't aesthetic — it's the difference between predictable microseconds and mystery pauses.

vs the field · Jun 8, 2026 · 3m

Crowkis vs Redis: same protocol, different century

Redis is magnificent infrastructure for exact-match workloads. LLM traffic isn't one. Here's why speaking the same protocol doesn't mean solving the same problem.

operations · Jun 8, 2026 · 3m

The five-minute deploy, timed honestly

Pull, run, first hit in the dashboard — with no config file, no signup, and no environment variables you're required to set. We timed it. It holds.

use cases · Jun 7, 2026 · 3m

Support bots are the single best caching workload in software

Nowhere else do thousands of people ask the same fifty questions, all day, in every phrasing imaginable. Crowkis was practically designed in a support queue.

economics · Jun 6, 2026 · 3m

The token math of repetition: what your duplicate questions actually cost

Take your daily query volume, multiply by the repeat fraction, multiply by your blended price per call. That number, 365 times a year, is the cache argument.
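The formula is easy to run on your own numbers. A minimal sketch in Rust; the function names and example inputs are hypothetical placeholders, not measured figures. It computes the daily cost of repeated questions and scales it to a year by multiplying by 365.

```rust
/// Daily spend on repeated questions:
/// queries per day × fraction that repeat × blended price per call.
fn daily_repeat_cost(daily_queries: f64, repeat_fraction: f64, price_per_call: f64) -> f64 {
    daily_queries * repeat_fraction * price_per_call
}

/// The same figure over a year: the daily number, once per day for 365 days.
fn annual_repeat_cost(daily_queries: f64, repeat_fraction: f64, price_per_call: f64) -> f64 {
    daily_repeat_cost(daily_queries, repeat_fraction, price_per_call) * 365.0
}
```

With a hypothetical 100,000 queries a day, a 40% repeat fraction, and a $0.01 blended price, that is $400 a day, or $146,000 a year, spent recomputing answers you already had.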

security · Jun 6, 2026 · 3m

Prompt injection meets your cache: the attack nobody threat-modeled

Injected instructions in one response become served truth for every similar query — unless the cache can smell an answer that doesn't answer.

vs the field · Jun 5, 2026 · 3m

Crowkis vs GPTCache: the difference between a library and infrastructure

GPTCache proved developers want semantic caching. Crowkis is what happens when that idea grows up, moves out of your Python process, and gets a security model.

operations · Jun 5, 2026 · 3m

Upgrades as non-events: the binary-swap contract

docker pull, restart, done — no schema migrations, no export/import, no upgrade runbook. The on-disk format is a stability promise, not an implementation detail.

use cases · Jun 4, 2026 · 3m

Internal copilots: your whole company asks the same questions

HR policy, expense rules, deploy commands, VPN setup — every employee rediscovers them through your copilot, billed per discovery. Give the company one memory.

use cases · Jun 3, 2026 · 3m

RAG apps: cache the synthesis, not just the retrieval

Your vector store finds the chunks fast. Then the model re-synthesizes the same answer from the same chunks, thousands of times. That second step is the bill.

security · Jun 3, 2026 · 3m

The supply-chain argument, made carefully

After the 2026 gateway compromise, 'how many packages are in your hot path?' became a real procurement question. Our answer is a number: zero.

engineering · Jun 2, 2026 · 8m

How Crowkis earned the right to sit in your critical path

347 integration tests, a smoke suite that kills the process on purpose, and a Docker image hardened before anyone asked. The receipts behind 'production-ready.'

economics · Jun 2, 2026 · 3m

Why Crowkis refuses to meter you

A cache exists to make costs predictable. Metering the cache would be self-defeating. So Community is free and Enterprise is flat per cluster — priced on a call, not a meter.

operations · Jun 2, 2026 · 3m

Three windows into one cache: dashboard, Prometheus, logs

The built-in dashboard for humans, /metrics for your Grafana, one JSON line per event for your pipeline — same truth, three consumers, zero adapters.

vs the field · Jun 1, 2026 · 3m

Crowkis vs LiteLLM-style gateways: caching is not a checkbox

Python gateways treat caching as one feature among forty. Crowkis treats it as the product — and ships it without a Python supply chain attached.

engineering · Jun 1, 2026 · 3m

HNSW without the network hop: why the vector index lives inside the engine

Most semantic caches call out to a vector database. Crowkis embeds the HNSW graph in-process — and that placement decision is worth more than any algorithm tweak.

use cases · May 31, 2026 · 3m

Agent fleets are token furnaces. Crowkis is the heat exchanger.

Agents re-ask, re-plan, and re-fetch with industrial enthusiasm. Multiply by a fleet and you get the most cacheable traffic in existence — if the cache understands agents.

security · May 31, 2026 · 3m

Tenant isolation as physics, not policy

A WHERE clause is a promise; a namespace is a wall. How Crowkis makes cross-tenant leakage structurally impossible rather than procedurally unlikely.

economics · May 30, 2026 · 3m

Replay: the demo that uses your data instead of our slides

Every cache vendor promises a hit rate. Crowkis Replay computes yours — on your real queries, before you spend anything. The pitch is a number with your name on it.

operations · May 30, 2026 · 3m

Crowkis on Kubernetes: a well-behaved citizen

One container, a PVC, real health probes, hard memory bounds, graceful shutdown. Everything your cluster expects from a tenant that's read the manual.

engineering · May 29, 2026 · 3m

The write-ahead log: how the cache survives a kill -9

Durability isn't a checkbox — it's a sequence of writes in the right order with checksums at every step. Here's the boring machinery that makes restarts uneventful.
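To make "checksums at every step" concrete, here is a minimal sketch of the general technique: length-prefixed log records, each guarded by a CRC-32, replayed until the first record that is truncated or fails its check. The framing and function names are assumptions for illustration, not the Crowkis on-disk format, and a real WAL also has to fsync in the right order, which this in-memory sketch leaves out.

```rust
/// CRC-32 (IEEE, reflected polynomial 0xEDB88320), computed bit by bit.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Reads a little-endian u32 starting at byte offset `at`.
fn read_u32(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Appends one record framed as [payload length][CRC-32 of payload][payload].
fn encode_record(payload: &[u8], log: &mut Vec<u8>) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&crc32(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Replays records in order and stops at the first one that is truncated or
/// fails its checksum, e.g. a tail torn by kill -9 in the middle of a write.
fn replay(log: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 8 <= log.len() {
        let len = read_u32(log, pos) as usize;
        let expected_crc = read_u32(log, pos + 4);
        let start = pos + 8;
        let end = match start.checked_add(len) {
            Some(end) if end <= log.len() => end,
            _ => break, // truncated record: the write never finished
        };
        let payload = &log[start..end];
        if crc32(payload) != expected_crc {
            break; // corrupt record: trust nothing from here on
        }
        records.push(payload.to_vec());
        pos = end;
    }
    records
}
```

Stopping at the first bad record, rather than skipping past it, is the conservative choice: anything written after a torn record may depend on the write that was lost.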

vs the field · May 28, 2026 · 3m

Crowkis vs Portkey: the gateway routes, the cache remembers

Portkey is a control panel for LLM calls. Crowkis is the memory underneath them. Confusing the two costs you the savings both promise.

security · May 28, 2026 · 3m

PII in a cache: scrub, isolate, erase, prove

Users put personal data in prompts whether you like it or not. The cache's job is a full lifecycle: keep it out of shared entries, find it on demand, erase it provably.

use cases · May 27, 2026 · 3m

AI coding assistants: the cache your team didn't know it was sharing

Every developer on your team asks the assistant the same questions about the same codebase. With Crowkis behind MCP, the second ask is free for everyone.

engineering · May 27, 2026 · 3m

Bloom filters: how the engine knows what it doesn't know

The fastest disk read is the one that never happens. About ten bits per key let Crowkis skip files that can't contain your answer, at a 1% false-positive cost we chose on purpose.
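The 1% figure implies a concrete budget. Textbook Bloom filter sizing gives bits per key = -ln(p) / (ln 2)^2, which for p = 0.01 is about 9.6 bits, with about 7 hash functions. The sketch below computes that and builds a small filter; std's `DefaultHasher` with double hashing stands in for whatever hash functions Crowkis actually uses, and the `Bloom` type is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Textbook sizing for a target false-positive rate `p`:
/// bits per key = -ln(p) / (ln 2)^2, hash functions k = bits_per_key * ln 2.
fn bloom_params(p: f64) -> (f64, u32) {
    let ln2 = std::f64::consts::LN_2;
    let bits_per_key = -p.ln() / (ln2 * ln2);
    let k = (bits_per_key * ln2).round() as u32;
    (bits_per_key, k)
}

struct Bloom {
    bits: Vec<bool>,
    k: u32,
}

impl Bloom {
    fn new(expected_keys: usize, p: f64) -> Self {
        let (bits_per_key, k) = bloom_params(p);
        let m = ((expected_keys as f64) * bits_per_key).ceil().max(1.0) as usize;
        Bloom { bits: vec![false; m], k: k.max(1) }
    }

    /// Double hashing: index_i = h1 + i * h2 (mod m), from two differently salted hashes.
    fn indexes<T: Hash>(&self, item: &T) -> Vec<usize> {
        let mut a = DefaultHasher::new();
        item.hash(&mut a);
        let h1 = a.finish();
        let mut b = DefaultHasher::new();
        let salt: u64 = 0x9E37_79B9_7F4A_7C15; // makes the second hash differ from the first
        salt.hash(&mut b);
        item.hash(&mut b);
        let h2 = b.finish() | 1; // odd step size
        let m = self.bits.len() as u64;
        (0..self.k as u64)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
            .collect()
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        for i in self.indexes(item) {
            self.bits[i] = true;
        }
    }

    /// `false` means definitely absent (skip the file); `true` means "maybe present".
    fn may_contain<T: Hash>(&self, item: &T) -> bool {
        self.indexes(item).into_iter().all(|i| self.bits[i])
    }
}
```

A `false` from `may_contain` is a guarantee, which is what lets an engine skip a file without reading it; a `true` only means the file is worth checking.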

economics · May 26, 2026 · 3m

Budgets with teeth: why your LLM spend needs a circuit breaker

Every team has a runaway-loop story that ends with a shocking invoice. Per-key budgets with hard TPM and dollar walls end the genre.

operations · May 26, 2026 · 3m

Running a model canary: the operator's walkthrough

Slice the traffic, compare against cached baselines, promote or retreat — model upgrades as a controlled experiment with the cache as your measuring instrument.

vs the field · May 25, 2026 · 3m

Crowkis vs Helicone-style observability: seeing the waste isn't saving it

Observability tools show you beautiful charts of money leaving. Crowkis is the component that makes the chart go down.

security · May 25, 2026 · 3m

Fail closed: why misconfiguring Crowkis locks it instead of opening it

Most self-hosted breaches are defaults, not exploits. Crowkis inverts the failure direction: forget to configure auth and you get a locked deployment, not an open one.

use cases · May 24, 2026 · 3m

E-commerce assistants: catalog questions on repeat, margins on the line

Shipping times, return windows, size guides, 'does this come in blue?' — commerce traffic is seasonal, spiky, and gloriously repetitive. Cache accordingly.

engineering · May 24, 2026 · 3m

Twelve intents: why the cache treats a poem differently from a fact

One similarity threshold for all traffic is how caches embarrass themselves. Crowkis classifies every query into one of twelve intents, each with its own rules of reuse.

economics · May 23, 2026 · 3m

Latency is money: the second invoice nobody itemizes

Every multi-second model wait is paid twice — once in tokens, once in user patience. The cache refunds both, but only one shows up in accounting.

operations · May 23, 2026 · 3m

Fallback routing: surviving your provider's bad day

Providers have incidents; your product doesn't have to. Health-aware backend routing plus a warm cache turns upstream outages into degraded modes users barely notice.

vs the field · May 22, 2026 · 3m

Crowkis vs Pinecone: a vector database is not a cache

Pinecone answers 'what's similar?'. A production cache must answer 'is this safe to serve?'. Those are different questions with different architectures.

security · May 22, 2026 · 3m

The trust ledger: institutional memory for an immune system

Every accept and refuse, per source, append-only. Trust with memory changes attacker economics — and gives auditors the artifact they actually want.

use cases · May 21, 2026 · 3m

EdTech tutors: a thousand students, one curriculum, one cache

Every cohort asks why the quadratic formula works. Teach the model once per concept, not once per student — while keeping personalized work personal.

engineering · May 21, 2026 · 3m

Structural templates: the matching layer vectors can't see

Embeddings blur exactly where caches need precision — numbers, dates, entities. Template abstraction catches what cosine similarity structurally cannot.
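A minimal sketch of why a template layer catches what vectors blur: lift each run of digits out of the query into a slot, then compare templates and slots exactly. It handles numbers only (dates and entities would need real parsers), and `templatize`, `compare`, and `TemplateMatch` are hypothetical names, not Crowkis APIs.

```rust
/// Splits a query into a structural template and its numeric slots.
/// Each run of ASCII digits becomes a `{}` placeholder.
fn templatize(query: &str) -> (String, Vec<String>) {
    let mut template = String::new();
    let mut slots = Vec::new();
    let mut digits = String::new();
    for ch in query.chars() {
        if ch.is_ascii_digit() {
            digits.push(ch);
            continue;
        }
        if !digits.is_empty() {
            slots.push(std::mem::take(&mut digits));
            template.push_str("{}");
        }
        template.push(ch);
    }
    if !digits.is_empty() {
        slots.push(digits);
        template.push_str("{}");
    }
    (template, slots)
}

#[derive(Debug, PartialEq)]
enum TemplateMatch {
    Exact,     // same shape, same values: safe to reuse the answer
    SameShape, // same shape, different values: do not serve the answer verbatim
    Different, // unrelated structure
}

fn compare(a: &str, b: &str) -> TemplateMatch {
    let (ta, sa) = templatize(a);
    let (tb, sb) = templatize(b);
    if ta != tb {
        TemplateMatch::Different
    } else if sa == sb {
        TemplateMatch::Exact
    } else {
        TemplateMatch::SameShape
    }
}
```

"What is 17 + 25?" and "What is 18 + 25?" embed almost identically, but here they share a shape while differing in a slot, so serving the cached answer verbatim would be wrong.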

economics · May 20, 2026 · 3m

Before you downgrade the model, cache the good one

Cost pressure pushes teams toward cheaper, dumber models. Caching offers the opposite trade: keep frontier quality, pay small-model prices on the traffic that repeats.

operations · May 20, 2026 · 3m

Memory governance: a cache that respects its container

CROWKIS_MEMORY_LIMIT means what it says — no GC mood swings, no mystery RSS, eviction that engages before the kernel has opinions.

vs the field · May 19, 2026 · 3m

Crowkis vs Weaviate, Qdrant, and Milvus: stop assembling your cache from parts

Every DIY semantic cache is a vector database, a Redis, a cron job, and a prayer. Crowkis is the version where the parts were designed for each other.

security · May 19, 2026 · 3m

Air-gapped by design: AI caching where the internet isn't invited

No phone-home, offline license verification, one binary. The deployment story for networks that treat outbound packets as incidents.

engineering · May 18, 2026 · 11m

Why we wrote our own LSM tree instead of bolting onto RocksDB

Every sane checklist says don't write your own storage engine. We did it anyway. Here's the actual reasoning, the architecture, and the parts that were painful.

use cases · May 18, 2026 · 3m

Healthcare AI: caching under HIPAA without holding your breath

Clinical-adjacent assistants repeat administrative and informational answers constantly — but every cached byte is regulated. This is what compliance-mode caching looks like.

engineering · May 18, 2026 · 3m

Reasoning reuse: caching how the model thinks, not just what it says

Chain-of-thought tokens are the most expensive ones you buy. Crowkis extracts the thought's skeleton, abstracts the specifics, and recomposes it for the next input that shares its shape.

economics · May 17, 2026 · 3m

Agent unit economics: making the per-task math survive contact with reality

Agents multiply model calls per user action by 10–50x. Without aggressive reuse, the unit economics of agentic products simply don't close.

operations · May 17, 2026 · 3m

A tour of the dashboard: six panels, zero mysteries

Live verdicts, hit-type economics, top misses, safety blocks, tenant accounting, system pressure — what each panel answers and who keeps it open.

vs the field · May 16, 2026 · 3m

Crowkis vs pgvector: your database deserves better than your cache traffic

pgvector is a lovely extension for storing embeddings next to your data. Routing every LLM query through Postgres is how lovely things die.

security · May 16, 2026 · 3m

Compliance modes: HIPAA, SOC2, GDPR-EU, FedRAMP as configuration

Each regime wants specific retention, audit, and erasure behavior. Enterprise compliance modes preset the whole posture, so the auditor's checklist maps to a flag.

use cases · May 15, 2026 · 3m

Fintech assistants: fast answers, frozen correctness

Money questions repeat endlessly and tolerate zero staleness. Fintech is where freshness control stops being a feature and becomes the product.

engineering · May 15, 2026 · 3m

Eviction with a ledger: why LRU is the wrong instinct for an LLM cache

LRU evicts by recency and nothing else. But cache entries have wildly different replacement costs — and forgetting a $0.40 answer to keep a $0.0004 one is just bad accounting.
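One classic way to weigh replacement cost against recency is the GreedyDual family of policies. The sketch below is GreedyDual-style, used as a stand-in because the teaser doesn't describe the exact Crowkis ledger; the type name and the per-entry dollar costs are illustrative.

```rust
use std::collections::HashMap;

/// GreedyDual-style cost-aware cache. Each entry's priority is the running
/// "inflation" value plus its replacement cost. Eviction removes the lowest
/// priority and raises inflation to it, so idle expensive entries still age
/// out eventually instead of living forever.
struct CostAwareCache {
    capacity: usize,
    inflation: f64,
    // key -> (value, replacement cost in dollars, current priority)
    entries: HashMap<String, (String, f64, f64)>,
}

impl CostAwareCache {
    fn new(capacity: usize) -> Self {
        CostAwareCache { capacity, inflation: 0.0, entries: HashMap::new() }
    }

    fn get(&mut self, key: &str) -> Option<String> {
        let inflation = self.inflation;
        let entry = self.entries.get_mut(key)?;
        entry.2 = inflation + entry.1; // a hit restores priority to inflation + cost
        Some(entry.0.clone())
    }

    fn put(&mut self, key: &str, value: &str, cost: f64) {
        if !self.entries.contains_key(key) && self.entries.len() >= self.capacity {
            self.evict_one();
        }
        let priority = self.inflation + cost;
        self.entries.insert(key.to_string(), (value.to_string(), cost, priority));
    }

    fn evict_one(&mut self) {
        // Linear scan keeps the sketch short; a real engine would keep a heap.
        let victim = self
            .entries
            .iter()
            .min_by(|a, b| (a.1).2.total_cmp(&(b.1).2))
            .map(|(key, entry)| (key.clone(), entry.2));
        if let Some((key, priority)) = victim {
            self.inflation = priority;
            self.entries.remove(&key);
        }
    }
}
```

An LRU cache given the same three inserts as the usage below would evict the $0.40 entry, simply because it is the oldest; the cost-aware version drops the $0.0004 one instead.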

economics · May 14, 2026 · 3m

Why Community is actually free: the honest economics of our free tier

Full engine, production use, no license, no meter, no time bomb. Here's why giving the small end away is the rational structure, not a teaser.

operations · May 14, 2026 · 3m

The world's shortest cache runbook

Fail-open design means most 'incidents' are the absence of savings, not the presence of errors. Here's the whole decision tree, which fits on an index card.

vs the field · May 13, 2026 · 3m

Crowkis vs Momento: your cache shouldn't bill like the thing it's saving you from

Serverless caches meter every operation. A cache that charges per request in front of an API that charges per request is a strange kind of savings.

security · May 13, 2026 · 3m

Four doors, four locks: the authentication architecture

RESP, gRPC, REST, and the dashboard each get auth that fits their use — constant-time tokens for the data plane, RBAC for the control plane, mandatory locks past loopback.
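Constant-time comparison keeps an attacker from learning how many leading bytes of a guessed token were correct by timing how fast it is rejected. A minimal sketch of the technique; the function name is hypothetical, and production code is better served by a vetted crate such as `subtle`, since optimizers can undermine hand-written loops like this one.

```rust
/// Compares two secrets without an early exit: every byte pair is examined,
/// so the running time depends only on the length, not on where they differ.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    // Token length is normally public (fixed-size tokens), so it may short-circuit.
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulates any differing bit without branching
    }
    diff == 0
}
```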

use cases · May 12, 2026 · 3m

Government and defense: the cache that works where the internet doesn't

Air-gapped networks, FedRAMP postures, and zero phone-home tolerance rule out most AI infrastructure on page one. Crowkis was designed to pass that page.

engineering · May 12, 2026 · 3m

Five TTL policies: engineering the shelf life of truth

Answers age at different speeds — prices in days, math never. A single TTL knob can't express that, so Crowkis ships five policies plus version pinning and webhooks.

economics · May 11, 2026 · 3m

The CFO pitch: explaining the cache to the person who signs things

Three sentences, one dashboard number, and a flat price. The rare infrastructure purchase that finance understands faster than engineering does.

operations · May 11, 2026 · 3m

Boring on purpose: the operational philosophy

Exciting infrastructure is a contradiction in terms. Every Crowkis design decision optimizes for the same review: 'it just runs.'

vs the field · May 10, 2026 · 3m

Crowkis vs ElastiCache: managed Redis is still Redis

AWS will happily run an exact-match cache for you at any scale. It will miss your LLM traffic at any scale, too.

security · May 10, 2026 · 3m

Closed-source as a security posture, argued honestly

'Many eyes' assumes the eyes show up. For your hot path, a signed single binary with zero dependencies is a smaller attack surface than a thousand auditable packages nobody audits.

use cases · May 9, 2026 · 3m

Multi-tenant SaaS: one cache, many customers, zero leaks

Caching across customers multiplies savings and multiplies risk. Tenant isolation has to be architecture, not a WHERE clause.

engineering · May 9, 2026 · 3m

Why we kept the Redis protocol instead of inventing an API

Every new API is a tax on adoption: clients, docs, muscle memory, tooling. RESP3 meant inheriting nearly two decades of all four on day one.
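Keeping the protocol means existing Redis clients already know how to frame requests. A RESP request is an array of bulk strings, and this sketch encodes one; the plain Redis-style `GET` is chosen for illustration, and which commands Crowkis accepts isn't covered here. `encode_command` is a hypothetical name.

```rust
/// Encodes a command as a RESP array of bulk strings:
/// "*<count>\r\n" followed by "$<byte length>\r\n<bytes>\r\n" per argument.
fn encode_command(args: &[&str]) -> Vec<u8> {
    let mut out = format!("*{}\r\n", args.len()).into_bytes();
    for arg in args {
        // str::len() is the byte length, which is what RESP requires
        out.extend_from_slice(format!("${}\r\n{}\r\n", arg.len(), arg).as_bytes());
    }
    out
}
```

Every mainstream Redis client library emits exactly these bytes for a two-argument command, which is the adoption tax the protocol choice avoids.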

economics · May 8, 2026 · 3m

Provider arbitrage: paying frontier prices only for frontier questions

Model prices vary 50x for overlapping quality on easy queries. The arbitrage router exploits the spread automatically, with a quality bar you set per intent.

vs the field · May 7, 2026 · 3m

Crowkis vs Memcached: a beautiful fossil meets a new workload

Memcached is the purest cache ever written — and purity is exactly the problem when your keys are sentences.

use cases · May 6, 2026 · 3m

Startups: your LLM bill is eating runway you'll want back

Seed-stage AI products routinely spend salary-sized sums recomputing known answers. The free Community edition exists precisely for this moment in your company's life.

engineering · May 6, 2026 · 3m

One actor, no locks across await: the concurrency design

Crowkis serves thousands of connections through async IO — then funnels every cache decision through a single deterministic actor. Here's why that's a feature.
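A minimal sketch of the pattern: many connection handlers, one owner of the state, requests queued in order. Crowkis uses async IO; to stay dependency-free this sketch swaps in std threads and `std::sync::mpsc` channels, and `Request`, `spawn_actor`, `get`, and `set` are hypothetical names. The map needs no lock because only the actor thread can touch it.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Messages the actor understands; each carries its own reply channel.
enum Request {
    Get { key: String, reply: Sender<Option<String>> },
    Set { key: String, value: String, reply: Sender<()> },
}

/// Spawns the single thread that owns all cache state and applies
/// requests one at a time, in the order they arrive.
fn spawn_actor() -> (Sender<Request>, thread::JoinHandle<()>) {
    let (tx, rx) = channel::<Request>();
    let handle = thread::spawn(move || {
        let mut store: HashMap<String, String> = HashMap::new();
        // The loop ends once every Sender has been dropped.
        for req in rx {
            match req {
                Request::Get { key, reply } => {
                    let _ = reply.send(store.get(&key).cloned());
                }
                Request::Set { key, value, reply } => {
                    store.insert(key, value);
                    let _ = reply.send(());
                }
            }
        }
    });
    (tx, handle)
}

/// Connection-side helpers: send a request, block until the actor replies.
fn set(actor: &Sender<Request>, key: &str, value: &str) {
    let (reply, rx) = channel();
    actor.send(Request::Set { key: key.into(), value: value.into(), reply }).unwrap();
    rx.recv().unwrap();
}

fn get(actor: &Sender<Request>, key: &str) -> Option<String> {
    let (reply, rx) = channel();
    actor.send(Request::Get { key: key.into(), reply }).unwrap();
    rx.recv().unwrap()
}
```

Given the same arrival order, the actor makes the same decisions every time, and no handler can ever observe a half-applied update, because no handler ever touches the map directly.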

economics · May 5, 2026 · 3m

The hidden invoice of a cold cache: what model migrations really cost

Swap models with a normal cache and you re-purchase your entire corpus at the new model's prices. Migration leasing is the line item that prevents the line item.

vs the field · May 4, 2026 · 3m

Crowkis vs Dragonfly, Valkey, and KeyDB: faster exact-matching is still exact-matching

The new Redis-compatibles race each other on throughput. On LLM traffic they all hit the same wall at full speed: the keys never repeat.

use cases · May 3, 2026 · 3m

Platform teams: make caching a paved road, not a per-team adventure

Every product team is duct-taping its own LLM cache right now. Platform engineering exists to end exactly this kind of duplication.

engineering · May 3, 2026 · 3m

Designing the MCP server: a cache as a tool the model can hold

MCP turns Crowkis into something an AI assistant can use deliberately — check the cache, store the answer — over plain stdio, with the banner silenced so JSON-RPC stays clean.

economics · May 2, 2026 · 3m

The ROI timeline: hour one, week one, quarter one

Caching ROI isn't a hockey stick — it's a staircase that starts the first hour. Here's the honest schedule of when each saving shows up.

vs the field · May 1, 2026 · 3m

Crowkis vs OpenAI prompt caching: a discount is not a cache

Provider prompt caching discounts your repeated prefixes. You still call the model, still wait, and still pay — just slightly less. There's a bigger idea available.

security · Apr 30, 2026 · 9m

Cache poisoning is the whole problem

Semantic caching has an obvious failure mode nobody likes to talk about: one bad write, served forever to everyone nearby. This is how Crowkis decides what to trust.

use cases · Apr 30, 2026 · 3m

Consumer chat at scale: when every millisecond and every token multiply

At consumer scale, traffic converges on shared intents while costs and latency multiply by millions. The cache becomes load-bearing infrastructure.

engineering · Apr 30, 2026 · 3m

Three levels, one strategy: compaction without the tuning PhD

LSM compaction is where storage engines breed complexity. Crowkis ships exactly one strategy across three levels — chosen for cache workloads, closed for configuration.

vs the field · Apr 28, 2026 · 3m

Crowkis vs Anthropic prompt caching: cache writes that bill you are telling you something

Anthropic's prompt caching is excellent at its actual job — cheap long contexts. It was never designed to be your response cache, and the pricing says so.

use cases · Apr 27, 2026 · 3m

Voice assistants: caching as a conversational necessity

Voice gives you about a second before silence feels broken. Model round-trips don't fit. Cache hits do — with room to spare for the speech stack.

engineering · Apr 27, 2026 · 3m

Streaming cache hits: instant answers that still feel like typing

Users expect LLM answers to arrive as a typing stream. CGETSTREAM serves cached answers chunk by chunk, so a sub-millisecond hit doesn't break the interface's rhythm.

vs the field · Apr 25, 2026 · 3m

Crowkis vs Gemini context caching: renting memory by the hour

Google bills cached context per token per hour — a parking meter for your own prompts. Compare that with a cache you simply own.

use cases · Apr 24, 2026 · 3m

Translation pipelines: the same strings, the same languages, every release

Product copy, help docs, and templates get re-translated continuously as releases churn. Most of the content didn't change. Stop paying as if it did.

engineering · Apr 24, 2026 · 3m

347 tests and a murder weapon: how the suite is organized

Bottom-heavy by design: the layers that hold your data get the most hostile coverage, and the smoke suite's signature move is killing the process to prove a point.

vs the field · Apr 22, 2026 · 3m

Crowkis vs vLLM prefix caching: different layers, different physics

vLLM's prefix caching saves GPU work inside one inference server. Crowkis saves the inference itself. You probably want both — but only one cuts the bill to zero on a hit.

use cases · Apr 21, 2026 · 3m

Summarization at scale: the same documents keep getting summarized

Reports, tickets, calls, and articles get summarized on every view, by every viewer, in every digest. The document didn't change between viewers. The bill did.

vs the field · Apr 19, 2026 · 3m

Crowkis vs LangSmith: tracing the waste vs deleting it

LangSmith shows you every span of every chain, beautifully. The spans are still billed. There's a component whose job is making the spans not happen.

use cases · Apr 18, 2026 · 3m

Classification and extraction: high-volume, low-variance, born to be cached

Routing tickets, tagging content, extracting fields — LLM classification runs millions of small calls over heavily repeating inputs. The cache hit rate is absurd, in your favor.

vs the field · Apr 16, 2026 · 3m

Crowkis vs Cloudflare AI Gateway: the edge is the wrong place for trust decisions

Cloudflare's gateway adds caching at the CDN layer: exact-match, eventually evicted, on someone else's network. Useful plumbing; not a reuse brain.

use cases · Apr 15, 2026 · 3m

Docs assistants: your documentation has a top-40 chart

Every docs site has the same hit parade — auth, rate limits, pagination, that one confusing endpoint. The assistant answering them should not bill like a consultant.

vs the field · Apr 13, 2026 · 3m

Crowkis vs Kong AI Gateway: plugins are not engines

Kong added AI plugins to a great API gateway. A semantic-cache plugin in a proxy is a feature; a semantic cache engine is a product. The difference shows in production.

use cases · Apr 12, 2026 · 3m

Answer-engine products: when the answer is the product, margin is the moat

If your product is answering questions, your COGS is the model bill and your UX is the latency. The cache moves both — which makes it strategy, not plumbing.

vs the field · Apr 10, 2026 · 3m

Crowkis vs building it yourself: a love letter to the repo you'll abandon

Every team builds the in-house semantic cache once. The prototype takes a week. The production version takes the year you didn't budget. We know — we budgeted it.

vs the field · Apr 7, 2026 · 3m

Crowkis vs Redis LangCache: when the incumbent validates the category

Redis shipping a semantic cache service confirms the problem is real. Their answer is a managed add-on; ours is a from-scratch engine. The difference is in the bones.

vs the field · Apr 4, 2026 · 3m

Crowkis vs framework caches: your framework should not own your memory

LangChain, LlamaIndex, and Semantic Kernel all offer cache hooks. Framework caches live and die with the framework. Infrastructure shouldn't.

vs the field · Apr 1, 2026 · 3m

Crowkis vs AWS Bedrock prompt caching: the cloud's cache serves the cloud

Bedrock's caching cuts repeated-prefix costs inside one cloud's model garden. Your cache strategy deserves a longer horizon than a vendor's feature page.

vs the field · Mar 29, 2026 · 3m

Crowkis vs LangChain InMemoryCache: the default that quietly costs the most

One import gives you LangChain's in-memory exact cache. It's the caching equivalent of a sticky note — gone on restart, blind to paraphrase, local to one process.

vs the field · Mar 26, 2026 · 3m

Crowkis vs Upstash: pay-per-request caching meets the request firehose

Serverless Redis with per-request pricing is elegant for occasional workloads. An LLM cache is the opposite of an occasional workload.

vs the field · Mar 23, 2026 · 3m

Crowkis vs the dedup script: the cron job that thinks it's a cache

Somewhere in your repo is a script that hashes prompts and skips duplicates. It's doing its best. Here's everything it can't see.

vs the field · Mar 20, 2026 · 3m

Crowkis vs Chroma: the prototype's best friend meets the production path

Chroma is wonderful for getting embeddings working before lunch. The qualities that make it great for prototypes are the ones a cache in production can't keep.

vs the field · Mar 17, 2026 · 3m

Crowkis vs doing nothing: the most expensive cache is no cache

The default strategy — every query goes to the model — has a precise cost. It's on your invoice, itemized as everything.

vs the field · Mar 14, 2026 · 3m

Crowkis vs fine-tuning your way to cheaper inference

Fine-tuning a smaller model is a months-long bet on cheaper tokens. Caching is a five-minute bet on zero tokens. One of these compounds weekly.

vs the field · Mar 11, 2026 · 3m

Crowkis vs stuffing the context window: memory is not a prompt

Million-token contexts tempt teams to ship the whole knowledge base with every call. That's not memory — that's paying to re-read the library daily.

100 posts in the roost · crows remember faces. we remember production incidents.