featuresMay 25, 2026· 3 min read

The AI Gateway: a semantic cache in front of any OpenAI-compatible API

Point your existing OpenAI client at Crowkis and change nothing else. The gateway proxies /v1/chat/completions, serves semantic hits without an upstream call, and adds retries, routing, and rate limits.

The lowest-friction way to put a cache in front of your model is to not change your code at all. Crowkis's AI Gateway exposes an OpenAI-compatible POST /v1/chat/completions endpoint: point your existing client's base URL at Crowkis, and every request flows through a semantic cache on its way to the provider — no SDK swap, no rewrite.

In plain words: Aim your existing OpenAI client at Crowkis instead of OpenAI. Repeated questions get answered from cache with no provider call, and you get retries and routing for free — without changing your code.

On a hit, the gateway answers from cache with no upstream call and no token cost, marking the response with an x-crowkis-cache: hit header so you can measure the savings. On a miss, it forwards to the upstream, caches the result, and returns it. Streaming requests are proxied transparently. The whole thing is off by default — it only activates with CROWKIS_GATEWAY=1 and an upstream configured, so zero-egress stays the default posture.

adoption is one port change

Four doors in, one cache, and the model only sees genuinely new questions.

Around the cache, the gateway adds the operational layer a raw provider call lacks: automatic retries with exponential backoff and jitter, weighted multi-provider routing with failover on error class, and per-key rate limits on the paid tier. The dashboard and /metrics expose requests, cache-hit rate, upstream calls, failovers, and 429s.

The bottom line

It's the adoption path for teams that don't want to learn a cache API: keep your OpenAI client, change one URL, and get semantic caching, resilience, and spend visibility as a side effect of where the requests now go.