The AI Gateway: a semantic cache in front of any OpenAI-compatible API
Point your existing OpenAI client at Crowkis and change nothing else. The gateway proxies /v1/chat/completions, serves semantic hits without an upstream call, and adds retries, routing, and rate limits.
The lowest-friction way to put a cache in front of your model is to not change your code at all. Crowkis's AI Gateway exposes an OpenAI-compatible POST /v1/chat/completions endpoint: point your existing client's base URL at Crowkis, and every request flows through a semantic cache on its way to the provider — no SDK swap, no rewrite.
On a hit, the gateway answers from cache with no upstream call and no token cost, marking the response with an x-crowkis-cache: hit header so you can measure the savings. On a miss, it forwards to the upstream, caches the result, and returns it. Streaming requests are proxied transparently. The whole thing is off by default — it only activates with CROWKIS_GATEWAY=1 and an upstream configured, so zero-egress stays the default posture.
Four doors in, one cache, and the model only sees genuinely new questions.
Around the cache, the gateway adds the operational layer a raw provider call lacks: automatic retries with exponential backoff and jitter, weighted multi-provider routing with failover on error class, and per-key rate limits on the paid tier. The dashboard and /metrics expose requests, cache-hit rate, upstream calls, failovers, and 429s.
The bottom line
It's the adoption path for teams that don't want to learn a cache API: keep your OpenAI client, change one URL, and get semantic caching, resilience, and spend visibility as a side effect of where the requests now go.