Python SDK

Sync and async clients with full RESP3 coverage, plus the get_or_compute pattern that reduces caching to a single call. Getting a first cached answer takes about 60 seconds.

Install

shell
pip install crowkis

Quick start

get_or_compute is the core pattern: serve from cache when Crowkis judges reuse safe, otherwise run your function and store the result.

python
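from crowkis import CrowkisClient

# call_llm is a placeholder for your own function: it takes the query and returns the answer.
cache = CrowkisClient(host="127.0.0.1", port=6383, tenant="demo", model="gpt-4o")

try:
    # Serve from cache when reuse is judged safe; otherwise run call_llm and store the result.
    answer = cache.get_or_compute(
        "Explain vector caches",
        lambda query: call_llm(query),
        ttl=3600,
    )
    print(answer.decode("utf-8", errors="replace"))
finally:
    # Release the connection even if the lookup or the LLM call raises.
    cache.close()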
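
For intuition, the flow behind get_or_compute can be sketched as a plain cache-aside lookup. This is a minimal illustration under stated assumptions, not the SDK's implementation: `TinyCache` and `fake_llm` are hypothetical names, the stand-in matches prompts exactly and treats `ttl` as seconds, whereas Crowkis itself decides when reuse is safe.

```python
import time
from typing import Callable, Dict, Tuple


class TinyCache:
    """Hypothetical in-memory stand-in, for illustration only (exact-match keys)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get_or_compute(self, prompt: str, compute: Callable[[str], str], ttl: int) -> bytes:
        entry = self._store.get(prompt)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # hit: the stored answer has not expired yet
        value = compute(prompt).encode("utf-8")  # miss: run the caller's function
        self._store[prompt] = (self._clock() + ttl, value)  # remember it until expiry
        return value


calls = []


def fake_llm(query: str) -> str:
    """Stand-in for a real model call; records how often it is invoked."""
    calls.append(query)
    return f"answer to: {query}"


tiny = TinyCache()
first = tiny.get_or_compute("Explain vector caches", fake_llm, ttl=3600)
second = tiny.get_or_compute("Explain vector caches", fake_llm, ttl=3600)
```

The second call returns the stored bytes without invoking `fake_llm` again, which is the saving the pattern is meant to deliver.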

Explicit semantic commands

When you want direct control over what gets stored and when:

python
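# Assumes an open CrowkisClient named cache, created as in the quick start.
cache.cset(
    "Explain vector caches",
    "Vector caches store embeddings so similar questions reuse answers.",
    ttl=3600,
)

# A miss comes back as a falsy value, so guard before decoding.
cached = cache.cget("Explain vector caches")
print(cached.decode() if cached else "miss")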
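
One use of this direct control is to store an answer only after your own quality check accepts it. The sketch below relies only on the cset and cget calls shown above; `store_if_valid`, `looks_complete`, and `FakeCache` are hypothetical names introduced so the example runs without a server.

```python
from typing import Callable, Dict, Optional


class FakeCache:
    """Hypothetical stand-in exposing only cset/cget, so the sketch runs offline."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def cset(self, prompt: str, answer: str, ttl: int) -> None:
        self._data[prompt] = answer.encode("utf-8")  # ttl is ignored by this stand-in

    def cget(self, prompt: str) -> Optional[bytes]:
        return self._data.get(prompt)


def store_if_valid(
    cache,
    prompt: str,
    answer: str,
    is_valid: Callable[[str], bool],
    ttl: int = 3600,
) -> bool:
    """Write the answer only when the caller's own check accepts it."""
    if not is_valid(answer):
        return False  # rejected answers never reach the cache
    cache.cset(prompt, answer, ttl=ttl)
    return True


def looks_complete(text: str) -> bool:
    """Toy check: long enough and ends with a full stop."""
    return len(text) > 20 and text.rstrip().endswith(".")


demo = FakeCache()
stored_bad = store_if_valid(demo, "Explain vector caches", "I'm not sure", looks_complete)
stored_good = store_if_valid(
    demo,
    "Explain RESP3",
    "RESP3 is the third version of the Redis serialization protocol.",
    looks_complete,
)
```

Keeping the check outside the cache call means a low-quality answer is never served to a later, similar question.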

Async streaming

The async client streams cached answers in chunks, so a cache hit feels like live model output. The compute function can itself be a stream:

python
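import asyncio

from crowkis import AsyncCrowkisClient


async def main() -> None:
    # openai_stream is a placeholder for your own streaming LLM call.
    async with AsyncCrowkisClient(
        host="127.0.0.1", port=6383, tenant="demo", model="gpt-4o"
    ) as cache:
        async for chunk in cache.stream_get_or_compute(
            "Explain vector caches",
            lambda query: openai_stream(query),
            ttl=3600,
            chunk_tokens=4,
            delay_ms=20,
        ):
            # Chunks may arrive as bytes or str, so normalize before printing.
            print(chunk.decode() if isinstance(chunk, bytes) else chunk, end="")


# async with and async for must run inside a coroutine.
asyncio.run(main())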
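
To make the chunk_tokens and delay_ms parameters concrete, here is a rough sketch of how a stored answer could be replayed in paced chunks. It is an assumption-laden illustration, not the SDK's logic: `replay_in_chunks` and `collect` are hypothetical helpers, and "tokens" are approximated by splitting on spaces.

```python
import asyncio
from typing import AsyncIterator, List


async def replay_in_chunks(
    text: str, chunk_tokens: int = 4, delay_ms: int = 20
) -> AsyncIterator[str]:
    """Yield text a few space-delimited tokens at a time, pausing between chunks."""
    tokens = text.split(" ")
    for start in range(0, len(tokens), chunk_tokens):
        chunk = " ".join(tokens[start:start + chunk_tokens])
        # Re-attach the separator removed by split() so chunks concatenate back exactly.
        if start + chunk_tokens < len(tokens):
            chunk += " "
        yield chunk
        await asyncio.sleep(delay_ms / 1000)  # pacing that mimics live generation


async def collect(text: str) -> List[str]:
    """Drain the generator into a list (short delay keeps the demo fast)."""
    return [chunk async for chunk in replay_in_chunks(text, chunk_tokens=4, delay_ms=1)]


chunks = asyncio.run(
    collect("Vector caches store embeddings so similar questions reuse answers.")
)
```

Smaller chunk_tokens and a longer delay_ms give a slower, more "typed" feel; larger chunks with no delay approach returning the whole answer at once.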

LangChain and LlamaIndex

First-class adapters for LangChain and LlamaIndex ship with the SDK. The pattern is the same: get_or_compute wraps your chain's LLM call, so the framework never needs to know a cache exists.
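
Independent of either framework, the wrapping idea can be sketched as a small helper that returns a function with the same shape as the original LLM call. It uses only the get_or_compute signature from the quick start; `make_cached_llm`, `RecordingCache`, and `slow_llm` are hypothetical names, and the shipped adapters may expose a different interface.

```python
from typing import Callable, Dict


class RecordingCache:
    """Hypothetical stand-in with the quick-start get_or_compute signature."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get_or_compute(self, prompt: str, compute: Callable[[str], str], ttl: int) -> bytes:
        if prompt not in self._data:
            self._data[prompt] = compute(prompt).encode("utf-8")
        return self._data[prompt]


def make_cached_llm(
    cache, llm: Callable[[str], str], ttl: int = 3600
) -> Callable[[str], str]:
    """Return a drop-in replacement for llm that consults the cache first."""

    def cached_llm(prompt: str) -> str:
        raw = cache.get_or_compute(prompt, llm, ttl=ttl)
        return raw.decode("utf-8", errors="replace")

    return cached_llm


llm_calls = []


def slow_llm(prompt: str) -> str:
    """Stand-in for an expensive model call; records each invocation."""
    llm_calls.append(prompt)
    return prompt.upper()


ask = make_cached_llm(RecordingCache(), slow_llm)
replies = [ask("hello"), ask("hello")]
```

Because `ask` takes a prompt and returns a string just like `slow_llm`, calling code can swap one for the other without any other change.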

The client ships with sensible timeout_sec, max_retries, and backoff defaults, all of which can be overridden in the constructor when your latency budget is stricter.
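
As background on what a retry budget and backoff typically mean, the sketch below retries a failing call with doubling waits. It is a generic illustration and an assumption about the general technique, not the client's actual retry logic: `call_with_retries`, `base_backoff_sec`, and `flaky` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_backoff_sec: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on ConnectionError/TimeoutError with doubling waits."""
    attempt = 0
    while True:
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt >= max_retries:
                raise  # retry budget exhausted: surface the last error
            sleep(base_backoff_sec * (2 ** attempt))  # 0.05 s, 0.1 s, 0.2 s, ...
            attempt += 1


waits = []
attempts = {"n": 0}


def flaky() -> str:
    """Fails twice with a simulated network error, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("simulated network blip")
    return "ok"


# Passing waits.append as the sleep function records the waits instead of sleeping.
result = call_with_retries(flaky, max_retries=3, base_backoff_sec=0.05, sleep=waits.append)
```

With a strict latency budget, a lower retry count and a shorter timeout bound the worst-case time spent waiting on the cache before falling through to the model.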