Python SDK
Sync and async clients with full RESP3 coverage, plus the get_or_compute pattern that makes caching a one-liner. Working in 60 seconds.
Install
shell
pip install crowkis
Quick start
get_or_compute is the core pattern: serve from cache when Crowkis judges reuse safe, otherwise run your function and store the result.
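To make the hit-or-compute flow concrete, here is a minimal sketch that needs no server. TinyCache and fake_llm are hypothetical stand-ins, not part of the SDK, and the sketch reuses an answer only on an exact key match within the TTL. How Crowkis actually judges reuse to be safe is not modeled here.

```python
import time
from typing import Callable, Dict, List, Tuple


class TinyCache:
    """Hypothetical in-memory stand-in; this is not the Crowkis client."""

    def __init__(self) -> None:
        # key -> (stored answer, monotonic time at which it expires)
        self._store: Dict[str, Tuple[bytes, float]] = {}

    def get_or_compute(self, query: str, compute: Callable[[str], str], ttl: float) -> bytes:
        now = time.monotonic()
        entry = self._store.get(query)
        if entry is not None and entry[1] > now:
            return entry[0]  # hit: reuse the stored answer
        value = compute(query).encode("utf-8")  # miss: run the caller's function
        self._store[query] = (value, now + ttl)  # store the result for later calls
        return value


calls: List[str] = []


def fake_llm(query: str) -> str:
    """Stand-in for a model call; records each invocation."""
    calls.append(query)
    return f"answer to: {query}"


cache = TinyCache()
first = cache.get_or_compute("Explain vector caches", fake_llm, ttl=3600)
second = cache.get_or_compute("Explain vector caches", fake_llm, ttl=3600)
print(first.decode("utf-8"), "| model calls:", len(calls))
```

The second call never reaches fake_llm, which is the whole point of the pattern. With the SDK, the same flow looks like this: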
python
from crowkis import CrowkisClient

# call_llm is your own function: it takes the query and returns the model's answer.
cache = CrowkisClient(host="127.0.0.1", port=6383, tenant="demo", model="gpt-4o")
answer = cache.get_or_compute(
    "Explain vector caches",
    lambda query: call_llm(query),
    ttl=3600,
)
print(answer.decode("utf-8", errors="replace"))
cache.close()

Explicit semantic commands
When you want direct control over what gets stored and when:
python
# Reuses the cache client created in the quick start.
cache.cset(
    "Explain vector caches",
    "Vector caches store embeddings so similar questions reuse answers.",
    ttl=3600,
)
# A miss yields a falsy value, so guard before decoding.
cached = cache.cget("Explain vector caches")
print(cached.decode() if cached else "miss")

Async streaming
The async client streams cached answers in chunks, so a cache hit feels like live model output, including when the underlying compute is itself a stream.
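The chunked replay can be pictured with a small async generator. This is a sketch under stated assumptions, not the SDK's implementation: replay_in_chunks is a hypothetical helper, "tokens" are approximated by whitespace-separated words, and the chunk_tokens and delay_ms parameters simply borrow the names used in the SDK call.

```python
import asyncio
from typing import AsyncIterator, List


async def replay_in_chunks(
    text: str, chunk_tokens: int = 4, delay_ms: int = 20
) -> AsyncIterator[str]:
    """Yield text a few words at a time, pausing between chunks."""
    words = text.split()
    for start in range(0, len(words), chunk_tokens):
        piece = " ".join(words[start:start + chunk_tokens])
        # Keep the separating space on every chunk except the last one.
        if start + chunk_tokens < len(words):
            piece += " "
        yield piece
        await asyncio.sleep(delay_ms / 1000)  # pacing so the replay feels live


async def collect(text: str) -> List[str]:
    # Gather the chunks into a list so they can be inspected afterwards.
    return [chunk async for chunk in replay_in_chunks(text, chunk_tokens=4, delay_ms=1)]


source = "Vector caches store embeddings so similar questions reuse answers."
chunks = asyncio.run(collect(source))
print(chunks)
```

Joining the chunks reproduces the original answer, so the consumer sees the same text whether it came from the cache or from the model. With the SDK, the streaming call looks like this: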
python
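import asyncio

from crowkis import AsyncCrowkisClient


async def main():
    # openai_stream is your own function: it returns the model's streamed output.
    async with AsyncCrowkisClient(
        host="127.0.0.1", port=6383, tenant="demo", model="gpt-4o"
    ) as cache:
        async for chunk in cache.stream_get_or_compute(
            "Explain vector caches",
            lambda query: openai_stream(query),
            ttl=3600,
            chunk_tokens=4,
            delay_ms=20,
        ):
            # Chunks may arrive as bytes or str, so decode only when needed.
            print(chunk.decode() if isinstance(chunk, bytes) else chunk, end="")


asyncio.run(main())

LangChain and LlamaIndex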
First-class adapters for LangChain and LlamaIndex ship with the SDK — the pattern is the same get_or_compute wrapped around your chain's LLM call, so the framework never knows a cache exists.
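The wrapping idea can be sketched without either framework. The names below (DictCache, with_cache, fake_llm) are hypothetical and are not the SDK's adapter API, which this page does not show. The only assumption carried over from the SDK is a get_or_compute(query, fn, ttl=...) call shape.

```python
from typing import Callable, Dict, List


class DictCache:
    """Hypothetical stand-in exposing get_or_compute(query, fn, ttl=...)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_or_compute(self, query: str, fn: Callable[[str], str], ttl: int = 3600) -> str:
        # ttl is accepted for signature parity but ignored in this sketch.
        if query not in self._store:
            self._store[query] = fn(query)
        return self._store[query]


def with_cache(
    cache: DictCache, llm_call: Callable[[str], str], ttl: int = 3600
) -> Callable[[str], str]:
    """Return a callable with the same shape as llm_call, backed by the cache."""

    def wrapped(prompt: str) -> str:
        return cache.get_or_compute(prompt, llm_call, ttl=ttl)

    return wrapped


model_calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for the chain's LLM call; records each invocation."""
    model_calls.append(prompt)
    return prompt.upper()


llm = with_cache(DictCache(), fake_llm)
outputs = [llm("hello"), llm("hello"), llm("world")]
print(outputs, "| model calls:", len(model_calls))
```

Because the wrapped callable has the same signature as the original, the calling code needs no changes, which is what lets a framework stay unaware of the cache.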
The client ships with sensible timeout_sec, max_retries, and backoff defaults, all overridable in the constructor when your latency budget is stricter.
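For readers unfamiliar with what a retry limit and a backoff setting usually control, here is a generic sketch. It is not the SDK's retry policy, which this page does not describe: call_with_retries, backoff_sec, and the doubling schedule are assumptions made for illustration, and timeout_sec is not modeled.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(op: Callable[[], T], max_retries: int = 3, backoff_sec: float = 0.05) -> T:
    """Run op, retrying on ConnectionError with exponentially growing pauses."""
    attempt = 0
    while True:
        try:
            return op()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retry budget exhausted: surface the error to the caller
            time.sleep(backoff_sec * (2 ** attempt))  # 1x, 2x, 4x, ... the base delay
            attempt += 1


attempts: List[int] = []


def flaky() -> str:
    """Simulated operation that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = call_with_retries(flaky, max_retries=3, backoff_sec=0.001)
print(result, "| attempts:", len(attempts))
```

A stricter latency budget generally means fewer retries and shorter pauses, since each retry adds its backoff delay to the worst-case response time.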