Python SDK

Cache any model. Crowkis is model-agnostic: you bring the call to OpenAI, Anthropic, Llama, or whatever comes next, and Crowkis serves the repeats from cache at no model cost. Drop it in without restructuring your code.

Install

One command. Zero dependencies, nothing else to set up.

shell

pip install crowkis

The one idea

Crowkis doesn't care which model you call. You wrap the call you already make; Crowkis remembers the answer by meaning, so the next time someone asks the same thing in different words, it's served instantly and for free. Any model that produces text, you can cache.
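
To make "remembers the answer by meaning" concrete, here is a toy sketch in plain Python. It is not how Crowkis works internally: the real embedding model runs server-side, while this stand-in uses a bag of words, a crude stemmer, and an arbitrary 0.4 similarity cut-off. The names ToySemanticCache, embed, stem, and cosine are made up for illustration.

```python
import math
import re
from collections import Counter

# Toy stand-in for an embedding model: a bag of crudely stemmed content words.
STOPWORDS = {"a", "an", "the", "is", "do", "does", "how", "what", "s", "to", "of"}

def stem(word: str) -> str:
    # Drop one trailing "s" from longer words so "refunds" matches "refund".
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def embed(text: str) -> Counter:
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(w) for w in words if w not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToySemanticCache:
    def __init__(self, threshold: float = 0.4):  # arbitrary cut-off for the toy
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def store(self, prompt: str, answer: str) -> None:
        self.entries.append((embed(prompt), answer))

    def lookup(self, prompt: str):
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

cache = ToySemanticCache()
cache.store("How do refunds work?", "Refunds take 5-7 business days.")
print(cache.lookup("What's the refund process?"))   # rephrased prompt: hit
print(cache.lookup("What is the weather today?"))   # unrelated prompt: None
```

A real embedding model captures far more than shared words, but the shape is the same: embed the prompt, find the nearest stored prompt, and serve its answer only if the similarity clears a threshold.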

Connect

python

from crowkis import Crowkis

cache = Crowkis(host="127.0.0.1", port=6379, tenant="my-app")
# use AsyncCrowkis(...) for asyncio apps

Cache any model: the decorator

The simplest way in. Decorate any function whose first argument is the prompt. It works like functools.lru_cache, but it matches on meaning, so rephrased prompts hit too. The function inside can call any provider.

python

@cache.cached(ttl=3600)
def answer(prompt: str) -> str:
    # this body can call OpenAI, Anthropic, or a local model; Crowkis doesn't care
    return my_model(prompt)

answer("How do refunds work?")        # miss → your model runs, result cached
answer("What's the refund process?")  # semantic HIT → no model call
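
If the decorator shape is unfamiliar, the sketch below shows the general pattern, assuming the first argument is the cache key and ttl is a lifetime in seconds. It matches prompts exactly, with a dict standing in for the server, so unlike cache.cached it will not hit on rephrasings. The cached function, its clock parameter, and the my_model stub are illustrative, not part of the Crowkis API.

```python
import functools
import time

def cached(ttl: float, clock=time.monotonic):
    """Cache a function's result per prompt for `ttl` seconds (exact-match stand-in)."""
    def decorator(fn):
        entries = {}  # prompt -> (expiry time, cached result)

        @functools.wraps(fn)
        def wrapper(prompt, *args, **kwargs):
            now = clock()
            entry = entries.get(prompt)
            if entry is not None and entry[0] > now:
                return entry[1]                      # fresh hit: skip the model
            result = fn(prompt, *args, **kwargs)     # miss or expired: run the model
            entries[prompt] = (now + ttl, result)
            return result
        return wrapper
    return decorator

calls = []

def my_model(prompt: str) -> str:
    # Hypothetical stand-in for a real provider call.
    calls.append(prompt)
    return f"answer to: {prompt}"

@cached(ttl=3600)
def answer(prompt: str) -> str:
    return my_model(prompt)

answer("How do refunds work?")   # miss: my_model runs
answer("How do refunds work?")   # hit: served from the dict
print(len(calls))                # 1
```

Keying on the first argument alone is why the decorated function should take the prompt first: any other arguments are passed through but do not affect which entry is returned.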

Cache any model: explicit form

Prefer a function call over a decorator? cache.ask does the same thing inline: it returns the cached answer, or runs your compute function and caches the result.

python

text = cache.ask(
    "How do refunds work?",
    compute=lambda prompt: my_model(prompt),   # any model
    ttl=3600,
)

Lookup & store directly

Full control over what gets read and written:

python

hit = cache.lookup("what's the refund timeline?")   # semantic match
if hit:
    print(hit.text, hit.similarity, hit.confidence)
else:
    cache.store("what's the refund timeline?", "5-7 business days.", ttl=3600)

Streaming

Serve a cached answer in chunks so a hit feels like live output:

python

import asyncio
from crowkis import AsyncCrowkis

async def main():
    async with AsyncCrowkis(port=6379, tenant="my-app") as cache:
        async for chunk in cache.stream("Explain vector caches", compute=my_stream, ttl=3600):
            print(chunk, end="")

asyncio.run(main())
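
A rough sketch of the hit and miss paths, under stated assumptions: compute is taken to be an async generator function that receives the prompt, a dict stands in for the cache, and hits are replayed in fixed-size slices. The signature Crowkis expects from my_stream and how it sizes chunks are not specified here, so treat stream, chunk_size, collect, and this my_stream as hypothetical.

```python
import asyncio

async def my_stream(prompt: str):
    # Hypothetical compute: an async generator yielding pieces of a model reply.
    for word in ("Vector ", "caches ", "match ", "by ", "meaning."):
        await asyncio.sleep(0)   # stand-in for waiting on the provider
        yield word

async def stream(store: dict, prompt: str, compute, chunk_size: int = 8):
    cached = store.get(prompt)
    if cached is not None:
        # Hit: replay the stored text in fixed-size slices.
        for i in range(0, len(cached), chunk_size):
            yield cached[i:i + chunk_size]
        return
    # Miss: pass chunks through as they arrive, then store the full text.
    parts = []
    async for chunk in compute(prompt):
        parts.append(chunk)
        yield chunk
    store[prompt] = "".join(parts)

async def collect(store: dict, prompt: str) -> list:
    return [chunk async for chunk in stream(store, prompt, my_stream)]

store = {}
first = asyncio.run(collect(store, "Explain vector caches"))    # miss: streamed from my_stream
second = asyncio.run(collect(store, "Explain vector caches"))   # hit: replayed from the dict
print("".join(first) == "".join(second))
```

Either way the caller consumes the same kind of async iterator, so the code that renders chunks does not need to know whether the answer was cached.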

LangChain & LangGraph

Set it once and every LangChain (and LangGraph) model call is cached by meaning, with no changes to your chains. This mirrors LangChain's own set_llm_cache pattern, so it's a true drop-in.

python

from langchain_core.globals import set_llm_cache
from crowkis.integrations.langchain import CrowkisCache

set_llm_cache(CrowkisCache(tenant="my-app", ttl=3600))
# every LLM / chat-model call in your app now checks Crowkis first

Agent memory

Durable, semantic, per-user memory for agents: LangGraph, CrewAI, AutoGen, or your own loop. Every recall is scoped to its agent and user.

python

from crowkis import CrowkisMemory

mem = CrowkisMemory(agent="support-bot", user="alice")
mem.remember("Alice prefers email over phone")
mem.recall("how should I contact Alice?")   # semantic recall
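
What "scoped to its agent and user" can look like is sketched below, assuming facts are partitioned by an (agent, user) pair. ToyMemory is a hypothetical in-process stand-in: keyword overlap replaces semantic recall, and nothing here reflects how Crowkis stores memory on the server.

```python
import re
from collections import defaultdict

def words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

class ToyMemory:
    # All instances share one store, partitioned by (agent, user).
    _facts = defaultdict(list)

    def __init__(self, agent: str, user: str):
        self.scope = (agent, user)

    def remember(self, fact: str) -> None:
        self._facts[self.scope].append(fact)

    def recall(self, query: str, k: int = 5) -> list:
        # Keyword overlap stands in for semantic similarity.
        wanted = words(query)
        scored = [
            (len(wanted & words(fact)), fact)
            for fact in self._facts[self.scope]   # only this agent/user pair
        ]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [fact for _, fact in ranked[:k]]

alice = ToyMemory(agent="support-bot", user="alice")
bob = ToyMemory(agent="support-bot", user="bob")
alice.remember("Alice prefers email over phone")
print(alice.recall("how should I contact Alice?"))   # finds the fact
print(bob.recall("how should I contact Alice?"))     # [] : different user scope
```

The point of the partition key is isolation: the same query from a different user, or from a different agent, never sees another scope's facts.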

Discover every command

python

import crowkis
crowkis.help()            # grouped cheat-sheet of every feature
crowkis.help("memory")    # filter by topic

Method reference

The full surface, grouped by what it's for. The high-level methods above cover most apps; everything below is available on the Crowkis client (or its helpers) when you want direct control.

caching, model-agnostic

cache.cached(ttl=, threshold=)      # decorator: cache any function's model call
cache.ask(prompt, compute=)        # recall, else run compute() and cache it
cache.stream(prompt, compute=)     # same, streamed in chunks (async)
cache.lookup(prompt)               # → CacheHit(text, similarity, confidence) | None
cache.store(prompt, answer, ttl=)  # write an answer for a prompt
cache.similar(prompt, k=10)        # k most similar cached prompts   [csim]
cache.embed(text)                  # get the raw embedding vector    [cembed]
cache.flush()                      # clear this tenant's cache        [cflush]

agent memory: CrowkisMemory(agent, user=)

mem.remember(fact, ttl=)     # store a fact to recall later
mem.recall(query, k=5)       # semantically recall relevant facts
mem.extract(conversation)    # pull salient facts out of a transcript
mem.history(query, k=5)      # recall including superseded versions
mem.as_of(query, unix_ms)    # recall memory as it stood at a point in time
mem.forget(query=)           # forget matching facts
mem.link(subj, rel, obj)     # add a knowledge-graph edge
mem.graph(entity, depth=)    # walk the knowledge graph

conversation sessions

cache.csession_add(session, role, text)      # append a turn
cache.csession_recent(session, n=)           # the last n turns
cache.csession_search(session, query, k=)    # search within a session

quality & safety

cache.cpin(query, answer)        # pin a curated answer that always wins
cache.cpinget(query)             # fetch a pinned answer
cache.cunpin(query)              # remove a pin
cache.cflag(query, bad_answer)   # mark a bad answer so it stops being served
cache.ccheckbad(query)           # has this been flagged?
cache.cguard(text)               # input safety check
cache.coutcheck(text)            # output safety check
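
One way to read "always wins" and "stops being served" is as a lookup order: pins first, then cached answers that have not been flagged. The sketch below assumes that order and matches queries exactly. ToyAnswerPolicy and its methods are hypothetical, and how the server actually resolves pins and flags is not documented here.

```python
class ToyAnswerPolicy:
    def __init__(self):
        self.pins = {}         # query -> curated answer
        self.flagged = set()   # (query, answer) pairs marked as bad
        self.cache = {}        # query -> cached model answer

    def pin(self, query: str, answer: str) -> None:
        self.pins[query] = answer

    def flag(self, query: str, bad_answer: str) -> None:
        self.flagged.add((query, bad_answer))

    def serve(self, query: str):
        if query in self.pins:
            return self.pins[query]        # a pin beats any cached answer
        answer = self.cache.get(query)
        if (query, answer) in self.flagged:
            return None                    # flagged: behave like a miss
        return answer

policy = ToyAnswerPolicy()
policy.cache["refund window?"] = "90 days"
policy.flag("refund window?", "90 days")
print(policy.serve("refund window?"))      # None: the flagged answer is withheld
policy.pin("refund window?", "30 days")
print(policy.serve("refund window?"))      # 30 days: the pin wins
```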

cost, limits & compliance

cache.cbudget_set(tenant, daily_usd=, monthly_usd=)   # spend budget the gateway enforces
cache.cbudget_get(tenant)                            # current spend vs budget
cache.cbudget_alerts()                               # budget alerts
cache.ckeylimit_set(tenant, rpm=, tpm=)              # per-tenant rate limits
cache.cpii_report(tenant)                            # PII exposure report
cache.cpii_erase(identifier)                         # right-to-erasure
cache.cdedup(tenant)                                 # duplicate report

operations & persistence

cache.cinfo(section=)      # server info / stats
cache.csave(dest)          # snapshot to disk
cache.cbgsave(dest)        # background snapshot
cache.creload()            # reload from the latest snapshot

management API: CrowkisAdmin(base_url) over HTTP

admin.get_stats()                          # cache stats & hit rate
admin.health()                             # health check
admin.register_webhook({...})              # source-linked invalidation
admin.invalidate_source(source_id)         # purge everything tagged to a source
admin.flush_tenant(tenant_id)              # wipe one tenant

Sensible timeout, max_retries, and backoff defaults ship out of the box; override them in the constructor when your latency budget is stricter. The embedding model lives server-side, so every client (including redis-py / ioredis) uses whatever model the server runs.
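
For readers weighing those settings, this is a generic sketch of what a retry budget with exponential backoff usually means. The parameter names max_retries and backoff are borrowed from the text above, but the doubling schedule, the exception types, and the call_with_retries helper are assumptions for illustration, not the client's actual behavior.

```python
import time

def call_with_retries(fn, max_retries=3, backoff=0.1, sleep=time.sleep):
    """Call fn(); on ConnectionError or TimeoutError retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries:
                raise                          # out of retries: surface the error
            sleep(backoff * 2 ** attempt)      # delay doubles after each failure

attempts = []
delays = []

def flaky():
    # Hypothetical operation that fails twice, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("server not reachable")
    return "ok"

result = call_with_retries(flaky, sleep=delays.append)   # record delays instead of sleeping
print(result, delays)
```

Under this schedule the worst-case added latency is the sum of the delays plus one timeout per attempt, which is the number to compare against a strict latency budget.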