engineering · May 29, 2026 · 3 min read

The write-ahead log: how the cache survives a kill -9

Durability isn't a checkbox; it's a sequence of writes in the right order, with checksums at every step. Here's the boring machinery that makes restarts uneventful.

Caches traditionally shrug at durability ("it's just a cache"), but an LLM cache's contents cost real money to rebuild. Losing a warm corpus to a pod reschedule means re-purchasing it from your provider at full price. So CrowkisDB treats every entry as worth keeping: each write lands in the write-ahead log before anything else, framed as a record with a CRC32 checksum and fsynced according to policy.
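To make the framing concrete, here is a minimal Python sketch of a checksummed append. The frame layout (4-byte length, 4-byte CRC32, then payload) and the names `frame` and `append` are assumptions for illustration, not CrowkisDB's actual on-disk format or API; the `sync` flag stands in for whatever fsync policy is configured.

```python
import os
import struct
import zlib

# Hypothetical frame layout: little-endian payload length, CRC32, then payload.
HEADER = struct.Struct("<II")

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC32 checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def append(log_file, payload: bytes, sync: bool = True) -> None:
    """Append one framed record; return only once the write has been issued
    and, if sync is set, flushed to stable storage."""
    log_file.write(frame(payload))
    log_file.flush()                 # push Python's buffer to the OS
    if sync:
        os.fsync(log_file.fileno())  # ask the OS to persist it to disk
```

A caller would open the log in append mode (`open(path, "ab")`) and acknowledge the write to the client only after `append` returns.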

In plain words: everything is written to a crash-proof journal first. Pull the plug mid-write and Crowkis replays the journal on restart, so your cache comes back with every acknowledged write intact.

Recovery is the payoff: on startup the engine replays the log from the last checkpoint, validating every checksum and rebuilding the memtable to the exact pre-crash state. A torn write at the tail, the classic power-cut artifact, fails its checksum and is truncated cleanly instead of poisoning the replay. The vector index recovers alongside, so semantic search resumes where it stopped.
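The replay-and-truncate behaviour can be sketched in a few lines of Python. This assumes a hypothetical frame of 4-byte length, 4-byte CRC32, then payload, and it replays from the start of the file rather than from a checkpoint. It also treats the first bad record as the torn tail; a real engine would likely distinguish that from corruption in the middle of the log.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # assumed layout: payload length, CRC32

def replay(path: str) -> list:
    """Return every valid record and truncate the file at the first bad one."""
    records = []
    with open(path, "r+b") as f:
        data = f.read()
        offset = 0
        while offset + HEADER.size <= len(data):
            length, crc = HEADER.unpack_from(data, offset)
            start = offset + HEADER.size
            payload = data[start:start + length]
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # short or corrupt record: treat it as the torn tail
            records.append(payload)
            offset = start + length
        f.truncate(offset)  # drop the torn tail so later appends start clean
    return records
```

Because the file is truncated to the end of the last valid record, running `replay` a second time yields the same records and leaves the file unchanged.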

The write-trust pipeline

1. candidate write
2. coherence · 0.30
3. content · 0.10
4. source trust · 0.30
5. isolation · 0.15
6. neighbourhood · 0.15
7. composite ≥ 0.75?
8. accepted
9. refused + ledger entry

Five stages score every write before it can ever be served.
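One plausible reading of the pipeline is a weighted sum: each stage produces a score between 0 and 1, the numbers beside the stages are weights (they sum to 1.00), and the composite is compared against 0.75. The sketch below assumes exactly that. The combination rule and the names `WEIGHTS`, `composite`, and `accept` are illustrative assumptions, not confirmed CrowkisDB behaviour.

```python
# Stage weights as shown in the pipeline; treating them as weights of a
# weighted sum is an assumption.
WEIGHTS = {
    "coherence": 0.30,
    "content": 0.10,
    "source_trust": 0.30,
    "isolation": 0.15,
    "neighbourhood": 0.15,
}
THRESHOLD = 0.75

def composite(scores: dict) -> float:
    """Weighted sum of per-stage scores, each expected to lie in [0, 1]."""
    return sum(weight * scores[stage] for stage, weight in WEIGHTS.items())

def accept(scores: dict) -> bool:
    """Accept the write only if the composite clears the threshold."""
    return composite(scores) >= THRESHOLD
```

Under this reading, a write scoring 0.9 on coherence, 0.5 on content, 0.8 on source trust, 0.7 on isolation, and 0.6 on neighbourhood comes to about 0.755 and is accepted. Dropping source trust to 0.2 pulls the composite to about 0.575, and the write is refused.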

We don't trust this machinery; we attack it. The integration suite kills the process mid-write and verifies every acknowledged entry survives; the Docker smoke test does the same to the whole container. Durability claims that aren't tested by murder are marketing.
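A kill-mid-write test of this kind can be sketched in Python. This is not the project's integration suite: the writer script, the frame layout (4-byte length, 4-byte CRC32, payload), and the names `read_valid` and `crash_test` are all assumptions for illustration. The child acknowledges each record on stdout only after fsync; the parent kills it after a number of acknowledgements and checks what survived.

```python
import os
import struct
import subprocess
import sys
import tempfile
import zlib

HEADER = struct.Struct("<II")  # assumed frame: payload length, CRC32

# Child process: append framed records forever and print an
# acknowledgement only after each record has been fsynced.
WRITER = '''
import os, struct, sys, zlib
header = struct.Struct("<II")
with open(sys.argv[1], "ab") as log:
    i = 0
    while True:
        payload = b"entry-%d" % i
        log.write(header.pack(len(payload), zlib.crc32(payload)) + payload)
        log.flush()
        os.fsync(log.fileno())
        print(i, flush=True)
        i += 1
'''

def read_valid(path):
    """Return payloads of all records that pass the length and CRC checks."""
    with open(path, "rb") as f:
        data = f.read()
    records, offset = [], 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn tail left behind by the kill
        records.append(payload)
        offset = start + length
    return records

def crash_test(acks_before_kill=50):
    """Kill the writer after N acks; return (acked ids, surviving payloads)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        cmd = [sys.executable, "-c", WRITER, path]
        with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
            acked = [int(proc.stdout.readline().decode())
                     for _ in range(acks_before_kill)]
            proc.kill()  # SIGKILL on POSIX, the programmatic kill -9
        return acked, read_valid(path)

if __name__ == "__main__":
    acked, survivors = crash_test()
    # Every acknowledged entry must be present, in order, after the kill.
    assert [b"entry-%d" % i for i in acked] == survivors[:len(acked)]
    print(f"{len(acked)} acknowledged, {len(survivors)} recovered")
```

The log may hold more records than were acknowledged, since the child keeps writing until the signal lands. The invariant under test is only that nothing acknowledged is missing.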

The bottom line

The result is operationally liberating: deploys, reschedules, and crashes are non-events. The cache you had is the cache you have. Boring, by design, provably.