operations · May 23, 2026 · 3 min read

Fallback routing: surviving your provider's bad day

Providers have incidents; your product doesn't have to. Health-aware backend routing plus a warm cache turns upstream outages into degraded modes users barely notice.

Every LLM provider has status-page afternoons, and a single-provider architecture inherits all of them as its own outages. The federation layer in Crowkis registers multiple backends (hosted rivals, your local vLLM, whatever you run), tracks their health, and routes around unhealthy ones automatically. A provider incident becomes a routing event.
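To make "routes around unhealthy ones automatically" concrete, here is a minimal Python sketch of health-aware failover. It is not Crowkis's federation layer. The `Backend` and `HealthAwareRouter` names, the consecutive-failure threshold, and the cooldown (a basic circuit breaker) are all assumptions chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Backend:
    """Hypothetical upstream: a name plus a callable that answers a prompt."""
    name: str
    call: Callable[[str], str]
    failures: int = 0             # consecutive failures observed so far
    unhealthy_until: float = 0.0  # clock time until which the router skips it


class HealthAwareRouter:
    """Try backends in priority order, skipping any marked unhealthy.

    After `max_failures` consecutive errors a backend is benched for
    `cooldown` seconds, then retried (a basic circuit breaker).
    """

    def __init__(self, backends: List[Backend], max_failures: int = 3,
                 cooldown: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.backends = backends
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock

    def is_healthy(self, backend: Backend) -> bool:
        return self.clock() >= backend.unhealthy_until

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (backend_name, answer) from the first healthy backend that succeeds."""
        errors = []
        for backend in self.backends:
            if not self.is_healthy(backend):
                continue  # benched: route around it without paying a timeout
            try:
                answer = backend.call(prompt)
            except Exception as exc:
                backend.failures += 1
                if backend.failures >= self.max_failures:
                    backend.unhealthy_until = self.clock() + self.cooldown
                    backend.failures = 0
                errors.append(f"{backend.name}: {exc}")
                continue
            backend.failures = 0  # any success resets the failure streak
            return backend.name, answer
        raise RuntimeError("no healthy backend answered: " + "; ".join(errors))


def primary_incident(prompt: str) -> str:
    raise TimeoutError("upstream incident")


router = HealthAwareRouter([
    Backend("hosted-primary", primary_incident),
    Backend("local-vllm", lambda p: f"local answer to {p!r}"),
], max_failures=2)
print(router.complete("hi"))  # ('local-vllm', "local answer to 'hi'")
```

The design choice worth noting: once benched, a backend is skipped outright instead of being retried on every request, so an incident costs a handful of failed calls rather than one timeout per request.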

The cache changes the math of degradation in a way a pure gateway can't: during an outage, your entire warm corpus keeps serving at full speed regardless of upstream health. The repeated head of traffic (in mature deployments, the majority of requests) doesn't even notice the incident. Only novel queries feel the fallback, and they get the backup backend instead of an error.
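A minimal sketch of that cache-first path, assuming a hypothetical `CacheFirstServer` wrapped around any routing function. Real semantic caches match queries by embedding similarity; the `normalize` helper here is a deliberately crude stand-in (case and whitespace folding) so the example stays self-contained.

```python
from typing import Callable, Dict


def normalize(query: str) -> str:
    # Crude stand-in for semantic matching (assumption): a real semantic cache
    # compares embeddings; this only folds case and collapses whitespace.
    return " ".join(query.lower().split())


class CacheFirstServer:
    """Serve from the warm corpus first; only misses touch an upstream."""

    def __init__(self, route: Callable[[str], str]):
        self.route = route  # e.g. a health-aware router's completion function
        self.cache: Dict[str, str] = {}
        self.upstream_calls = 0

    def answer(self, query: str) -> str:
        key = normalize(query)
        cached = self.cache.get(key)
        if cached is not None:
            return cached  # hit: no dependency on upstream health at all
        self.upstream_calls += 1
        result = self.route(query)  # miss: whichever backend routing picks
        self.cache[key] = result
        return result


# Hypothetical two-backend routing: primary unless an incident is flagged.
incident = {"primary_down": False}


def route(query: str) -> str:
    backend = "backup" if incident["primary_down"] else "primary"
    return f"[{backend}] answer to {query}"


server = CacheFirstServer(route)
server.answer("What is our refund policy?")  # warms the cache via the primary
incident["primary_down"] = True              # the provider's bad day begins
print(server.answer("what is our  refund policy?"))
# [primary] answer to What is our refund policy?   (served from cache)
print(server.answer("How do I rotate API keys?"))
# [backup] answer to How do I rotate API keys?     (novel query, fallback)
```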

Diagram (model upgrades without the cold start): 1) gpt-4o cache, warm → 2) canary a slice of traffic on the new model → 3) quality holds? If yes: 4) migrate entries with leasing → 5) new model, cache still warm. If no: 6) stay, nothing lost.

The upgrade is a workflow, not a leap of faith.
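The canary, quality check, and leased migration steps of the upgrade workflow can be sketched as follows. Everything here is hypothetical: SHA-256 hash slicing, a score-tolerance gate, and an in-memory lease are one plausible way to implement those steps, not a description of how Crowkis does it. In a shared store, taking the lease would need to be an atomic compare-and-set rather than the plain assignment used here.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


def in_canary(request_key: str, percent: float) -> bool:
    """Deterministically place about `percent`% of keys on the new model."""
    digest = hashlib.sha256(request_key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def quality_holds(baseline: float, canary: float, tolerance: float = 0.02) -> bool:
    """Hypothetical gate: promote only if the canary scores within `tolerance` of baseline."""
    return canary >= baseline - tolerance


@dataclass
class Entry:
    answer: str
    model: str
    lease_owner: Optional[str] = None
    lease_expires: float = 0.0


def migrate_entry(cache: Dict[str, Entry], key: str, worker: str,
                  regenerate: Callable[[str], str], new_model: str,
                  ttl: float = 60.0, clock: Callable[[], float] = time.monotonic) -> bool:
    """Re-answer one entry with the new model while holding a lease.

    The old answer keeps serving until the new one is written, and an
    unexpired lease held by another worker turns this call into a no-op.
    """
    entry = cache[key]
    now = clock()
    if entry.model == new_model:
        return False  # already migrated
    if entry.lease_owner not in (None, worker) and entry.lease_expires > now:
        return False  # another worker holds a live lease
    entry.lease_owner, entry.lease_expires = worker, now + ttl  # not atomic here
    cache[key] = Entry(answer=regenerate(key), model=new_model)
    return True
```

If the gate fails, nothing in the cache has changed: entries still carry the old model's answers, which is the "stay, nothing lost" branch.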

Cross-provider consistency comes from the same machinery as everywhere else. Fallback answers face the same write gates as primary answers, and Enterprise's cache bridge means answers banked from the primary keep serving equivalent queries during the fallback window: no split-brain corpus, no quality whiplash.
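A sketch of backend-agnostic write gating, assuming a hypothetical `Candidate` record with a quality `score` and a single `SharedCorpus`. It models two ideas from the paragraph above: fallback answers pass exactly the same checks, and an answer already banked from the primary is not overwritten during the fallback window. It is not the Enterprise cache bridge itself, whose mechanism the post doesn't describe, and the gate's checks are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Candidate:
    query_key: str
    answer: str
    backend: str   # provenance only: which upstream produced the answer
    score: float   # hypothetical quality score in [0, 1] from some evaluator


def passes_write_gate(c: Candidate, min_score: float = 0.8) -> bool:
    """Hypothetical admission checks; note they never look at c.backend."""
    if not c.answer.strip():
        return False
    return c.score >= min_score


class SharedCorpus:
    """One corpus for every backend, so there is no per-provider split."""

    def __init__(self) -> None:
        self.entries: Dict[str, Candidate] = {}

    def offer(self, c: Candidate) -> bool:
        """Bank `c` if it passes the gate and nothing is banked for its key yet."""
        if not passes_write_gate(c) or c.query_key in self.entries:
            return False  # rejected, or an earlier (e.g. primary) answer stays
        self.entries[c.query_key] = c
        return True

    def lookup(self, query_key: str) -> Optional[str]:
        hit = self.entries.get(query_key)
        return hit.answer if hit else None
```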

The bottom line

Resilience reviews usually price redundancy in standby compute. A warm semantic cache is the cheapest redundancy you'll ever buy: it's the only failover asset that was already paying for itself before the incident.