reference · June 4, 2026 · 3 min read

How to use CEVAL: grade output without a second model

CEVAL runs deterministic evaluators — toxicity, PII, relevance, JSON validity and more — over an input/output pair, and tracks the results on /metrics.

CEVAL scores an answer locally: no LLM judge and no egress. Call a single evaluator by name, or run the whole suite with CEVAL SUITE.

crowkis cli

> CEVAL relevance "how do refunds work?" "Refunds take 5-7 business days." THRESHOLD 0.7
1) "relevance"
2) "0.86"
3) "pass"

> CEVAL SUITE "is the sky blue?" "Yes, on a clear day."
1) "non_empty pass | json_valid n/a | toxicity 0.0 pass | answered pass ..."
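
If you consume the SUITE reply from application code, you will probably want it as a structure and not as one string. Here is a minimal Python sketch that assumes the reply looks exactly like the example above: segments separated by "|", each holding a name, an optional numeric score, and a status. The function name parse_suite_reply is hypothetical and not part of crowkis, and the "..." in the example stands for elided output, so the parser simply skips it.

```python
from typing import Optional


def parse_suite_reply(reply: str) -> dict[str, dict[str, Optional[object]]]:
    """Parse 'name [score] status | ...' into {name: {"score", "status"}}.

    Assumption: the format is inferred from the single example in this
    article, not from a published crowkis specification.
    """
    results: dict[str, dict[str, Optional[object]]] = {}
    for segment in reply.split("|"):
        # Ignore the "..." token that marks elided output in the example.
        tokens = [t for t in segment.split() if t != "..."]
        if len(tokens) == 2:
            # e.g. "non_empty pass" or "json_valid n/a": no score reported.
            name, status = tokens
            score = None
        elif len(tokens) == 3:
            # e.g. "toxicity 0.0 pass": name, numeric score, status.
            name, raw_score, status = tokens
            score = float(raw_score)
        else:
            # Unknown shape: skip it instead of guessing.
            continue
        results[name] = {"score": score, "status": status}
    return results


reply = "non_empty pass | json_valid n/a | toxicity 0.0 pass | answered pass ..."
parsed = parse_suite_reply(reply)

# Anything that is neither "pass" nor "n/a" deserves a closer look.
needs_review = [n for n, r in parsed.items() if r["status"] not in ("pass", "n/a")]
print(parsed["toxicity"], needs_review)
```

Segments the parser does not recognize are skipped, so a change in the reply format shows up as missing evaluators and not as wrong values.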

Per-evaluator counters show up on /metrics as crowkis_eval_*, so you can chart your toxicity or relevance rate over time.
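
To check which counters are exposed before wiring up a dashboard, you can filter the /metrics output by prefix. The Python sketch below assumes /metrics serves the Prometheus text exposition format, which this article does not confirm. The function parse_eval_metrics and the metric names in SAMPLE are hypothetical placeholders, not real crowkis counter names.

```python
def parse_eval_metrics(text: str, prefix: str = "crowkis_eval_") -> dict[str, float]:
    """Return {series: value} for sample lines whose name starts with prefix.

    Assumption: the input is Prometheus text exposition format
    ("name{labels} value" or "name value", with "#" comment lines).
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or not line.startswith(prefix):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            cut = line.rindex("}") + 1
            series, rest = line[:cut], line[cut:].split()
        else:
            series, *rest = line.split()
        if rest:
            samples[series] = float(rest[0])
    return samples


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# TYPE crowkis_eval_example_a counter
crowkis_eval_example_a 41
crowkis_eval_example_b{kind="demo"} 1
process_cpu_seconds_total 12.5
"""

samples = parse_eval_metrics(SAMPLE)
print(samples)
```

For charting over time, point your existing scraper at /metrics; this sketch is only meant for a quick look at what is there.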