Developer Benchmarks

Grounded evaluation against the active MARCUS corpus

The dashboard centers the frozen-core regression suite, keeps generated-suite reporting separate, and exposes the retrieval, grounding, safety, and latency traces that back each run.
