Developer Benchmarks
Source-tied evaluation against the active MARCUS library
The dashboard centers the frozen-core regression suite, keeps generated-suite reporting separate, and exposes the source-review, grounding, safety, and latency traces that back each run.