Developer Benchmarks

Source-tied evaluation against the active MARCUS library

The dashboard centers on the frozen-core regression suite, reports generated-suite results separately, and exposes the source-review, grounding, safety, and latency traces that back each run.
