Scenarios

Runnable, end-to-end examples that map to real adoption questions.

| Scenario | When to read |
| --- | --- |
| A simple end-to-end suite | First real example: the bundled quickstart, four cases. |
| Large suites & multi-file agents | Outgrowing one suite file; per-domain organization. |
| Edge cases | Empty output, exceptions, missing baselines, exotic inputs. |
| CI/CD integration | GitHub Actions, GitLab, CircleCI, Buildkite. |
| OpenAI / Anthropic SDK adapters | Skip manual instrumentation when on a supported SDK. |
| Performance & cost budgets | cost_lt_usd, latency_lt_ms, drift detection. |
| Debugging workflow | A failing case → root cause in five minutes. |
| Failure handling | Exception paths, judge unavailability, baseline corruption. |

Every scenario follows the same five-section shape: problem → input → code → output → explanation. Copy and paste any of them as a starting point.