CLI Reference
agentprdiff is a Click app with six subcommands. Run
agentprdiff --help or agentprdiff <cmd> --help for the canonical
help text.
Top-level options
| Option | Default | Description |
|---|---|---|
| --root | .agentprdiff | Directory where baselines and runs live. |
| --version | - | Print the installed version (agentprdiff, version 0.2.3) and exit. |
| --help | - | Show help and exit. |
agentprdiff init
Create the .agentprdiff/ directory and a starter .gitignore.
agentprdiff init
# initialized .agentprdiff/
# baselines: .agentprdiff/baselines/ (commit this)
# runs: .agentprdiff/runs/ (gitignored)
Idempotent. Running it twice does nothing the second time.
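The idempotent behavior described above can be pictured with a small Python sketch. This is not agentprdiff's implementation: the function name `init_layout`, the location of the `.gitignore` inside the root directory, and its `runs/` entry are all assumptions made for illustration.

```python
from pathlib import Path


def init_layout(root: Path) -> bool:
    """Create root/baselines, root/runs and a starter .gitignore.

    Returns True if anything was created, False if the layout was
    already complete (the "second run does nothing" case).
    """
    created = False
    for sub in ("baselines", "runs"):
        target = root / sub
        if not target.is_dir():
            target.mkdir(parents=True)
            created = True

    # Assumption: the starter .gitignore lives inside the root directory
    # and ignores only the runs/ subdirectory.
    gitignore = root / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("runs/\n", encoding="utf-8")
        created = True
    return created
```

Checking for each path before creating it is what makes a second call a no-op; an existing `.gitignore` is left untouched.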
agentprdiff record SUITE_FILE
Run every suite in SUITE_FILE and save each trace as the canonical
baseline.
| Option | Description |
|---|---|
| --json-out PATH | Also write a JSON report to PATH. The file is overwritten on every run. |
| --case PATTERN | Only record cases matching PATTERN. Repeatable; comma-split. |
| --skip PATTERN | Skip cases matching PATTERN. Same syntax. |
| --list | Print suite/case names and exit without running. |
agentprdiff record suites/billing.py
agentprdiff record suites/*.py --json-out artifacts/agentprdiff.json
agentprdiff record suites/billing.py --case refund_happy_path
Exit codes. 0 on success. 1 when any case raised an exception
(grader failures alone don't fail record). 2 when --case /
--skip matched zero cases.
Side effects. Overwrites .agentprdiff/baselines/<suite>/<case>.json
in place, so any baseline that changes when you re-run record on the
same suite shows up as a regular git diff in the next PR.
agentprdiff check SUITE_FILE
Run every suite and diff against saved baselines. The CI command.
| Option | Description |
|---|---|
| --json-out PATH | Write a JSON report to PATH. The file is overwritten on every run. |
| --case PATTERN | Only check matching cases. |
| --skip PATTERN | Skip matching cases. |
| --list | Print case names and exit. |
| --fail-on/--no-fail-on | With --no-fail-on, regressions are reported but the exit code stays 0. Default: --fail-on. |
agentprdiff check suites/*.py
agentprdiff check suites/billing.py --case "*refund*" --json-out art/check.json
Exit codes. 0 when there is no regression. 1 on any regression (with
--fail-on). 2 when --case / --skip matched zero cases.
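For CI wrappers it can help to see the exit-code rules for record and check side by side. The sketch below only restates the rules from this page as two pure functions; the function names and their parameters are hypothetical, and the assumption that the zero-match case takes precedence over everything else (including --no-fail-on) is mine, not something this page states.

```python
def record_exit_code(matched: int, errored: int) -> int:
    """Exit code for `record`, given case counts.

    matched: number of cases left after --case / --skip filtering.
    errored: number of cases that raised an exception.
    Grader failures are deliberately not an input: they do not fail record.
    """
    if matched == 0:
        return 2  # assumption: the empty-filter error is checked first
    return 1 if errored > 0 else 0


def check_exit_code(matched: int, regressions: int, fail_on: bool = True) -> int:
    """Exit code for `check`, given case counts and the --fail-on flag."""
    if matched == 0:
        return 2  # assumption: not suppressed by --no-fail-on
    if regressions > 0 and fail_on:
        return 1
    return 0
```

With `fail_on=False` a run with regressions still returns 0, which matches the --no-fail-on row in the table above.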
Side effects. Writes runs/<timestamp>/<suite>/<case>.json per case
(gitignored). Each check invocation creates a fresh timestamped
directory; rm -rf .agentprdiff/runs/ is safe any time.
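If you would rather keep the most recent few runs than delete everything, a small cleanup helper is one option. This is a sketch under assumptions: `prune_runs` is a hypothetical helper, not part of agentprdiff, and it orders run directories by modification time because the exact timestamp format of the directory names is not specified here.

```python
import shutil
from pathlib import Path


def prune_runs(root: Path, keep: int = 5) -> list[Path]:
    """Delete all but the `keep` most recently modified run directories.

    root is the agentprdiff root (the --root directory); runs are
    expected under root/runs/<timestamp>/. Returns the removed paths.
    """
    runs_dir = root / "runs"
    if not runs_dir.is_dir():
        return []

    # Newest first, judged by directory mtime rather than by name.
    run_dirs = sorted(
        (p for p in runs_dir.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    removed = run_dirs[max(keep, 0):]
    for path in removed:
        shutil.rmtree(path)
    return removed
```

Baselines live in a separate directory and are never touched by this helper.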
agentprdiff review SUITE_FILE
Verbose per-case panels. Always exits 0. The local-iteration
counterpart to check — pytest -k for agents.
| Option | Description |
|---|---|
| --case PATTERN | Only render matching cases. |
| --skip PATTERN | Skip matching cases. |
| --list | Print case names and exit. |
review runs the same comparison as check (and writes to the same
runs/ dir) but renders one verbose panel per case — input echo, every
assertion's was → now verdict, cost / latency / token deltas,
tool-sequence diff, unified output diff. Always exits 0 so it slots
into watcher loops without going red between every keystroke.
agentprdiff diff SUITE_NAME CASE_NAME
Print the saved baseline trace for a single case as pretty JSON.
Exit codes. 0 on success. 2 when no baseline exists.
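Because the baseline is plain JSON on disk, it can also be read directly from Python. The sketch below is illustrative only: the helper names are hypothetical, the path follows the .agentprdiff/baselines/<suite>/<case>.json layout described for record, and the `llm_calls` / `cost_usd` field names are taken from the jq filter in the examples cookbook. Treating a missing or null `cost_usd` as zero is an assumption.

```python
import json
from pathlib import Path


def baseline_path(root: Path, suite: str, case: str) -> Path:
    """Location of a saved baseline under the agentprdiff root."""
    return root / "baselines" / suite / f"{case}.json"


def load_baseline(root: Path, suite: str, case: str) -> dict:
    """Parse the saved baseline trace; raises FileNotFoundError if absent."""
    return json.loads(baseline_path(root, suite, case).read_text(encoding="utf-8"))


def total_cost_usd(trace: dict) -> float:
    """Sum cost_usd over all LLM calls in a trace.

    Assumption: a missing or null cost_usd counts as 0.
    """
    return sum(call.get("cost_usd") or 0.0 for call in trace.get("llm_calls", []))
```

For example, `total_cost_usd(load_baseline(Path(".agentprdiff"), "billing", "refund_happy_path"))` would give the total recorded cost for that case, assuming the baseline exists.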
agentprdiff scaffold NAME
Stamp out the canonical suite layout for a new adoption.
| Option | Default | Description |
|---|---|---|
| --recipe | sync-openai | One of sync-openai, async-openai, stubbed. |
| --dir PATH | . | Project root to scaffold into. |
Writes:
suites/__init__.py
suites/_eval_agent.py # recipe-specific
suites/_stubs.py
suites/<NAME>.py
suites/<NAME>_cases.md # reviewer-facing dossier
suites/README.md
.github/workflows/agentprdiff.yml
Never overwrites. Existing files are reported as [skip]. Files
that get written are reported as [new]. Exit code 2 on bad name or
unknown recipe.
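The never-overwrite rule is easy to reproduce in your own tooling. The following is a minimal sketch of the pattern, not scaffold's actual code: `write_if_absent` is a hypothetical helper, and the exact shape of the returned report line (tag followed by the path) is an assumption; only the [skip] and [new] tags come from this page.

```python
from pathlib import Path


def write_if_absent(path: Path, content: str) -> str:
    """Write content to path unless the file already exists.

    Returns a report line tagged [new] or [skip].
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" fails if the file exists, so the existence check and
        # the creation happen in a single step.
        with path.open("x", encoding="utf-8") as handle:
            handle.write(content)
    except FileExistsError:
        return f"[skip] {path}"
    return f"[new] {path}"
```

Opening with mode "x" avoids the gap between a separate exists() check and the write, so an existing file is never truncated.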
Recipe selection
| Recipe | When to use |
|---|---|
| sync-openai | Agent uses OpenAI() (or any OpenAI-compatible client) synchronously. |
| async-openai | Agent uses AsyncOpenAI and you want the asyncio.run bridge. |
| stubbed | Agent's LLM call lives behind a single helper (e.g. summarize(text)), so you substitute the helper rather than instrument the SDK. |
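The "asyncio.run bridge" mentioned for async-openai is a general Python pattern: a synchronous function that drives a coroutine to completion. A minimal sketch follows; `answer_async` is a hypothetical stand-in for an agent that would await an AsyncOpenAI call, and the file the recipe actually generates may look different.

```python
import asyncio


async def answer_async(prompt: str) -> str:
    """Hypothetical async agent; real code would await an AsyncOpenAI call here."""
    await asyncio.sleep(0)  # placeholder for the awaited network call
    return f"echo: {prompt}"


def answer(prompt: str) -> str:
    """Synchronous entry point that runs the coroutine to completion."""
    return asyncio.run(answer_async(prompt))
```

Note that asyncio.run starts a fresh event loop and raises RuntimeError when called from inside a loop that is already running, so a bridge like this belongs at the outermost synchronous boundary.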
Filter syntax
--case and --skip share the same parser.
| Syntax | Meaning |
|---|---|
| refund_happy_path | Case-insensitive substring. Matches refund_happy_path, Refund_happy_path, etc. |
| *refund* | Glob (fnmatch). Case-insensitive. |
| refund? | Glob with single-char wildcard. |
| ~slow | Negate; same as --skip slow. |
| !slow | Same as ~slow (alternate syntax). |
| billing:refund* | Qualify by suite name. |
| --case a,b | Comma-split; equivalent to --case a --case b. |
| --case a --case b | Repeated flag. |
Negative patterns inside --case are merged with --skip and treated as
unconditional drops. A case is kept iff:
- It matches at least one positive pattern (or no positive patterns were given).
- It does not match any negative or --skip pattern.
A filter that matches zero cases exits 2 with a hint:
error: no cases matched --case/--skip filters.
available cases:
customer_support/refund_happy_path
customer_support/non_refundable_order
...
(tip: run with --list to see suite/case names; patterns are case-insensitive substrings or globs.)
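The keep rule and the syntax table can be summarized in a short Python sketch. This is an illustrative reimplementation, not agentprdiff's parser. The function names are hypothetical, and several details are assumptions: that unqualified patterns are matched against the case name only, that the suite part of a qualified pattern follows the same substring-or-glob rule, and that a pattern counts as a glob when it contains `*`, `?` or `[`.

```python
from fnmatch import fnmatchcase

GLOB_CHARS = set("*?[")


def _matches(pattern: str, name: str) -> bool:
    """Case-insensitive glob if the pattern has glob characters, else substring."""
    pattern, name = pattern.lower(), name.lower()
    if GLOB_CHARS & set(pattern):
        return fnmatchcase(name, pattern)
    return pattern in name


def _hits(pattern: str, suite: str, case: str) -> bool:
    """Apply one pattern, honoring an optional `suite:` qualifier."""
    if ":" in pattern:
        suite_part, case_part = pattern.split(":", 1)
        return _matches(suite_part, suite) and _matches(case_part, case)
    return _matches(pattern, case)  # assumption: unqualified means case name only


def split_patterns(case_opts: list[str], skip_opts: list[str]) -> tuple[list[str], list[str]]:
    """Comma-split every option value and sort items into positives and negatives."""
    positives: list[str] = []
    negatives: list[str] = []
    for raw in case_opts:
        for item in (part.strip() for part in raw.split(",")):
            if item[:1] in ("~", "!"):
                if item[1:]:
                    negatives.append(item[1:])  # negated --case merges with --skip
            elif item:
                positives.append(item)
    for raw in skip_opts:
        negatives.extend(part.strip() for part in raw.split(",") if part.strip())
    return positives, negatives


def keep_case(suite: str, case: str, positives: list[str], negatives: list[str]) -> bool:
    """Negatives are unconditional drops; otherwise require a positive match if any exist."""
    if any(_hits(p, suite, case) for p in negatives):
        return False
    return not positives or any(_hits(p, suite, case) for p in positives)
```

Under these assumptions, `--case refund,~denied` yields positives `["refund"]` and negatives `["denied"]`, so a case named Refund_denied is dropped even though it matches the positive pattern.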
Examples cookbook
# Discover what's in a suite
agentprdiff check suites/billing.py --list
# Run one case
agentprdiff check suites/billing.py --case refund_happy_path
# Run everything except slow cases
agentprdiff check suites/billing.py --skip slow
# Re-record one case after a deliberate behavior change
agentprdiff record suites/billing.py --case refund_happy_path
git add .agentprdiff/baselines/billing/refund_happy_path.json
# Quick local iteration without breaking your shell prompt
agentprdiff review suites/billing.py --case refund_happy_path
# Run all suites, write a JSON artifact for CI archiving
agentprdiff check suites/*.py --json-out artifacts/agentprdiff.json
# Inspect a saved baseline
agentprdiff diff billing refund_happy_path | jq '.llm_calls[].cost_usd'
# Bootstrap a new adoption
agentprdiff scaffold billing --recipe sync-openai