
Scenario 5 — Real SDKs: OpenAI, Anthropic, & Friends

Skip the manual Trace.record_llm_call(...) boilerplate when your agent uses one of the supported SDKs.
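For reference, the manual version looks roughly like this (constructing Trace() directly and the record_llm_call keyword names are assumptions for illustration, not the exact API):

from openai import OpenAI
from agentprdiff import Trace

# Manual instrumentation -- everything below is what the adapters automate.
# NOTE: direct Trace() construction and these keyword names are assumed.
client = OpenAI()
trace = Trace()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hi"}],
)
trace.record_llm_call(
    provider="openai",
    model="gpt-4o-mini",
    output_text=resp.choices[0].message.content or "",
)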

OpenAI (sync)

Problem

A multi-turn tool-calling agent built on the OpenAI() client. We want every model call and every tool dispatch in the trace, automatically.

Code

my_agent.py
from openai import OpenAI
from agentprdiff.adapters.openai import instrument_client, instrument_tools
from agentprdiff import Trace
import json

def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "delivered", "amount_usd": 89.0}

def send_email(to: str, body: str) -> dict:
    return {"sent": True}

TOOL_MAP = {"lookup_order": lookup_order, "send_email": send_email}

TOOL_SCHEMAS = [
    {"type": "function", "function": {
        "name": "lookup_order",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }},
    {"type": "function", "function": {
        "name": "send_email",
        "parameters": {
            "type": "object",
            "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
            "required": ["to", "body"],
        },
    }},
]

def my_agent(query: str) -> tuple[str, Trace]:
    client = OpenAI()
    with instrument_client(client) as trace:
        tools = instrument_tools(TOOL_MAP, trace)
        messages = [{"role": "user", "content": query}]

        for _ in range(6):  # max steps
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                tools=TOOL_SCHEMAS,
            )
            msg = resp.choices[0].message
            messages.append(msg.model_dump())
            if not msg.tool_calls:
                return msg.content or "", trace
            for tc in msg.tool_calls:
                args = json.loads(tc.function.arguments or "{}")
                result = tools[tc.function.name](**args)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": json.dumps(result),
                })
        return "max steps exceeded", trace

Suite

suite.py
from agentprdiff import case, suite
from agentprdiff.graders import contains, cost_lt_usd, latency_lt_ms, tool_called
from my_agent import my_agent

billing = suite(
    name="billing",
    agent=my_agent,
    cases=[
        case(
            name="refund_happy_path",
            input="I want a refund for order #1234",
            expect=[
                contains("refund"),
                tool_called("lookup_order"),
                cost_lt_usd(0.02),
                latency_lt_ms(15_000),
            ],
        ),
    ],
)

Output

OPENAI_API_KEY is set, so the agent calls the real model. The adapter records:

  • one LLMCall per chat.completions.create (provider, model, token counts, cost from the bundled price table, latency, output text, raw tool_calls);
  • one ToolCall per dispatched function (name, kwargs, return value, latency).
agentprdiff record — suite billing  (1/1 passed, 0 regressed)
…(table; cost ~0.0008, latency ~2300 ms)…
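You can also poke at the trace object in code. The attribute names below are assumed to mirror the serialized shape shown under "What lands in the trace" later on:

from my_agent import my_agent

answer, trace = my_agent("I want a refund for order #1234")
first = trace.llm_calls[0]  # field names assumed from the trace JSON below
print(first.model, first.prompt_tokens, first.cost_usd)
print([t.name for t in trace.tool_calls])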

OpenAI (async)

my_agent.py
import asyncio
import json

from openai import AsyncOpenAI
from agentprdiff.adapters.openai import instrument_client, instrument_tools
from agentprdiff import Trace

# TOOL_MAP and TOOL_SCHEMAS as defined in the sync example above.

async def my_agent_async(query: str) -> tuple[str, Trace]:
    client = AsyncOpenAI()
    with instrument_client(client) as trace:
        tools = instrument_tools(TOOL_MAP, trace)
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": query}],
            tools=TOOL_SCHEMAS,
        )
        msg = resp.choices[0].message
        if msg.tool_calls:
            for tc in msg.tool_calls:
                # sync tools are called without await; the wrapper preserves each tool's original shape.
                tools[tc.function.name](**json.loads(tc.function.arguments))
        return msg.content or "", trace

def my_agent(query: str) -> tuple[str, Trace]:
    return asyncio.run(my_agent_async(query))

Why this works

instrument_client inspects client.chat.completions.create at entry: if it is an async def, the patch is itself an async def. instrument_tools applies the same check per tool, so async def tools come back awaitable and sync tools stay sync. The with block remains a regular with, because the patch is bound to the client instance, not to the event loop.
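A minimal sketch of the tool-wrapping half of that, assuming a hypothetical trace.record_tool_call recorder (this is the pattern, not the adapter's real internals):

import functools
import inspect
import time

def wrap_tool(fn, trace):
    # Mirror the wrapped function's calling convention:
    # async def in -> async def out; plain def in -> plain def out.
    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(**kwargs):
            start = time.perf_counter()
            result = await fn(**kwargs)
            # record_tool_call is a hypothetical name, for illustration only
            trace.record_tool_call(name=fn.__name__, arguments=kwargs, result=result,
                                   latency_ms=(time.perf_counter() - start) * 1000)
            return result
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(**kwargs):
        start = time.perf_counter()
        result = fn(**kwargs)
        trace.record_tool_call(name=fn.__name__, arguments=kwargs, result=result,
                               latency_ms=(time.perf_counter() - start) * 1000)
        return result
    return sync_wrapper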

Anthropic

my_agent.py
from anthropic import Anthropic
from agentprdiff.adapters.anthropic import instrument_client, instrument_tools
from agentprdiff import Trace

def my_agent(query: str) -> tuple[str, Trace]:
    client = Anthropic()
    with instrument_client(client) as trace:
        tools = instrument_tools(TOOL_MAP, trace)
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=512,
            messages=[{"role": "user", "content": query}],
            tools=[{
                "name": "lookup_order",
                "input_schema": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            }],
        )
        # Walk the content blocks: collect text, dispatch tool_use blocks.
        final_text = ""
        for block in resp.content:
            if block.type == "text":
                final_text += block.text
            elif block.type == "tool_use":
                tools[block.name](**block.input)
        return final_text, trace

The Anthropic adapter records:

  • LLMCall per messages.create (uses usage.input_tokens / output_tokens, walks content blocks for the output text and any tool_use block summaries);
  • ToolCall per dispatched function via the same instrument_tools shape.

OpenAI-compatible providers

The OpenAI adapter works with anything that speaks the OpenAI Chat Completions wire format. The provider tag is inferred from base_url:

Provider                base_url snippet                      Inferred tag
OpenAI                  (default)                             openai
Groq                    api.groq.com                          groq
Gemini (OpenAI-compat)  googleapis.com / generativelanguage   gemini
OpenRouter              openrouter.ai                         openrouter
Ollama                  localhost:11434                       ollama
Together                together.ai                           together
Fireworks               fireworks.ai                          fireworks
DeepInfra               deepinfra.com                         deepinfra

Anything unrecognized falls through to openai-compatible. Override explicitly with instrument_client(client, provider="my-provider").

import os

from openai import OpenAI
from agentprdiff.adapters.openai import instrument_client

# Groq
client = OpenAI(api_key=os.environ["GROQ_API_KEY"], base_url="https://api.groq.com/openai/v1")

# Ollama (local)
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

# OpenRouter
client = OpenAI(api_key=os.environ["OPENROUTER_API_KEY"], base_url="https://openrouter.ai/api/v1")

with instrument_client(client) as trace:
    ...
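Under the hood, the tag inference is plausibly just a substring match on base_url. A sketch of the idea (hypothetical helper, not the adapter's actual code):

_PROVIDER_HINTS = {
    "api.groq.com": "groq",
    "googleapis.com": "gemini",
    "openrouter.ai": "openrouter",
    "localhost:11434": "ollama",
    "together.ai": "together",
    "fireworks.ai": "fireworks",
    "deepinfra.com": "deepinfra",
}

def infer_provider(base_url: str | None) -> str:
    if not base_url:
        return "openai"              # SDK default endpoint
    for hint, tag in _PROVIDER_HINTS.items():
        if hint in base_url:
            return tag
    return "openai-compatible"       # unrecognized fallthrough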

Cost overrides

Pass prices= if you're using a model not in the bundled defaults, or if you want to record your enterprise rate instead of public list pricing:

PRICES = {
    "gpt-4o":          (0.0020, 0.0080),    # negotiated rate
    "internal-fine-1": (0.0009, 0.0018),
}

with instrument_client(client, prices=PRICES) as trace:
    ...

PRICES is a {model: (input USD per 1k tokens, output USD per 1k tokens)} mapping.
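Cost is linear in tokens at those rates. A quick sketch (hypothetical helper; gpt-4o-mini's public list rate of $0.00015 in / $0.0006 out per 1k tokens) that reproduces the cost_usd in the sample trace below:

def call_cost(model, prompt_tokens, completion_tokens, prices):
    # prices: {model: (input $ per 1k tokens, output $ per 1k tokens)}
    in_rate, out_rate = prices[model]
    return prompt_tokens / 1000 * in_rate + completion_tokens / 1000 * out_rate

# 184 prompt + 27 completion tokens, as in the trace JSON below:
assert round(call_cost("gpt-4o-mini", 184, 27,
                       {"gpt-4o-mini": (0.00015, 0.0006)}), 6) == 0.000044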

What lands in the trace

{
  "llm_calls": [
    {
      "provider": "openai",
      "model": "gpt-4o-mini",
      "input_messages": [...],
      "output_text": "I'll process that refund for you.",
      "tool_calls": [{"id": "call_abc", "name": "lookup_order", "arguments": "{\"order_id\":\"1234\"}"}],
      "prompt_tokens": 184,
      "completion_tokens": 27,
      "cost_usd": 0.000044,
      "latency_ms": 612.3
    }
  ],
  "tool_calls": [
    {
      "name": "lookup_order",
      "arguments": {"order_id": "1234"},
      "result": {"status": "delivered", "amount_usd": 89.0},
      "latency_ms": 8.1
    }
  ]
}

Explanation

  • The adapter monkey-patches client.chat.completions.create (or client.messages.create for Anthropic) for the duration of the with block, then restores the original on exit, even if the agent raises; see the sketch after this list.
  • The patch is scoped to the client instance. Other client instances and global SDK state are untouched.
  • Tool wrappers always keep their original calling convention. Sync tools stay sync, async tools stay awaitable.
  • cost_usd is filled from agentprdiff.adapters.pricing.DEFAULT_PRICES unless you override it. Missing models trigger one RuntimeWarning per process (loud but not spammy).
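The patch-and-restore cycle is the standard context-manager try/finally pattern. A simplified sketch of the sync OpenAI case (direct Trace() construction is assumed; the real adapter also handles the async variant and fills in the recorded fields):

import contextlib
from agentprdiff import Trace

@contextlib.contextmanager
def instrument_client_sketch(client):
    trace = Trace()  # assumed constructible directly, for illustration
    original = client.chat.completions.create  # bound to this instance only

    def patched(*args, **kwargs):
        resp = original(*args, **kwargs)
        # ...record an LLMCall onto `trace` here (tokens, cost, latency)...
        return resp

    client.chat.completions.create = patched
    try:
        yield trace
    finally:
        # Restore even if the agent body raised.
        client.chat.completions.create = original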