Record & Replay

Capture real agentic coding sessions and replay them against any endpoint.

Why Record & Replay?

Synthetic benchmarks are useful, but nothing beats measuring with your actual coding sessions. Record a real session with Claude Code, Cursor, or any LLM-powered coding agent, then replay it against any endpoint to get apples-to-apples comparisons.

acb record - Capture a Session

Starts a recording proxy between your coding agent and your LLM endpoint. Each request/response pair is saved as one line of JSONL in the output file.
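The exact on-disk schema isn't documented here; as an illustration, here is a minimal Python sketch of reading a JSONL workload, assuming (hypothetically) that each line pairs an OpenAI-style request with its response under `request`/`response` keys:

```python
import json

# Hypothetical example of one recorded line; the actual field names
# acb uses are not specified here and may differ.
sample_jsonl = json.dumps({
    "request": {
        "model": "your-model",
        "messages": [{"role": "user", "content": "Fix the failing test"}],
    },
    "response": {
        "choices": [{"message": {"role": "assistant", "content": "Done."}}],
    },
})

# JSONL means one self-contained JSON object per line,
# so each line can be parsed independently.
for line in sample_jsonl.splitlines():
    pair = json.loads(line)
    print(pair["request"]["model"])  # your-model
```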

Record with an OpenAI-compatible upstream

acb record \
  -e http://your-gpu-server:8000 \
  -m your-model

Record with Anthropic (auto-detected from URL)

acb record \
  -e https://api.anthropic.com \
  -m claude-sonnet-4-20250514 \
  -k $ANTHROPIC_API_KEY \
  --api-key-header x-api-key \
  -o my-session.jsonl

Custom output file and port

acb record \
  -e http://your-gpu-server:8000 \
  -m your-model \
  -o my-session.jsonl \
  -P 9000

Point Your Agent at the Proxy

Once the recording proxy is running, point your coding agent at it:

ANTHROPIC_BASE_URL=http://localhost:19000 claude

Stop recording with Ctrl+C when done.

Upstream Modes

The recorder supports two upstream modes:

OpenAI-compatible (default)

Translates Anthropic Messages API → OpenAI format before forwarding.

Anthropic passthrough

Forwards requests natively to Anthropic's API - no translation, full fidelity. Auto-detected when the endpoint is api.anthropic.com, or set explicitly with --upstream-api anthropic.

Both modes save the workload in OpenAI format for replay.
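To make the translation step concrete, here is a minimal sketch of the kind of mapping the default mode performs. This is illustrative only, not acb's actual implementation: real translation also has to handle tool use, content blocks, streaming, and more.

```python
def anthropic_to_openai(body: dict) -> dict:
    """Sketch: map an Anthropic Messages request into OpenAI
    chat-completions shape. Illustrative, not acb's real code."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first message in the list.
    if "system" in body:
        messages.append({"role": "system", "content": body["system"]})
    messages.extend(body.get("messages", []))
    return {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens"),
    }

req = {
    "model": "claude-sonnet-4-20250514",
    "system": "You are a coding agent.",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Refactor this function"}],
}
print(anthropic_to_openai(req)["messages"][0]["role"])  # system
```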

acb replay - Replay Against Any Endpoint

Take a recorded workload and replay it against a different endpoint, hardware, or configuration.

Replay against a new endpoint

acb replay \
  -e http://new-server:8000 \
  -m my-model \
  -w my-session.jsonl

Generate a full report

acb replay \
  -e http://new-server:8000 \
  -m my-model \
  -w my-session.jsonl \
  -o report.md

Preview without sending requests

acb replay -e URL -m MODEL -w session.jsonl --dry-run

Slicing Workloads

Real sessions grow from small contexts to large ones. --slice-tokens N replays requests from the start until cumulative prompt tokens reach N - preserving the natural context growth while capping how much you send through the endpoint.

acb replay -e URL -m MODEL -w session.jsonl --slice-tokens 1000000

Useful for targeting specific model context limits or keeping replay costs down.

Record CLI Flags

Flag                  Description
-e, --endpoint        Upstream LLM endpoint URL
-m, --model           Model name
-k, --api-key         API key for the upstream endpoint
--api-key-header      Custom API key header name
-o, --output          Output JSONL file path
-P, --port            Proxy listen port (default: 19000)
--upstream-api        Force upstream API type (openai or anthropic)

Replay CLI Flags

Flag                  Description
-e, --endpoint        Target endpoint URL
-m, --model           Model name
-w, --workload        JSONL workload file path
-o, --output          Report output path
--dry-run             Preview without sending requests
--slice-tokens        Stop replaying after N cumulative prompt tokens