# Reports

Understanding reports, verdicts, metrics, and comparisons.
## Overview
Reports are designed to answer one question fast: is this endpoint good enough for agentic coding?
## Verdicts

Every report produces a verdict based on TTFT (time to first token) and throughput at each context size:

- **GOOD**: responsive editing experience. TTFT < 3s at 40K, tok/s > 30.
- **MARGINAL**: usable but with noticeable lag. TTFT 3–6s at 40K, tok/s 15–30.
- **POOR**: sluggish and frustrating. TTFT > 6s at 40K, tok/s < 15.
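The thresholds above can be sketched as a small classifier. Note that the docs state thresholds per dimension but not how the two are combined when they disagree, so the rule below (the worse dimension wins) is an assumption, not acb's documented logic:

```python
def verdict(ttft_s: float, tok_s: float) -> str:
    """Classify a run at 40K context using the documented thresholds.

    The combining rule (worst dimension wins when TTFT and tok/s
    disagree) is an assumption, not acb's documented behavior.
    """
    if ttft_s < 3 and tok_s > 30:
        return "GOOD"       # both dimensions responsive
    if ttft_s > 6 or tok_s < 15:
        return "POOR"       # either dimension is sluggish
    return "MARGINAL"       # in between: usable, noticeable lag

# verdict(2.0, 40.0) → "GOOD"; verdict(4.5, 20.0) → "MARGINAL"
```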
## Report Contents
Every report includes:
| Section | Description |
|---|---|
| Verdict | GOOD / MARGINAL / POOR for agentic coding, with the key numbers |
| Key Findings | Auto-generated insights: TTFT scaling, throughput range, concurrency efficiency |
| Summary Table | TTFT, tok/s, ITL at each concurrency and context size, color-coded |
| UX Mapping | Maps raw metrics to user experience ("instant response", "smooth streaming") |
| Context Scaling | ASCII chart showing how TTFT and tok/s change as context grows |
| Concurrency Scaling | Efficiency percentages at each concurrency level with grades |
| Per-profile Breakdown | Detailed numbers per context size |
| Reasoning Analysis | Thinking overhead when using reasoning models |
| Methodology | What was measured, how, and what the grade thresholds mean |
## Generating Reports
Add `-o` to any command to generate a report file:

```bash
acb speed -e URL -m MODEL --suite full -o report.md
acb speed -e URL -m MODEL --format json -o results.json
```

The CLI also prints a final verdict line after every benchmark:
CLI output:

```
Verdict: GOOD for agentic coding at medium context
```

## Comparing Two Runs
The `compare` command produces a head-to-head table, an ASCII bar chart, and a winner summary:

```bash
acb compare --baseline a.json --candidate b.json -o comparison.md
```

## Rules of Thumb
Reference ranges for agentic coding (your numbers will vary with hardware, model, and serving stack):

- TTFT < 3s at 40K context → responsive editing
- Tok/s > 30 per user → code streams smoothly
- TTFT < 10s at 100K context → acceptable for deep sessions
- Aggregate tok/s scales sub-linearly with users; expect roughly 60–70% efficiency at 8× concurrency
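The last rule of thumb is easy to check by hand: efficiency is aggregate throughput divided by what perfect linear scaling would give (single-user tok/s times the user count). A minimal sketch, with illustrative numbers:

```python
def concurrency_efficiency(agg_tok_s: float,
                           single_user_tok_s: float,
                           users: int) -> float:
    """Fraction of ideal linear scaling achieved; 1.0 means aggregate
    throughput grew perfectly with the number of concurrent users."""
    return agg_tok_s / (single_user_tok_s * users)

# e.g. 40 tok/s for one user, 210 tok/s aggregate for 8 users:
# concurrency_efficiency(210.0, 40.0, 8) → 0.65625, i.e. ~66%,
# inside the 60–70% range quoted above.
```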
## JSON Output

Use `--format json` for CI/CD pipelines. The JSON contains all metrics, per-request data, and the computed verdict:

```bash
acb speed -e URL -m MODEL --format json -o results.json
```
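One common CI pattern is to gate a pipeline on the computed verdict. A minimal sketch, assuming the JSON exposes the verdict under a top-level `"verdict"` key (the actual schema is not shown here, so adjust the key to match your `results.json`):

```python
import json  # used in the hypothetical CI snippet below

def gate(report: dict, allowed: tuple = ("GOOD",)) -> bool:
    """Return True only for acceptable verdicts.

    The top-level "verdict" key is an assumption about the JSON
    layout, not a documented field name.
    """
    return report.get("verdict") in allowed

# Usage in CI (hypothetical):
#   with open("results.json") as f:
#       raise SystemExit(0 if gate(json.load(f)) else 1)
```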