Provider Benchmarks
Real-world agentic inference benchmarks. Recorded coding sessions are replayed against provider endpoints to measure how each provider performs as context grows beyond 100K tokens.
Methodology
All benchmarks use asb replay mode: a recorded agentic coding session is replayed against each provider endpoint. Cost is derived by applying each provider's pricing to the actual token usage.
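The cost derivation can be illustrated with a minimal sketch. It is not asb's actual implementation: the `Pricing` and `Usage` types, their field names, and the assumption that providers bill input and output tokens at separate rates per million tokens are all hypothetical, and the rates in the usage note are placeholders rather than real provider prices.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Pricing:
    # Hypothetical rates in USD per one million tokens.
    input_per_mtok: float
    output_per_mtok: float


@dataclass(frozen=True)
class Usage:
    # Token counts reported for a single replayed request (assumed shape).
    input_tokens: int
    output_tokens: int


def request_cost(usage: Usage, pricing: Pricing) -> float:
    """Cost in USD of one request: tokens times the per-million-token rate."""
    total = (usage.input_tokens * pricing.input_per_mtok
             + usage.output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


def session_cost(usages: Iterable[Usage], pricing: Pricing) -> float:
    """Cost in USD of a whole replayed session: the sum over its requests."""
    return sum(request_cost(u, pricing) for u in usages)
```

For example, with placeholder rates `Pricing(input_per_mtok=2.0, output_per_mtok=8.0)`, calling `session_cost` on a list of per-request `Usage` records returns the session total in USD. Because agentic sessions resend a growing context on every turn, input tokens tend to dominate this sum at long context lengths.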