# Leaderboard

Real benchmark results from the AgenticCodingBench suite. Every row is a reproducible configuration tested under agentic coding workloads: TTFT, throughput, and latency measured across contexts from 6K to 400K tokens. (A sketch of how these metrics are typically measured follows the table.)

| # | Configuration | Stack | Model | Hardware | TTFT | Latency | Throughput (tok/s) | Rating | Date |
|---|---------------|-------|-------|----------|------|---------|--------------------|--------|------|
| 🥇 | vLLM on 1×A100 80GB - Llama 3.1 8B | vLLM | Llama 3.1 8B | 1×A100 80GB | 95ms | 620ms | 95 | 🟢 GOOD | 2026-03-15 |
| 🥈 | vLLM on 1×A100 80GB - Llama 3.1 8B (8 users) | vLLM | Llama 3.1 8B | 1×A100 80GB | 320ms | 2.8s | 28 | 🟡 MARGINAL | 2026-03-15 |
| 🥉 | SGLang on 1×H100 - Llama 3.1 70B | SGLang | Llama 3.1 70B | 1×H100 80GB | 180ms | 1.1s | 48 | 🟢 GOOD | 2026-03-20 |
| 4 | SGLang on 1×H100 - Llama 3.1 70B (8 users) | SGLang | Llama 3.1 70B | 1×H100 80GB | 650ms | 4.8s | 14 | 🟡 MARGINAL | 2026-03-20 |
| 5 | TGI on 1×A100 80GB - Llama 3.1 8B | TGI | Llama 3.1 8B | 1×A100 80GB | 110ms | 750ms | 85 | 🟢 GOOD | 2026-03-18 |
| 6 | Together AI API - Llama 3.1 70B | API | Llama 3.1 70B | Managed | 250ms | 1.4s | 42 | 🟢 GOOD | 2026-03-22 |
| 7 | Fireworks API - Llama 3.1 70B | API | Llama 3.1 70B | Managed | 210ms | 1.2s | 48 | 🟢 GOOD | 2026-03-25 |
| 8 | vLLM on 1×A100 80GB - Llama 3.1 8B (32 users) | vLLM | Llama 3.1 8B | 1×A100 80GB | 1.2s | 12.0s | 6 | 🔴 POOR | 2026-03-28 |
| 9 | vLLM on 2×H100 - DeepSeek R1 (Reasoning) | vLLM | DeepSeek R1 | 2×H100 80GB | 450ms | 2.8s | 32 | 🟡 MARGINAL | 2026-04-01 |
| 10 | SGLang on 1×H100 - Llama 3.1 70B (Cache test) | SGLang | Llama 3.1 70B | 1×H100 80GB | 185ms | 1.1s | 50 | 🟢 GOOD | 2026-04-05 |
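For readers unfamiliar with the metrics, the sketch below shows one common way to measure TTFT, end-to-end latency, and decode throughput against an OpenAI-compatible streaming endpoint. It is illustrative only: the endpoint URL, model id, and prompt are placeholders, and the actual `acb speed` methodology may differ.

```python
# Minimal sketch: measure TTFT, end-to-end latency, and decode throughput
# from an OpenAI-compatible streaming chat endpoint. This is NOT the acb
# tool's implementation, just an illustration of what the metrics mean.
import json
import time

import requests  # assumed installed: pip install requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder URL
MODEL = "meta-llama/Llama-3.1-8B-Instruct"              # placeholder model id


def measure(prompt: str) -> dict:
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "max_tokens": 256,
    }
    start = time.perf_counter()
    first_token_at = None
    chunks = 0
    with requests.post(ENDPOINT, json=payload, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            # Server-sent events arrive as lines prefixed with "data: ".
            if not line or not line.startswith(b"data: "):
                continue
            data = line[len(b"data: "):]
            if data == b"[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"].get("content")
            if delta:
                if first_token_at is None:
                    first_token_at = time.perf_counter()  # time to first token
                chunks += 1  # approximation: one stream chunk ~ one token
    end = time.perf_counter()
    first = first_token_at if first_token_at is not None else end
    decode_time = end - first
    return {
        "ttft_s": first - start,
        "e2e_latency_s": end - start,
        "throughput_tok_s": chunks / decode_time if decode_time > 0 else 0.0,
    }


print(measure("Refactor this function to be iterative: ..."))
```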

## Submit Your Results

Got a serving stack, model, or hardware combo that's not listed? Run the benchmark and submit your results via pull request. Every entry requires a reproducible configuration and raw metrics.

1. Install: `pip install agentic-coding-bench`
2. Run: `acb speed --endpoint YOUR_ENDPOINT --model YOUR_MODEL --suite full`
3. Submit the JSON report as a PR to `data/leaderboard/`
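The tool generates the report itself, so you should not need to write one by hand. As a rough illustration of the kind of fields an entry carries (a reproducible configuration plus raw metrics), a submission might look like the following; every field name here is an assumption, not the canonical schema.

```json
{
  "name": "vLLM on 1×A100 80GB - Llama 3.1 8B",
  "stack": "vLLM",
  "model": "Llama 3.1 8B",
  "hardware": "1×A100 80GB",
  "config": {
    "tensor_parallel": 1,
    "max_model_len": 131072
  },
  "metrics": {
    "ttft_ms": 95,
    "e2e_latency_ms": 620,
    "throughput_tok_s": 95
  },
  "date": "2026-03-15"
}
```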