$ Leaderboard
Real benchmark results from the AgenticCodingBench suite. Every row is a reproducible configuration tested under agentic coding workloads, with TTFT, throughput, and latency measured across 6K to 400K token contexts.
10 entries
| # | Configuration | Engine | Model | Hardware | TTFT | Latency | Throughput (tok/s) | Verdict | Date |
|---|---|---|---|---|---|---|---|---|---|
| 🥇 | vLLM on 1×A100 80GB - Llama 3.1 8B | vLLM | Llama 3.1 8B | 1×A100 80GB | 95ms | 620ms | 95 | 🟢 GOOD | 2026-03-15 |
| 🥈 | vLLM on 1×A100 80GB - Llama 3.1 8B (8 users) | vLLM | Llama 3.1 8B | 1×A100 80GB | 320ms | 2.8s | 28 | 🟡 MARGINAL | 2026-03-15 |
| 🥉 | SGLang on 1×H100 - Llama 3.1 70B | SGLang | Llama 3.1 70B | 1×H100 80GB | 180ms | 1.1s | 48 | 🟢 GOOD | 2026-03-20 |
| 4 | SGLang on 1×H100 - Llama 3.1 70B (8 users) | SGLang | Llama 3.1 70B | 1×H100 80GB | 650ms | 4.8s | 14 | 🟡 MARGINAL | 2026-03-20 |
| 5 | TGI on 1×A100 80GB - Llama 3.1 8B | TGI | Llama 3.1 8B | 1×A100 80GB | 110ms | 750ms | 85 | 🟢 GOOD | 2026-03-18 |
| 6 | Together AI API - Llama 3.1 70B | API | Llama 3.1 70B | Managed | 250ms | 1.4s | 42 | 🟢 GOOD | 2026-03-22 |
| 7 | Fireworks API - Llama 3.1 70B | API | Llama 3.1 70B | Managed | 210ms | 1.2s | 48 | 🟢 GOOD | 2026-03-25 |
| 8 | vLLM on 1×A100 80GB - Llama 3.1 8B (32 users) | vLLM | Llama 3.1 8B | 1×A100 80GB | 1.2s | 12.0s | 6 | 🔴 POOR | 2026-03-28 |
| 9 | vLLM on 2×H100 - DeepSeek R1 (Reasoning) | vLLM | DeepSeek R1 | 2×H100 80GB | 450ms | 2.8s | 32 | 🟡 MARGINAL | 2026-04-01 |
| 10 | SGLang on 1×H100 - Llama 3.1 70B (Cache test) | SGLang | Llama 3.1 70B | 1×H100 80GB | 185ms | 1.1s | 50 | 🟢 GOOD | 2026-04-05 |
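The GOOD/MARGINAL/POOR verdicts track sustained throughput: every row at or above roughly 40 tok/s is green, rows between roughly 10 and 40 tok/s are yellow, and the 6 tok/s row is red. A minimal sketch of that classification, with thresholds inferred from the table above rather than taken from the official suite:

```python
# Sketch: deriving the leaderboard verdict from decode throughput.
# The 40 and 10 tok/s thresholds are ASSUMPTIONS inferred from the
# table rows, not documented AgenticCodingBench behavior.

def verdict(tokens_per_sec: float) -> str:
    """Classify a run by sustained throughput (assumed thresholds)."""
    if tokens_per_sec >= 40:
        return "GOOD"
    if tokens_per_sec >= 10:
        return "MARGINAL"
    return "POOR"

# Spot-check against three leaderboard rows:
print(verdict(95))  # vLLM / Llama 3.1 8B, single user  -> GOOD
print(verdict(28))  # vLLM / Llama 3.1 8B, 8 users      -> MARGINAL
print(verdict(6))   # vLLM / Llama 3.1 8B, 32 users     -> POOR
```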
$ Submit Your Results
Got a serving stack, model, or hardware combo that's not listed? Run the benchmark and submit your results via pull request. Every entry requires a reproducible configuration and raw metrics.
1. Install: `pip install agentic-coding-bench`
2. Run: `acb speed --endpoint YOUR_ENDPOINT --model YOUR_MODEL --suite full`
3. Submit the JSON report as a PR to `data/leaderboard/`
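For orientation, a report submitted to `data/leaderboard/` might look something like the fragment below. The field names and layout here are an illustrative assumption (the actual schema is whatever `acb speed` emits); the values mirror the top leaderboard row.

```json
{
  "config": {
    "engine": "vllm",
    "model": "Llama 3.1 8B",
    "hardware": "1xA100 80GB",
    "concurrent_users": 1
  },
  "metrics": {
    "ttft_ms": 95,
    "latency_ms": 620,
    "tokens_per_sec": 95
  },
  "date": "2026-03-15"
}
```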