# Contributing

How to contribute tasks, workloads, and code to AgenticCodingBench.
## Development Setup

Clone the repo and install in editable mode with dev dependencies:

```bash
git clone https://github.com/swarmone/agentic-coding-bench.git
cd agentic-coding-bench
pip install -e ".[dev,proxy]"
```

## Development Commands
| Command | Description |
|---|---|
| `make test` | Run the full test suite |
| `make lint` | Check code style (ruff, mypy) |
| `make format` | Auto-format code (ruff format) |
## Adding Tasks

Tasks are defined in `agentic_coding_bench/tasks/tasks.json`. Each task has:
`tasks.json` (single entry):

```json
{
  "id": "P111",
  "tier": "medium",
  "tier_name": "3 - Medium",
  "prompt": "Build a REST API endpoint that...",
  "tags": ["python", "api", "fastapi"],
  "max_output_tokens": 2048
}
```

| Field | Description |
|---|---|
| `id` | Unique ID (P1 through P110+) |
| `tier` | Difficulty: trivial, easy, medium, hard, expert |
| `tier_name` | Display name with number prefix |
| `prompt` | The agentic coding task description |
| `tags` | Categorization tags (language, domain) |
| `max_output_tokens` | Token limit for the response |
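Before opening a PR, it can help to sanity-check a new entry against the fields above. A minimal sketch — `check_task` is a hypothetical helper for illustration, not part of the package:

```python
# Sanity-check a single tasks.json entry against the documented schema.
TIERS = {"trivial", "easy", "medium", "hard", "expert"}
REQUIRED_KEYS = {"id", "tier", "tier_name", "prompt", "tags", "max_output_tokens"}

def check_task(task: dict) -> list[str]:
    """Return a list of problems found in one task entry (empty if OK)."""
    problems = []
    missing = REQUIRED_KEYS - task.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if task.get("tier") not in TIERS:
        problems.append(f"unknown tier: {task.get('tier')!r}")
    tokens = task.get("max_output_tokens")
    if not isinstance(tokens, int) or tokens <= 0:
        problems.append("max_output_tokens must be a positive integer")
    if not isinstance(task.get("tags"), list):
        problems.append("tags must be a list")
    return problems

new_task = {
    "id": "P111",
    "tier": "medium",
    "tier_name": "3 - Medium",
    "prompt": "Build a REST API endpoint that...",
    "tags": ["python", "api", "fastapi"],
    "max_output_tokens": 2048,
}
assert check_task(new_task) == []  # the sample entry above passes
```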
## Adding Workloads

Record a real session and contribute it as a built-in workload:

1. Record a session with `acb record`
2. Place the JSONL file in `agentic_coding_bench/workloads/data/`
3. Register it in `workloads/registry.py`
4. Open a PR with a description of the session and what it tests
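Before registering a recorded file, it is worth confirming it is valid JSONL (one JSON object per line). A generic check — the exact record fields written by `acb record` are not specified here, so this only validates the framing:

```python
import json
from pathlib import Path

def check_jsonl(path: Path) -> int:
    """Parse every line of a JSONL file, reporting the line number on
    failure. Returns the number of records found."""
    count = 0
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path}:{lineno}: invalid JSON ({exc})") from exc
        count += 1
    return count
```

Run it on the recording before copying the file into `workloads/data/`.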
## Project Architecture

Project structure:

```text
agentic-coding-bench/
  agentic_coding_bench/
    cli.py            # Click CLI (acb speed | eval | agent | ...)
    config.py         # Config: CLI > env > YAML > defaults
    tasks/
      tasks.json      # 110 agentic coding tasks
      registry.py     # Load/filter tasks
      context/        # Agentic session context generation
    runner/
      direct.py       # Speed mode: direct endpoint benchmark
      eval_runner.py  # Eval mode: code correctness
      claude_code.py  # Agent mode: Claude Code orchestration
    workloads/
      recorder.py     # Recording proxy
      player.py       # Replay engine
      registry.py     # Load/list workloads
      data/           # Built-in workload files
    proxy/
      server.py       # Agent-mode proxy (FastAPI)
      translators.py  # API format translation
    metrics/
      collector.py    # Per-request metrics collection
      stats.py        # Statistical analysis
    report/
      markdown.py     # Report generation
```

## PR Guidelines
- Run `make test` and `make lint` before submitting
- Add tests for new features
- Keep commits focused: one feature or fix per PR
- Update documentation if you change CLI flags or behavior
## License

AgenticCodingBench is released under the Apache 2.0 license. See LICENSE for details.