Contributing

How to contribute tasks, workloads, and code to AgenticCodingBench.

Development Setup

Clone the repo and install in editable mode with dev dependencies:

git clone https://github.com/swarmone/agentic-coding-bench.git
cd agentic-coding-bench
pip install -e ".[dev,proxy]"

Development Commands

Command       Description
make test     Run the full test suite
make lint     Check code style (ruff, mypy)
make format   Auto-format code (ruff format)

Adding Tasks

Tasks are defined in agentic_coding_bench/tasks/tasks.json. Each task has:

tasks.json (single entry)
{
  "id": "P111",
  "tier": "medium",
  "tier_name": "3 - Medium",
  "prompt": "Build a REST API endpoint that...",
  "tags": ["python", "api", "fastapi"],
  "max_output_tokens": 2048
}
Field               Description
id                  Unique ID (P1 through P110+)
tier                Difficulty: trivial, easy, medium, hard, expert
tier_name           Display name with number prefix
prompt              The agentic coding task description
tags                Categorization tags (language, domain)
max_output_tokens   Token limit for the response
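
New entries can be sanity-checked before opening a PR. This is a minimal sketch, not the project's actual validation logic (the real loading happens in tasks/registry.py, whose API is not shown here); the field names come from the table above:

```python
import json

# Fields and tiers documented for a tasks.json entry
REQUIRED_FIELDS = {"id", "tier", "tier_name", "prompt", "tags", "max_output_tokens"}
VALID_TIERS = {"trivial", "easy", "medium", "hard", "expert"}

def validate_task(task: dict) -> list[str]:
    """Return a list of problems with a single tasks.json entry (empty if valid)."""
    problems = []
    missing = REQUIRED_FIELDS - task.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if task.get("tier") not in VALID_TIERS:
        problems.append(f"unknown tier: {task.get('tier')!r}")
    if not isinstance(task.get("tags"), list):
        problems.append("tags must be a list")
    if not isinstance(task.get("max_output_tokens"), int):
        problems.append("max_output_tokens must be an integer")
    return problems

task = json.loads("""
{
  "id": "P111",
  "tier": "medium",
  "tier_name": "3 - Medium",
  "prompt": "Build a REST API endpoint that...",
  "tags": ["python", "api", "fastapi"],
  "max_output_tokens": 2048
}
""")
print(validate_task(task))  # an empty list means the entry is well-formed
```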

Adding Workloads

Record a real session and contribute it as a built-in workload:

  1. Record a session with acb record
  2. Place the JSONL file in agentic_coding_bench/workloads/data/
  3. Register it in workloads/registry.py
  4. Open a PR with a description of the session and what it tests
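
Before registering a recording, it helps to confirm the file is valid JSONL (one JSON object per line). This loader is an illustrative sketch, not the project's replay engine:

```python
import json
from pathlib import Path

def load_workload(path: Path) -> list[dict]:
    """Read a recorded session file: one JSON object per non-blank line."""
    events = []
    with path.open() as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError as e:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({e})") from e
    return events
```

A recording that fails this check will point you at the exact offending line before you wire it into the registry.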

Project Architecture

Project structure
agentic-coding-bench/
  agentic_coding_bench/
    cli.py              # Click CLI (acb speed | eval | agent | ...)
    config.py           # Config: CLI > env > YAML > defaults
    tasks/
      tasks.json        # 110 agentic coding tasks
      registry.py       # Load/filter tasks
      context/          # Agentic session context generation
    runner/
      direct.py         # Speed mode: direct endpoint benchmark
      eval_runner.py    # Eval mode: code correctness
      claude_code.py    # Agent mode: Claude Code orchestration
    workloads/
      recorder.py       # Recording proxy
      player.py         # Replay engine
      registry.py       # Load/list workloads
      data/             # Built-in workload files
    proxy/
      server.py         # Agent-mode proxy (FastAPI)
      translators.py    # API format translation
    metrics/
      collector.py      # Per-request metrics collection
      stats.py          # Statistical analysis
    report/
      markdown.py       # Report generation

PR Guidelines

  • Run make test and make lint before submitting
  • Add tests for new features
  • Keep commits focused - one feature or fix per PR
  • Update documentation if you change CLI flags or behavior

License

AgenticCodingBench is released under the Apache 2.0 license. See LICENSE for details.