Contributing

How to contribute tasks, workloads, and code to AgenticCodingBench.

Development Setup

Clone the repo and install in editable mode with dev dependencies:

git clone https://github.com/swarmone/agentic-coding-bench.git
cd agentic-coding-bench
pip install -e ".[dev,proxy]"

Development Commands

Command       Description
make test     Run the full test suite
make lint     Check code style (ruff, mypy)
make format   Auto-format code (ruff format)

Adding Tasks

Tasks are defined in agentic_coding_bench/tasks/tasks.json. Each task has:

tasks.json (single entry)
{
  "id": "P111",
  "tier": "medium",
  "tier_name": "3 - Medium",
  "prompt": "Build a REST API endpoint that...",
  "tags": ["python", "api", "fastapi"],
  "max_output_tokens": 2048
}
Field               Description
id                  Unique ID (P1 through P110+)
tier                Difficulty: trivial, easy, medium, hard, expert
tier_name           Display name with number prefix
prompt              The agentic coding task description
tags                Categorization tags (language, domain)
max_output_tokens   Token limit for the response
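
New entries can be sanity-checked before opening a PR. This is a minimal sketch, not the project's actual validation logic (the real loading happens in tasks/registry.py, whose API is not shown here); the field names come from the table above:

```python
import json

# Fields and tiers documented for a tasks.json entry
REQUIRED_FIELDS = {"id", "tier", "tier_name", "prompt", "tags", "max_output_tokens"}
VALID_TIERS = {"trivial", "easy", "medium", "hard", "expert"}

def validate_task(task: dict) -> list[str]:
    """Return a list of problems with a single tasks.json entry (empty if valid)."""
    problems = []
    missing = REQUIRED_FIELDS - task.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if task.get("tier") not in VALID_TIERS:
        problems.append(f"unknown tier: {task.get('tier')!r}")
    if not isinstance(task.get("tags"), list):
        problems.append("tags must be a list")
    if not isinstance(task.get("max_output_tokens"), int):
        problems.append("max_output_tokens must be an integer")
    return problems

task = json.loads("""
{
  "id": "P111",
  "tier": "medium",
  "tier_name": "3 - Medium",
  "prompt": "Build a REST API endpoint that...",
  "tags": ["python", "api", "fastapi"],
  "max_output_tokens": 2048
}
""")
print(validate_task(task))  # an empty list means the entry is well-formed
```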

Adding Workloads

Record a real session and contribute it as a built-in workload:

  1. Record a session with acb record
  2. Place the JSONL file in agentic_coding_bench/workloads/data/
  3. Register it in workloads/registry.py
  4. Open a PR with a description of the session and what it tests
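
Before registering a recording, it helps to confirm the file is valid JSONL (one JSON object per line). This loader is an illustrative sketch, not the project's replay engine:

```python
import json
from pathlib import Path

def load_workload(path: Path) -> list[dict]:
    """Read a recorded session file: one JSON object per non-blank line."""
    events = []
    with path.open() as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError as e:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({e})") from e
    return events
```

A recording that fails this check will point you at the exact offending line before you wire it into the registry.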

Project Architecture

Project structure
agentic-coding-bench/
  agentic_coding_bench/
    cli.py              # Click CLI (acb speed | eval | agent | ...)
    config.py           # Config: CLI > env > YAML > defaults
    tasks/
      tasks.json        # 110 agentic coding tasks
      registry.py       # Load/filter tasks
      context/          # Agentic session context generation
    runner/
      direct.py         # Speed mode: direct endpoint benchmark
      eval_runner.py    # Eval mode: code correctness
      claude_code.py    # Agent mode: Claude Code orchestration
    workloads/
      recorder.py       # Recording proxy
      player.py         # Replay engine
      registry.py       # Load/list workloads
      data/             # Built-in workload files
    proxy/
      server.py         # Agent-mode proxy (FastAPI)
      translators.py    # API format translation
    metrics/
      collector.py      # Per-request metrics collection
      stats.py          # Statistical analysis
    report/
      markdown.py       # Report generation

PR Guidelines

  • Run make test and make lint before submitting
  • Add tests for new features
  • Keep commits focused - one feature or fix per PR
  • Update documentation if you change CLI flags or behavior

License

AgenticCodingBench is released under the Apache 2.0 license. See LICENSE for details.