Context Profiles
Understanding the 7 context profiles, cache defeat, and how to control context size.
Overview
Context size simulates where you are in a real coding session. When Claude Code opens a project, reads 2,000 lines, edits a function, runs tests, and reads the errors, that's five or more LLM round-trips, with the context growing to 40–100K tokens along the way.
Each profile pads requests with realistic agentic coding content: system prompts with tool schemas, prior conversation turns, file contents, tool call results, and error traces.
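The padding step can be sketched as follows. This is an illustrative sketch, not the tool's actual implementation: the function name, the filler turns, and the ~4 characters-per-token approximation are all assumptions.

```python
# Hypothetical sketch of padding a request to a token budget with
# realistic agentic-coding filler (system prompt, tool calls, file
# contents, errors). Tokens are approximated as ~4 characters each.

FILLER_TURNS = [
    "system: tool schemas and instructions...",
    "assistant: calling read_file on src/main.py...",
    "tool: <file contents, 2,000 lines>",
    "assistant: running tests...",
    "tool: FAILED test_parse - traceback follows...",
]

def pad_context(prompt: str, target_tokens: int, chars_per_token: int = 4) -> str:
    """Prepend conversation filler until the request is roughly
    target_tokens long, then append the real prompt."""
    budget_chars = target_tokens * chars_per_token
    parts: list[str] = []
    i = 0
    while sum(len(p) for p in parts) + len(prompt) < budget_chars:
        parts.append(FILLER_TURNS[i % len(FILLER_TURNS)])
        i += 1
    return "\n".join(parts + [prompt])

padded = pad_context("Fix the failing test.", target_tokens=500)
```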
The 7 Profiles
| Profile | Tokens | What it simulates |
|---|---|---|
| fresh | ~6K | Just opened the project |
| short | ~20K | A few turns in |
| medium | ~40K | Active coding session |
| long | ~70K | Deep multi-file work |
| full | ~100K | Extended session |
| xl | ~200K | Very large context |
| xxl | ~400K | Maximum context window |
The default realistic suite sweeps fresh → short → medium → long → full, simulating a full session lifecycle.
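The sweep above amounts to running the same measurement at each profile size in order. A minimal sketch, assuming the token counts from the table (the dictionary and function names here are illustrative, not the tool's internals):

```python
# Approximate token budget per profile, from the table above.
PROFILE_TOKENS = {
    "fresh": 6_000, "short": 20_000, "medium": 40_000,
    "long": 70_000, "full": 100_000, "xl": 200_000, "xxl": 400_000,
}
DEFAULT_SWEEP = ["fresh", "short", "medium", "long", "full"]

def sweep(run_benchmark):
    """Run one benchmark measurement per profile in the default sweep."""
    return {p: run_benchmark(context_tokens=PROFILE_TOKENS[p])
            for p in DEFAULT_SWEEP}

# Stub benchmark that just echoes the context size it was given.
results = sweep(lambda context_tokens: context_tokens)
```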
Usage
```bash
# Simulate a deep coding session (70K context)
acb speed -e URL -m MODEL --context-profile long

# Long-context models: test at 200K or 400K
acb speed -e URL -m MODEL --context-profile xl

# Exact token count
acb speed -e URL -m MODEL --context-tokens 50000

# Default sweep (fresh → full)
acb speed -e URL -m MODEL
```

Model Context Window
Use --model-context-length to tell the benchmark your model's maximum context window. Any profiles that exceed it are automatically skipped.
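The skipping rule can be sketched in a few lines, using the token counts from the table above (the function name is an assumption for illustration):

```python
# Profiles whose padded size exceeds the model's context window
# are dropped; the rest run as usual.
PROFILE_TOKENS = {
    "fresh": 6_000, "short": 20_000, "medium": 40_000,
    "long": 70_000, "full": 100_000, "xl": 200_000, "xxl": 400_000,
}

def runnable_profiles(model_context_length: int) -> list[str]:
    return [name for name, tokens in PROFILE_TOKENS.items()
            if tokens <= model_context_length]

# A 128K model keeps fresh through full and skips xl and xxl.
print(runnable_profiles(128_000))
```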
```bash
# Model supports up to 128K - xl and xxl are skipped
acb speed -e URL -m MODEL --suite full --model-context-length 128000

# Model supports 400K - run everything including xxl
acb speed -e URL -m MODEL --context-profile xxl --model-context-length 400000
```

Prefix Cache Defeat
Every request includes a unique salt to ensure prefix caching cannot mask cold-start prefill costs:
```
[session_id=abc123... ts=1234567890 rand=847291...]
```

This guarantees every measurement reflects true inference performance.
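A minimal sketch of generating such a salt. The field names mirror the example above; the exact construction (UUID session id, Unix timestamp, random integer) is an assumption:

```python
import random
import time
import uuid

def cache_defeat_salt() -> str:
    """Build a unique prefix so no two requests share a cacheable
    prefix, forcing a cold-start prefill on every measurement."""
    return (f"[session_id={uuid.uuid4().hex} "
            f"ts={int(time.time())} "
            f"rand={random.randint(0, 10**9)}]")

# Every call produces a different salt.
assert cache_defeat_salt() != cache_defeat_salt()
```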
Cache Mode Options
```bash
# Default: cache defeat enabled (cold-start measurement)
acb speed -e URL -m MODEL --defeat-cache

# Measure production-like performance with caching
acb speed -e URL -m MODEL --allow-cache

# Measure BOTH - shows exact cache speedup
acb speed -e URL -m MODEL --cache-mode both
```

--cache-mode both runs each scenario twice (first cold, then warm) and reports the delta. Anthropic charges 10× less for cached tokens ($0.30 vs $3.00/M), so knowing your cache hit rate matters.
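The cold-vs-warm arithmetic is straightforward. A sketch using the cached-token pricing quoted above ($0.30 vs $3.00 per million input tokens); the measurements fed in are made-up sample numbers, and the function name is illustrative:

```python
def cache_report(cold_ttft_s: float, warm_ttft_s: float, input_tokens: int):
    """Compare cold vs warm time-to-first-token and the cost of the
    same input tokens at uncached vs fully cached pricing."""
    speedup = cold_ttft_s / warm_ttft_s
    cost_uncached = input_tokens / 1e6 * 3.00  # $3.00/M input tokens
    cost_cached = input_tokens / 1e6 * 0.30    # $0.30/M cached tokens
    return speedup, cost_uncached, cost_cached

# Example: a 4.0s cold prefill vs 0.5s warm, on a 100K-token request.
speedup, uncached, cached = cache_report(4.0, 0.5, 100_000)
# 8x faster warm, and the 10x cost difference on those tokens.
```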