How $/solve is computed
For every submission on the public SWE-bench Verified leaderboard, we read the resolved-instance count from SWE-bench/experiments/evaluation/verified/<dir>/results/results.json. SWE-bench Verified contains exactly 500 human-validated instances, so the resolve rate is resolved / 500.
The submission's tags.model field is fuzzy-matched against an OpenRouter model id (cache-aware, with manual overrides in config/aliases.json). When a match exists, we look up the model's current per-token input and output prices ($/in and $/out) from OpenRouter's public models endpoint.
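A minimal sketch of the matching step, assuming the overrides are a flat mapping from a lowercased tag to an OpenRouter id and that similarity is scored with Python's standard difflib. The function name match_model, the 0.6 cutoff, and the alias format are illustrative assumptions, not the site's actual implementation.

```python
import difflib
from typing import Mapping, Optional, Sequence


def match_model(
    tag: str,
    openrouter_ids: Sequence[str],
    aliases: Mapping[str, str],
    cutoff: float = 0.6,  # assumed similarity threshold
) -> Optional[str]:
    """Return the best-matching OpenRouter id for a tags.model value, or None."""
    key = tag.strip().lower()

    # Manual overrides win over fuzzy matching (assumed alias format: tag -> id).
    if key in aliases:
        return aliases[key]

    # Compare against the part after the provider prefix, e.g. "vendor/name" -> "name".
    short_to_full = {mid.split("/", 1)[-1].lower(): mid for mid in openrouter_ids}
    hits = difflib.get_close_matches(key, list(short_to_full), n=1, cutoff=cutoff)
    return short_to_full[hits[0]] if hits else None


# Illustrative ids only; the real list would come from the models endpoint.
ids = ["anthropic/claude-3.5-sonnet", "openai/gpt-4o"]
print(match_model("Claude-3.5-Sonnet", ids, {}))
```

A submission for which this returns None has no price, so it would fall into the unmatched group at the bottom of the table.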
The cost per attempt assumes a fixed per-attempt token budget, defaulting to 250,000 input tokens and 30,000 output tokens, roughly the average reported in published agent traces from SWE-bench Verified runs. Both values are user-adjustable at the top of the leaderboard tab; the table re-renders with your numbers.
Then:
cost_per_attempt = in_tok × $/in + out_tok × $/out
cost_per_solve = cost_per_attempt × 500 / resolved
Lower is better. Submissions that resolved zero instances or whose model couldn't be matched to a current OpenRouter price are listed at the bottom of the table.
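The formula and the ordering rule can be sketched in a few lines of Python. The function name cost_per_solve, the row layout, and the prices in the example ($3 and $15 per million tokens) are hypothetical and chosen only to make the arithmetic easy to follow; the default budget and the 500-instance total come from the text above.

```python
from typing import List, Optional, Tuple

TOTAL_INSTANCES = 500  # size of SWE-bench Verified


def cost_per_solve(
    resolved: int,
    usd_per_in_tok: Optional[float],
    usd_per_out_tok: Optional[float],
    in_tok: int = 250_000,  # default input-token budget per attempt
    out_tok: int = 30_000,  # default output-token budget per attempt
) -> Optional[float]:
    """Estimated $/solve, or None when it cannot be computed."""
    # Zero resolved instances or a missing price means no estimate.
    if resolved <= 0 or usd_per_in_tok is None or usd_per_out_tok is None:
        return None
    cost_per_attempt = in_tok * usd_per_in_tok + out_tok * usd_per_out_tok
    return cost_per_attempt * TOTAL_INSTANCES / resolved


# Hypothetical rows: (name, resolved, $/input token, $/output token).
rows = [
    ("unpriced", 300, None, None),
    ("example-a", 250, 3e-6, 15e-6),
    ("zero-solves", 0, 3e-6, 15e-6),
]

scored: List[Tuple[str, Optional[float]]] = [
    (name, cost_per_solve(res, p_in, p_out)) for name, res, p_in, p_out in rows
]
# Lower is better; rows without an estimate sort to the bottom.
scored.sort(key=lambda r: (r[1] is None, r[1] or 0.0))
for name, cost in scored:
    print(name, "n/a" if cost is None else f"${cost:.2f}")
```

For example-a, the cost per attempt is 250,000 × $0.000003 + 30,000 × $0.000015 = $1.20, and with 250 of 500 instances resolved the estimate is $1.20 × 500 / 250 = $2.40 per solve.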
Data sources
- SWE-bench scores: github.com/SWE-bench/experiments (refreshed every 6h)
- API pricing: openrouter.ai/api/v1/models (refreshed every 2h)
- Daily $/solve snapshot for trend lines
What this is not
It's an estimate, not a per-call meter. Real costs depend on prompt design, scaffold, cache-hit rate, retry policy, and the prevailing OpenRouter route. The formula is exposed so anyone can plug their own numbers in.