
Solve-Tax Leaderboard

Cost per SWE-bench Verified solve — live API pricing × real benchmark scores.


solve-tax

The live $-per-SWE-bench-solve leaderboard. Real benchmark scores joined with current API pricing — sorted by the dollar cost to ship one resolved ticket.

What this is

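When buyers shop for a coding agent they usually see two separate numbers: a benchmark score (for example, the SWE-bench Verified resolve rate) and an API price (dollars per token).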

Neither answers the buyer's real question: "how much do I burn per ticket if I throw 1,000 of them at this agent?"

solve-tax joins the two halves into a single, sortable, live leaderboard. Lower $/solve = better.

Data sources (all live, all public)

| Source | URL | Refresh |
|---|---|---|
| SWE-bench Verified submissions | github.com/SWE-bench/experiments/evaluation/verified (metadata.yaml + results/results.json per submission) | every 6 hours |
| API pricing | openrouter.ai/api/v1/models (~300 models, raw $/token) | every 2 hours |
| Daily $/solve snapshot | local DB rollup | once a day |
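As a rough illustration of the pricing half, the sketch below turns a models payload into a lookup of raw $/token prices. It assumes the response looks like `{ data: [{ id, pricing: { prompt, completion } }] }` with prices as decimal strings; that shape, the helper names `toPriceTable` and `fetchPriceTable`, and the choice to skip negative or unparsable prices are all assumptions for this example, not a description of what server.js does. Check the field names against the OpenRouter documentation before relying on them.

```javascript
// Sketch only: the payload shape is an assumption, see the note above.
// Builds Map<modelId, { inputPerToken, outputPerToken }> in raw USD per token.
function toPriceTable(payload) {
  const table = new Map();
  for (const model of payload?.data ?? []) {
    // parseFloat yields NaN for missing or null fields, which we then skip.
    const inputPerToken = Number.parseFloat(model?.pricing?.prompt);
    const outputPerToken = Number.parseFloat(model?.pricing?.completion);
    if (!Number.isFinite(inputPerToken) || !Number.isFinite(outputPerToken)) continue;
    // Defensive assumption: a negative price is a placeholder, not a real price.
    if (inputPerToken < 0 || outputPerToken < 0) continue;
    table.set(model.id, { inputPerToken, outputPerToken });
  }
  return table;
}

// Hypothetical fetch wrapper (Node 18+ ships a global fetch). Not called here.
async function fetchPriceTable(url = "https://openrouter.ai/api/v1/models") {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`pricing fetch failed: HTTP ${res.status}`);
  return toPriceTable(await res.json());
}
```

Keeping the parsing step separate from the network call makes it easy to test against a saved payload while developing offline.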

No hardcoded scores. No mock pricing. No Math.random() jitter. If a submission has no results.json yet, it shows up with 0 resolved and is parked at the bottom of the table until SWE-bench publishes its score.
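One way to get the "parked at the bottom" behaviour is to treat a submission with 0 resolved as having no $/solve value and sort those rows last. This is a minimal sketch, not the project's actual code: the row fields (`name`, `resolved`, `costPerAttempt`) and the function name `rankBySolveCost` are hypothetical.

```javascript
// Sketch only: row shape and function name are assumptions for illustration.
// Ranks rows by ascending $/solve; rows with 0 resolved get null and sort last.
function rankBySolveCost(rows, instances = 500) {
  return rows
    .map((row) => ({
      ...row,
      costPerSolve:
        row.resolved > 0 ? (row.costPerAttempt * instances) / row.resolved : null,
    }))
    .sort((a, b) => {
      if (a.costPerSolve === null && b.costPerSolve === null) return 0;
      if (a.costPerSolve === null) return 1; // unscored rows go to the bottom
      if (b.costPerSolve === null) return -1;
      return a.costPerSolve - b.costPerSolve; // lower $/solve is better
    });
}
```

Using `null` rather than a very large number keeps unscored rows easy to tell apart from genuinely expensive ones when rendering the table.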

Formula

cost_per_attempt = in_tok × $/in + out_tok × $/out          # $ / single try
cost_per_solve   = cost_per_attempt × 500 / resolved        # 500 = SWE-bench Verified instance count

Token budget defaults to 250,000 input + 30,000 output tokens per attempt — the rough average observed across published agent traces on SWE-bench Verified. Both knobs are user-adjustable at the top of the page and via ?in_tok= / ?out_tok= on the API.
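The reasoning behind the formula: attempting all 500 instances costs 500 × cost_per_attempt, and that spend is divided across the `resolved` instances that actually got fixed. The sketch below works through the arithmetic with the default token budget. The prices and resolve count in the example are made up for illustration, and returning `Infinity` when nothing is resolved is an assumption of this sketch.

```javascript
// Defaults from the token budget described above.
const DEFAULT_IN_TOK = 250_000;
const DEFAULT_OUT_TOK = 30_000;
const VERIFIED_INSTANCES = 500; // SWE-bench Verified instance count

// Prices are raw USD per token, not per million tokens.
function costPerAttempt(
  pricePerInTok,
  pricePerOutTok,
  inTok = DEFAULT_IN_TOK,
  outTok = DEFAULT_OUT_TOK
) {
  return inTok * pricePerInTok + outTok * pricePerOutTok;
}

function costPerSolve(resolved, pricePerInTok, pricePerOutTok, inTok, outTok) {
  // Assumption: with nothing resolved there is no finite cost per solve.
  if (resolved <= 0) return Infinity;
  const attempt = costPerAttempt(pricePerInTok, pricePerOutTok, inTok, outTok);
  return (attempt * VERIFIED_INSTANCES) / resolved;
}

// Hypothetical model: $3 per million input tokens, $15 per million output
// tokens, 250 of 500 instances resolved.
const example = costPerSolve(250, 3e-6, 15e-6);
console.log(example.toFixed(2)); // 2.40
```

In this example a single attempt costs about $1.20, so a 50% resolve rate doubles the effective price to about $2.40 per resolved ticket.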

Endpoints

All endpoints are public (no auth) and mount under BASE_PATH=/solve-tax.

GET /solve-tax/health
GET /solve-tax/api/leaderboard?in_tok=&out_tok=&bash_only=&os_model=&min_resolve=&org=&limit=
GET /solve-tax/api/submissions
GET /solve-tax/api/submissions/:dir
GET /solve-tax/api/models
GET /solve-tax/api/models/:id
GET /solve-tax/api/repos
GET /solve-tax/api/repos/:repo
GET /solve-tax/api/biggest-drops
GET /solve-tax/api/recompute?in_tok=&out_tok=
GET /solve-tax/api/refresh-log?limit=
GET /solve-tax/api/stats
GET /solve-tax/api/refresh/:source         # source: swebench | openrouter | rematch | snapshot
GET /solve-tax/                            # SPA
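A small sketch of building a leaderboard request against a local instance follows. Only the path and query parameter names come from the list above; the helper name `leaderboardUrl`, the numeric values, and the choice to drop empty parameters are assumptions. The response format is not documented here, so the example stops at the URL.

```javascript
// Sketch only: builds a leaderboard URL and omits parameters that are unset.
function leaderboardUrl(origin, params = {}) {
  const url = new URL("/solve-tax/api/leaderboard", origin);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null && value !== "") {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

const url = leaderboardUrl("http://localhost:4841", {
  in_tok: 100000,
  out_tok: 20000,
  limit: 10,
});
console.log(url);
// http://localhost:4841/solve-tax/api/leaderboard?in_tok=100000&out_tok=20000&limit=10
```

The resulting string can be passed to `fetch` or `curl` as is; since the endpoints need no auth, no headers are required.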

Running locally

npm install
PORT=4841 BASE_PATH=/solve-tax node server.js
# then open http://localhost:4841/solve-tax/

First boot fetches pricing immediately and kicks off a SWE-bench sweep in the background; the leaderboard fills in within a few minutes. Set SKIP_BOOT_SWEEP=1 to disable that behaviour while developing offline.

Stack

Caveats

License

MIT