
Agent Uplift

How much free score does the right harness give your model on SWE-bench?


agent-uplift

The harness uplift leaderboard. For every frontier LLM, the score boost from picking the best open-source agent harness on SWE-bench Verified, Lite, and Multi-SWE-bench. Real public submissions, refreshed hourly, zero auth.

Live: https://holyai.me/agent-uplift/

Why this exists

In May 2026 the AI coding agent community had a quiet realization: the harness matters more than the weights. Take the same model, wrap it in swe-agent versus aider versus cline versus codex-cli, and the SWE-bench Verified score can move by 5 to 30 points. The official SWE-bench leaderboard lists each submission once, in submission order. It does not group submissions by model, and it does not surface the delta between harnesses.

agent-uplift answers a single question per model:

If I run this model, which harness gives me the biggest free score boost, and by how many points?
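
A minimal sketch of how that per-model question could be answered. The record fields (`model`, `harness`, `score`), the placeholder values, and the choice of baseline are assumptions made for illustration. Here uplift is taken to be the best harness score minus the lowest harness score for the same model; the project's actual definition may differ.

```javascript
// Hypothetical submission records with placeholder values (not real results).
const submissions = [
  { model: "model-a", harness: "swe-agent", score: 40.0 },
  { model: "model-a", harness: "aider", score: 52.5 },
  { model: "model-b", harness: "cline", score: 30.0 },
];

// Group rows by model, then report the best harness and its margin over the
// lowest-scoring harness for that model (assumed baseline for this sketch).
function computeUplift(rows) {
  const byModel = new Map();
  for (const row of rows) {
    if (!byModel.has(row.model)) byModel.set(row.model, []);
    byModel.get(row.model).push(row);
  }

  const result = [];
  for (const [model, group] of byModel) {
    const sorted = [...group].sort((a, b) => b.score - a.score);
    const best = sorted[0];
    const worst = sorted[sorted.length - 1];
    result.push({
      model,
      bestHarness: best.harness,
      bestScore: best.score,
      uplift: best.score - worst.score, // 0 when only one harness was submitted
      harnessCount: group.length,
    });
  }
  // Largest uplift first, as a leaderboard would show it.
  return result.sort((a, b) => b.uplift - a.uplift);
}

console.log(computeUplift(submissions));
```

A model with a single submission gets an uplift of 0 under this definition, so a real leaderboard would probably want to flag or filter those rows.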

What you see

Data sources (NO MOCKS)

Every datapoint is pulled at runtime from a real, public, no-auth HTTP endpoint:

| Source | URL pattern | Refresh |
| --- | --- | --- |
| swe-bench/experiments — Verified | https://api.github.com/repos/swe-bench/experiments/contents/evaluation/verified | every 60 min |
| swe-bench/experiments — Lite | https://api.github.com/repos/swe-bench/experiments/contents/evaluation/lite | every 60 min |
| per-submission metadata | https://raw.githubusercontent.com/swe-bench/experiments/main/evaluation/<bench>/<dir>/metadata.yaml | on directory diff |
| per-submission results | https://raw.githubusercontent.com/swe-bench/experiments/main/evaluation/<bench>/<dir>/results/results.json | on directory diff |
| multi-swe-bench results | https://api.github.com/repos/multi-swe-bench/multi-swe-bench/contents/results | every 60 min |
| harness repo metadata | https://api.github.com/repos/<owner>/<repo> | every 6 hours |
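
One way to read "on directory diff" is sketched below: compare two directory listings, then build the raw URLs from the table for any submission directory that is new. The GitHub contents API returns an array of entries with `name` and `type` fields. The function names and the comparison strategy are assumptions, not the project's confirmed implementation.

```javascript
const RAW_BASE =
  "https://raw.githubusercontent.com/swe-bench/experiments/main/evaluation";

// Return the names of directories present in the current listing but not in
// the previous one. Entries are shaped like contents API items: { name, type }.
function newSubmissionDirs(previousListing, currentListing) {
  const seen = new Set(
    previousListing.filter((e) => e.type === "dir").map((e) => e.name)
  );
  return currentListing
    .filter((e) => e.type === "dir" && !seen.has(e.name))
    .map((e) => e.name);
}

// Build the per-submission URLs following the patterns in the table above.
function submissionUrls(bench, dir) {
  const base = `${RAW_BASE}/${bench}/${encodeURIComponent(dir)}`;
  return {
    metadata: `${base}/metadata.yaml`,
    results: `${base}/results/results.json`,
  };
}

// Usage with hypothetical directory names.
const before = [{ name: "example_old", type: "dir" }];
const after = [
  { name: "example_old", type: "dir" },
  { name: "example_new", type: "dir" },
  { name: "README.md", type: "file" },
];
for (const dir of newSubmissionDirs(before, after)) {
  console.log(submissionUrls("verified", dir));
}
```

This sketch only detects added directories. Detecting changes inside an existing directory would need a further comparison, for example on each entry's `sha` field.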

Every response is ETag-cached in SQLite (the http_cache table), so the unauthenticated GitHub API limit of 60 requests/hour is plenty. If a GITHUB_TOKEN environment variable is set, the limit rises to 5,000 requests/hour.
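
The ETag caching described above might look roughly like the sketch below. It assumes Node 18+ (global `fetch`) and uses an in-memory `Map` as a stand-in for the SQLite http_cache table. The function name `cachedFetch` and its parameters are hypothetical. The request sends the stored ETag in `If-None-Match`, and a `304 Not Modified` reply means the cached body is still current.

```javascript
// In-memory stand-in for the http_cache table (the real project uses SQLite).
const httpCache = new Map(); // url -> { etag, body }

// fetchImpl is injectable so the function can be exercised without a network.
async function cachedFetch(url, fetchImpl = fetch, token = process.env.GITHUB_TOKEN) {
  const cached = httpCache.get(url);
  const headers = { Accept: "application/vnd.github+json" };
  if (cached && cached.etag) headers["If-None-Match"] = cached.etag;
  if (token) headers.Authorization = `Bearer ${token}`;

  const res = await fetchImpl(url, { headers });

  // Unchanged since the last fetch: reuse the stored body.
  if (res.status === 304 && cached) return cached.body;
  if (!res.ok) throw new Error(`GET ${url} failed with status ${res.status}`);

  const body = await res.text();
  const etag = res.headers.get("etag");
  if (etag) httpCache.set(url, { etag, body });
  return body;
}
```

Keeping the transport injectable is a design choice for this sketch: it lets the cache logic be tested against a fake response object instead of the live GitHub API.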

Methodology

Tech

Running locally

```sh
npm install
PORT=4896 node server.js
# open http://localhost:4896/agent-uplift/
```

Endpoints

```text
GET  /agent-uplift/                      SPA shell
GET  /agent-uplift/health                liveness + cron health JSON
GET  /agent-uplift/api/uplift            per-model uplift leaderboard
GET  /agent-uplift/api/model/:model      per-model detail + raw submission links
GET  /agent-uplift/api/harness/:harness  per-harness view
GET  /agent-uplift/api/harnesses         harness impact table
GET  /agent-uplift/api/movers            new submissions in window
GET  /agent-uplift/api/stats             aggregate stats
GET  /agent-uplift/api/methodology       methodology JSON
GET  /agent-uplift/methodology           methodology HTML
GET  /agent-uplift/share/:model.svg      1200×630 OG share card
POST /agent-uplift/api/refresh           trigger an out-of-cycle refresh (rate-limited 1/min/IP)
```
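
The 1/min/IP limit on the refresh endpoint could be enforced with a small minimum-interval limiter like the sketch below. The function name `allowRefresh` and the in-memory `Map` are assumptions for illustration. The server's actual limiter is not described here.

```javascript
// Allow at most one accepted refresh per IP within any 60-second interval.
const WINDOW_MS = 60_000;
const lastRefreshByIp = new Map(); // ip -> timestamp (ms) of last accepted request

// `now` is a parameter so the logic can be checked without waiting a minute.
function allowRefresh(ip, now = Date.now()) {
  const last = lastRefreshByIp.get(ip);
  if (last !== undefined && now - last < WINDOW_MS) return false;
  lastRefreshByIp.set(ip, now);
  return true;
}

// Usage with documentation-range IPs and explicit timestamps.
console.log(allowRefresh("203.0.113.7", 0));      // first request is accepted
console.log(allowRefresh("203.0.113.7", 30_000)); // rejected, too soon
console.log(allowRefresh("203.0.113.7", 60_000)); // accepted again
```

A rejected request would typically be answered with HTTP 429. In a long-running server the map would also need periodic pruning so it does not grow without bound.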

License

MIT. Part of the Holy AI / Cowork R&D fleet.