What is harness uplift?
Two submissions that use the same LLM can score 5 to 30 percentage points apart on SWE-bench Verified, depending only on the harness around the model: the tool loop, the context-management policy, and the planner. agent-uplift surfaces that delta for every frontier model. Pick a model to see its bare-baseline score, its best-known ceiling, and the harness that reaches that ceiling.
All data is pulled hourly from swe-bench/experiments. No mock data. No seeded fallback. See Methodology.
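As a rough illustration of the per-model calculation, the sketch below groups submissions by model and reports baseline, ceiling, best harness, and uplift in percentage points. It is not the site's actual pipeline: the `Submission` type, the `uplift_by_model` function, and the sample scores are hypothetical, and it assumes the bare baseline is simply the lowest score recorded for a model, which may differ from the definition in the methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical record for one leaderboard submission."""
    model: str
    harness: str
    resolved_pct: float  # percent of SWE-bench Verified tasks resolved


def uplift_by_model(submissions):
    """Return {model: (baseline, ceiling, best_harness, uplift_pp)}."""
    by_model = {}
    for sub in submissions:
        by_model.setdefault(sub.model, []).append(sub)

    rows = {}
    for model, subs in by_model.items():
        # Assumption: the bare baseline is the lowest score seen for the model.
        baseline = min(s.resolved_pct for s in subs)
        best = max(subs, key=lambda s: s.resolved_pct)
        uplift_pp = round(best.resolved_pct - baseline, 1)
        rows[model] = (baseline, best.resolved_pct, best.harness, uplift_pp)
    return rows


# Made-up scores, for illustration only.
subs = [
    Submission("model-a", "harness-x", 40.0),
    Submission("model-a", "harness-y", 55.5),
    Submission("model-b", "harness-x", 30.0),
]
rows = uplift_by_model(subs)
print(rows["model-a"])
```

A model with a single submission gets an uplift of zero under this assumption, because its baseline and ceiling coincide.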
Per-model uplift leaderboard
| Model | Bare baseline | Best ceiling | Best harness | Uplift | Submissions | Share |
|---|---|---|---|---|---|---|
Harness impact
Mean uplift each harness has delivered

| Harness | Mean uplift delivered | Ceilings held | Models touched | Submissions | GitHub |
|---|---|---|---|---|---|
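One plausible way to derive the harness columns is sketched below. The function name `harness_impact`, the tuple layout, and the sample scores are hypothetical. The sketch assumes that a harness's uplift on a submission is its score minus the lowest score recorded for the same model, and that a harness "holds a ceiling" when it owns the top score for a model; the site may define these differently.

```python
from collections import defaultdict
from statistics import mean


def harness_impact(records):
    """Aggregate per-harness stats from (model, harness, resolved_pct) tuples."""
    records = list(records)

    baseline = {}  # model -> lowest score (assumed bare baseline)
    ceiling = {}   # model -> (highest score, harness that achieved it)
    for model, harness, score in records:
        baseline[model] = min(score, baseline.get(model, score))
        if model not in ceiling or score > ceiling[model][0]:
            ceiling[model] = (score, harness)

    uplifts = defaultdict(list)  # harness -> uplift (pp) per submission
    models = defaultdict(set)    # harness -> distinct models it ran on
    for model, harness, score in records:
        uplifts[harness].append(score - baseline[model])
        models[harness].add(model)

    return {
        h: {
            "mean_uplift_pp": round(mean(uplifts[h]), 1),
            "ceilings_held": sum(1 for _, best in ceiling.values() if best == h),
            "models_touched": len(models[h]),
            "submissions": len(uplifts[h]),
        }
        for h in uplifts
    }


# Made-up scores, for illustration only.
records = [
    ("model-a", "harness-x", 40.0),
    ("model-a", "harness-y", 55.0),
    ("model-b", "harness-x", 30.0),
    ("model-b", "harness-y", 50.0),
]
impact = harness_impact(records)
print(impact["harness-y"])
```

Averaging over submissions rather than over models weights a harness toward the models it was run on most often; averaging per model first would be an equally defensible choice.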