agent-uplift · harness uplift leaderboard

What is harness uplift?

Two submissions built on the same LLM can score 5 to 30 percentage points apart on SWE-bench Verified depending only on the harness around the model: the tool loop, the context-management policy, and the planner. agent-uplift surfaces that delta for every frontier model. Pick a model to see its bare-baseline score, its best-known ceiling, and the harness that gets you there.
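
As a rough illustration of the delta described above, the following sketch computes a per-model uplift from a list of submissions. It rests on assumptions not stated on this page: the bare baseline is taken to be the lowest-scoring submission for a model, the ceiling the highest-scoring one, and the `Submission` type, the `uplift_by_model` function, and the example scores are all hypothetical. The site's actual methodology may define the baseline differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    model: str    # LLM name
    harness: str  # agent scaffold wrapped around the model
    score: float  # percent resolved on SWE-bench Verified


def uplift_by_model(submissions):
    """Return baseline, ceiling, best harness, and uplift (pp) per model.

    Assumption: baseline = lowest score for the model, ceiling = highest.
    """
    by_model = defaultdict(list)
    for sub in submissions:
        by_model[sub.model].append(sub)

    rows = {}
    for model, subs in by_model.items():
        baseline = min(subs, key=lambda s: s.score)
        ceiling = max(subs, key=lambda s: s.score)
        rows[model] = {
            "baseline": baseline.score,
            "ceiling": ceiling.score,
            "best_harness": ceiling.harness,
            "uplift_pp": round(ceiling.score - baseline.score, 1),
            "submissions": len(subs),
        }
    return rows


# Invented scores, for illustration only.
example = [
    Submission("model-a", "harness-x", 40.0),
    Submission("model-a", "harness-y", 55.5),
    Submission("model-b", "harness-x", 30.0),
]
rows = uplift_by_model(example)
```

A model with a single submission gets an uplift of zero under this definition, since its baseline and ceiling coincide.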

All data is pulled hourly from swe-bench/experiments. No mock data. No seeded fallback. See Methodology.

Per-model uplift leaderboard

Model	Bare baseline	Best ceiling	Best harness	Uplift	Submissions	Share

Harness impact

mean uplift each harness has delivered

Harness	Mean uplift delivered	Ceilings held	Models touched	Submissions	GitHub
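
One plausible reading of the columns above can be sketched as follows. Everything here is an assumption rather than the site's documented method: a submission's uplift is taken as its score minus the lowest score recorded for the same model, "ceilings held" counts the models for which a harness has the top score, and "models touched" counts distinct models. The `harness_impact` function and the example scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def harness_impact(submissions):
    """Aggregate (model, harness, score) tuples into per-harness stats."""
    submissions = list(submissions)

    # Per-model baseline (lowest score) and ceiling (harness with top score).
    baseline = {}
    ceiling = {}
    for model, harness, score in submissions:
        baseline[model] = min(score, baseline.get(model, score))
        if model not in ceiling or score > ceiling[model][1]:
            ceiling[model] = (harness, score)

    # Uplift of each submission over its model's baseline, grouped by harness.
    deltas = defaultdict(list)
    models = defaultdict(set)
    for model, harness, score in submissions:
        deltas[harness].append(score - baseline[model])
        models[harness].add(model)

    return {
        h: {
            "mean_uplift_pp": round(mean(deltas[h]), 1),
            "ceilings_held": sum(1 for top, _ in ceiling.values() if top == h),
            "models_touched": len(models[h]),
            "submissions": len(deltas[h]),
        }
        for h in deltas
    }


# Invented scores, for illustration only.
example = [
    ("model-a", "harness-x", 40.0),
    ("model-a", "harness-y", 55.0),
    ("model-b", "harness-x", 30.0),
    ("model-b", "harness-y", 50.0),
]
impact = harness_impact(example)
```

Averaging over submissions rather than over models weights a harness toward the models it was submitted with most often; averaging per model first would be an equally defensible choice.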

Movers — last 14 days