How agent-uplift computes harness uplift
- Baseline per model = lowest-scoring submission whose harness is tagged as minimal scaffold (
agentless, minimal, bare, vanilla, oneshot).
- If no minimal-scaffold submission exists for that model, baseline falls back to the lowest-scoring submission across all harnesses for that model. The UI labels this case explicitly.
- Ceiling per model = highest-scoring submission across all harnesses for that model.
- Uplift = ceiling − baseline.
- Every cron pass refreshes the directory listing of
swe-bench/experiments
and pulls each submission's
metadata.yaml + results.json via the raw GitHub content endpoint.
- No mock data. No seeded fallback. If a source 404s, the row is marked stale and not invented.
← back to the leaderboard