
Harness Arena

Rank open-source AI coding agent harnesses on SWE-bench Verified, repo health, and release cadence



Harness Arena is a live leaderboard that ranks open-source AI coding-agent harnesses by SWE-bench Verified results, GitHub repo health, and release cadence. The harness, not the model, is the scaffold that turns raw weights into a working coder, and this arena measures that scaffold.

The public dashboard provides a sortable leaderboard, per-harness scorecards, a diff feed of new submissions and releases, an embeddable SVG badge, and a methodology page that lists every upstream URL and its last fetch time.

Quick start

```sh
npm install
node server.js
# → http://0.0.0.0:4747/harness-arena/
```

The first refresh-repos cron runs ~8 seconds after boot. The leaderboard fills in as data lands; cells that haven't been fetched yet show an em-dash.
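The boot behavior described above could be wired up roughly as follows. This is a minimal sketch, not the project's actual scheduler: `scheduleJob` and `refreshRepos` are hypothetical names, and the 15-minute repeat interval assumes that the refresh-repos job corresponds to the GitHub repo metadata fetch.

```javascript
// Hypothetical helper: run `job` once after `initialDelayMs`, then every `intervalMs`.
// Returns a function that cancels both the pending first run and the repeats.
function scheduleJob(job, { initialDelayMs, intervalMs }) {
  let interval = null;
  const timeout = setTimeout(() => {
    job();
    interval = setInterval(job, intervalMs);
  }, initialDelayMs);
  return function cancel() {
    clearTimeout(timeout);
    if (interval !== null) clearInterval(interval);
  };
}

// Example wiring (assumed values: ~8 s after boot, then every 15 minutes):
// const cancel = scheduleJob(refreshRepos, { initialDelayMs: 8000, intervalMs: 15 * 60 * 1000 });
```

Returning a cancel function keeps shutdown simple, since a pending timer would otherwise keep the Node.js process alive.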

Stack

Data sources (all public, all real, all runtime-fetched)

| Source | URL | Frequency |
|---|---|---|
| GitHub repo metadata | https://api.github.com/repos/{owner}/{repo} | every 15 min |
| GitHub commits (30 d) | https://api.github.com/repos/{owner}/{repo}/commits?since={iso} | every 60 min |
| GitHub releases | https://api.github.com/repos/{owner}/{repo}/releases/latest | every 6 hr |
| SWE-bench Verified submission index | https://api.github.com/repos/swe-bench/experiments/contents/evaluation/verified | every 30 min |
| SWE-bench Verified metadata | https://raw.githubusercontent.com/swe-bench/experiments/main/evaluation/verified/{dir}/metadata.yaml | on-demand |
| SWE-bench Verified results | https://raw.githubusercontent.com/swe-bench/experiments/main/evaluation/verified/{dir}/results/results.json | on-demand |
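
The URL templates and frequencies in the table can be expressed as a small lookup module. The sketch below is illustrative only: the function names (`githubUrls`, `sweBenchUrls`) and the `REFRESH_INTERVALS_MS` keys are assumptions, while the URL patterns and intervals are taken from the table.

```javascript
const MINUTE = 60 * 1000;

// Refresh intervals from the table, in milliseconds (key names are hypothetical).
const REFRESH_INTERVALS_MS = {
  repoMetadata: 15 * MINUTE,
  commits: 60 * MINUTE,
  latestRelease: 6 * 60 * MINUTE,
  sweBenchIndex: 30 * MINUTE,
};

// Build the three GitHub API URLs for one tracked harness.
// `now` is injectable so the 30-day commit window is deterministic in tests.
function githubUrls(owner, repo, now = new Date()) {
  const base = `https://api.github.com/repos/${owner}/${repo}`;
  const since = new Date(now.getTime() - 30 * 24 * 60 * MINUTE).toISOString();
  return {
    repoMetadata: base,
    commits: `${base}/commits?since=${encodeURIComponent(since)}`,
    latestRelease: `${base}/releases/latest`,
  };
}

// Build the on-demand SWE-bench Verified URLs for one submission directory.
function sweBenchUrls(dir) {
  const base =
    "https://raw.githubusercontent.com/swe-bench/experiments/main/evaluation/verified/" + dir;
  return {
    metadata: `${base}/metadata.yaml`,
    results: `${base}/results/results.json`,
  };
}
```

Passing the clock in as a parameter means the commit window can be checked without depending on the current date.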

No seed, mock, or random data is used anywhere. Cells without a fetched value render as em-dashes.
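
One way to implement the em-dash rule is to send every cell through a single formatter that never substitutes a default value. This is a sketch under assumptions: `renderCell` is a hypothetical name, and "not fetched yet" is assumed to be represented as `null` or `undefined`.

```javascript
const EM_DASH = "\u2014";

// Hypothetical cell formatter: missing values become an em-dash and are never
// replaced with a placeholder number. Real values, including 0, are formatted.
function renderCell(value, format = String) {
  if (value === null || value === undefined) return EM_DASH;
  return format(value);
}

// Example formatter for a percentage column.
const asPercent = (v) => `${v.toFixed(1)}%`;
```

For example, `renderCell(undefined)` yields the em-dash, while `renderCell(0)` yields `"0"`, so a genuine zero is never mistaken for missing data.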

Endpoints (all public, no auth)

Environment

| Variable | Default | Notes |
|---|---|---|
| PORT | 4747 | listening port |
| BASE_PATH | /harness-arena | reverse-proxy mount point |
| GITHUB_TOKEN | (unset) | optional; raises the GitHub API rate limit from 60 to 5,000 requests per hour |

.env.example ships with __INJECT_FROM_VAULT__ placeholders that the deploy orchestrator replaces with values from the RNDLAB key vault.
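
A configuration loader that follows the table might look like the sketch below. `loadConfig` and `VAULT_PLACEHOLDER` are hypothetical names. Treating an unreplaced `__INJECT_FROM_VAULT__` value as "no token" is an assumption rather than documented behavior. The `Accept` and `Authorization: Bearer` headers follow the GitHub REST API conventions.

```javascript
const VAULT_PLACEHOLDER = "__INJECT_FROM_VAULT__";

// Read PORT, BASE_PATH, and GITHUB_TOKEN with the defaults from the table.
// `env` is injectable so the loader can be exercised without touching process.env.
function loadConfig(env = process.env) {
  const rawToken = env.GITHUB_TOKEN;
  // Assumption: a placeholder that was never replaced counts as an unset token.
  const token = rawToken && rawToken !== VAULT_PLACEHOLDER ? rawToken : null;

  const githubHeaders = { Accept: "application/vnd.github+json" };
  if (token) githubHeaders.Authorization = `Bearer ${token}`;

  return {
    port: Number.parseInt(env.PORT || "4747", 10),
    basePath: env.BASE_PATH || "/harness-arena",
    githubHeaders,
  };
}
```

Without a token the requests still work, but they run under the lower unauthenticated rate limit.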

Tracked harnesses (v1)

OpenHands, Aider, Cline, SWE-agent, Roo Code, Continue, RA.Aid, live-SWE-agent, Moatless, Codex CLI, Claude Code, AugmentCode CLI, Factory Droid, pi-mono.

License

Proprietary — internal Cowork R&D output.