harness-arena
Live arena ranking open-source AI coding-agent harnesses on SWE-bench Verified, GitHub repo health, and release cadence. The harness, not the model, is the scaffold that turns raw weights into a working coder. This is the arena that measures the scaffold.
Public dashboard with a sortable leaderboard, per-harness scorecards, a diff feed of new submissions and releases, an embeddable SVG badge, and a methodology page that lists every upstream URL and its last fetch time.
Quick start
npm install
node server.js
# → http://0.0.0.0:4747/harness-arena/
The first refresh-repos cron runs ~8 seconds after boot. The leaderboard fills in as data lands; cells that haven't been fetched yet show an em-dash.
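The placeholder behavior described above could be sketched as follows. This is a minimal illustration, not the project's actual renderer: `formatCell` and `EM_DASH` are hypothetical names, and treating non-finite numbers as "not fetched" is an assumption.

```javascript
// Hypothetical helper: render one leaderboard cell.
// Assumption: a missing value is represented as null or undefined.
const EM_DASH = "\u2014";

function formatCell(value) {
  // Not fetched yet: show the placeholder.
  if (value === null || value === undefined) return EM_DASH;
  // Assumption: NaN/Infinity are treated as missing too.
  if (typeof value === "number" && !Number.isFinite(value)) return EM_DASH;
  // 0 and the empty string are legitimate fetched values, so they pass through.
  return String(value);
}
```

Checking for `null`/`undefined` explicitly, rather than using a falsy test, keeps a real score of `0` from being rendered as missing.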
Stack
- Node.js 20 LTS
- Express 4 + helmet + compression + express-rate-limit
- better-sqlite3 (WAL mode) at data/harness-arena.db
- node-cron
- Built-in fetch (Node 20+)
- js-yaml for SWE-bench metadata.yaml
- Vanilla-JS SPA frontend, Chart.js loaded from cdn.jsdelivr.net
Data sources (all public, all real, all runtime-fetched)
| Source | URL | Frequency |
|---|---|---|
| GitHub repo metadata | https://api.github.com/repos/{owner}/{repo} | every 15 min |
| GitHub commits (30 d) | https://api.github.com/repos/{owner}/{repo}/commits?since={iso} | every 60 min |
| GitHub releases | https://api.github.com/repos/{owner}/{repo}/releases/latest | every 6 hr |
| SWE-bench Verified submission index | https://api.github.com/repos/swe-bench/experiments/contents/evaluation/verified | every 30 min |
| SWE-bench Verified metadata | https://raw.githubusercontent.com/swe-bench/experiments/main/evaluation/verified/{dir}/metadata.yaml | on-demand |
| SWE-bench Verified results | https://raw.githubusercontent.com/swe-bench/experiments/main/evaluation/verified/{dir}/results/results.json | on-demand |
Zero seed/mock/random data anywhere. Cells without a fetched value render as em-dashes.
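As a rough sketch of how the URL templates in the table might be filled in at runtime, the helpers below build the request URLs and headers. The function names, the `User-Agent` value, and the 30-day window arithmetic are assumptions for illustration; only the URL shapes come from the table above.

```javascript
const GITHUB_API = "https://api.github.com";
const SWE_BENCH_RAW =
  "https://raw.githubusercontent.com/swe-bench/experiments/main/evaluation/verified";

// Hypothetical: headers for GitHub API calls; the token is optional.
function githubHeaders(token) {
  const headers = {
    Accept: "application/vnd.github+json",
    "User-Agent": "harness-arena", // assumed value; GitHub requires some User-Agent
  };
  if (token) headers.Authorization = `Bearer ${token}`;
  return headers;
}

function repoUrl(owner, repo) {
  return `${GITHUB_API}/repos/${owner}/${repo}`;
}

// Commits from the last 30 days, relative to `now`.
function commitsUrl(owner, repo, now = new Date()) {
  const since = new Date(now.getTime() - 30 * 24 * 60 * 60 * 1000).toISOString();
  return `${repoUrl(owner, repo)}/commits?since=${encodeURIComponent(since)}`;
}

function metadataUrl(dir) {
  return `${SWE_BENCH_RAW}/${encodeURIComponent(dir)}/metadata.yaml`;
}

// Uses the fetch built into Node 20+; throws on non-2xx responses so that
// a failed fetch leaves the cell empty instead of storing bad data.
async function fetchRepo(owner, repo, token) {
  const res = await fetch(repoUrl(owner, repo), { headers: githubHeaders(token) });
  if (!res.ok) throw new Error(`GitHub responded ${res.status}`);
  return res.json();
}
```

Keeping the URL builders separate from the network call makes them easy to unit-test without touching the GitHub API.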
Endpoints (all public, no auth)
- GET /harness-arena/: SPA shell.
- GET /harness-arena/api/leaderboard: ranked rows.
- GET /harness-arena/api/harness/:slug: full per-harness payload.
- GET /harness-arena/api/feed: diff feed.
- GET /harness-arena/api/compare?a=:slug&b=:slug: side-by-side.
- GET /harness-arena/api/methodology: sources, last-fetched timestamps, counts.
- GET /harness-arena/api/badge/:slug.svg: embeddable SVG badge.
- GET /harness-arena/health: { ok, harnesses, last_refresh, sources }.
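For consumers of these endpoints, a small client-side sketch may help. The helper names are hypothetical, and the slugs `aider` and `openhands` in the usage note are assumed for illustration; the README does not list the actual slug values.

```javascript
// Hypothetical: Markdown snippet that embeds a harness badge and links
// back to the dashboard. `origin` is wherever the arena is deployed.
function badgeMarkdown(origin, slug, basePath = "/harness-arena") {
  const badge = `${origin}${basePath}/api/badge/${encodeURIComponent(slug)}.svg`;
  const page = `${origin}${basePath}/`;
  return `[![harness-arena](${badge})](${page})`;
}

// Hypothetical: URL for the side-by-side comparison endpoint.
function compareUrl(origin, a, b, basePath = "/harness-arena") {
  const params = new URLSearchParams({ a, b }); // handles query escaping
  return `${origin}${basePath}/api/compare?${params}`;
}
```

For example, `compareUrl("http://localhost:4747", "aider", "openhands")` yields a URL that can be passed straight to `fetch`, since the endpoints need no authentication.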
Environment
| Variable | Default | Notes |
|---|---|---|
| PORT | 4747 | listening port |
| BASE_PATH | /harness-arena | reverse-proxy mount point |
| GITHUB_TOKEN | — | optional; lifts the GitHub rate limit from 60/h to 5000/h |
.env.example ships with __INJECT_FROM_VAULT__ placeholders, which the deploy orchestrator replaces with values from the RNDLAB key vault.
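One way the variables above could be read at startup is sketched below. `loadConfig` is a hypothetical name, and treating an unreplaced `__INJECT_FROM_VAULT__` placeholder as "no token" is an assumption, not confirmed behavior of the server.

```javascript
// Hypothetical config loader using the defaults from the table above.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT || "4747", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  // Strip trailing slashes so routes can be joined as `${basePath}/api/...`.
  const basePath = (env.BASE_PATH || "/harness-arena").replace(/\/+$/, "");

  // Assumption: a placeholder that was never replaced counts as unset,
  // so requests fall back to the unauthenticated 60/h rate limit.
  const raw = env.GITHUB_TOKEN;
  const githubToken = raw && raw !== "__INJECT_FROM_VAULT__" ? raw : null;

  return { port, basePath, githubToken };
}
```

Passing `env` as a parameter keeps the function testable without mutating `process.env`.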
Tracked harnesses (v1)
OpenHands, Aider, Cline, SWE-agent, Roo Code, Continue, RA.Aid, live-SWE-agent, Moatless, Codex CLI, Claude Code, AugmentCode CLI, Factory Droid, pi-mono.
License
Proprietary — internal Cowork R&D output.