
SWE-bench Pulse

Live cross-variant leaderboard tracker for the de facto coding-agent benchmark

Tags: dev-tools, swe-bench, leaderboard, benchmarks, ai-agents, coding-agents, pareto, oss

swe-pulse

Live SWE-bench cross-variant leaderboard tracker. Real upstream data, no mocks, no auth.

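SWE-bench is the de facto coding-agent benchmark in 2026. There are now six variants (Verified, Lite, Full/Test, Multimodal, Multilingual, bash-only), and the same model often ranks very differently on each. swe-pulse pulls the official leaderboards.json every 6 hours and surfaces per-variant summaries, a model-family × leaderboard matrix, cost-vs-score Pareto frontiers, the proprietary-vs-OSS gap, per-org aggregates, submission velocity, and 24h rank movement.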

Data source

| Source | URL | Frequency |
| --- | --- | --- |
| SWE-bench leaderboards | https://raw.githubusercontent.com/SWE-bench/swe-bench.github.io/master/data/leaderboards.json | every 6 hours (cron `0 */6 * * *`), plus an immediate bootstrap fetch on cold start |

Every datapoint on the dashboard comes from this single upstream. No mock arrays, no Math.random jitter, no preset fallbacks. When the upstream is unreachable the failure is recorded in the snapshots table and the dashboard keeps showing the previous valid snapshot instead of inventing data.
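
The keep-the-last-good-snapshot behaviour described above can be sketched as a small pure function. This is only an illustration under stated assumptions: `applyFetchResult`, the in-memory `store` object, and the `{ ok, data, error }` result shape are hypothetical names, not the actual code in db.js or fetchers/leaderboards.js.

```javascript
// Hypothetical in-memory stand-in for the snapshots table plus the data being served.
// Returns a new store; the input store is not mutated.
function applyFetchResult(store, result, now = Date.now()) {
  // Every attempt is logged, whether it succeeded or failed.
  const row = {
    fetchedAt: now,
    ok: result.ok,
    error: result.ok ? null : String(result.error),
  };
  return {
    snapshots: [...store.snapshots, row],
    // Only a successful fetch replaces what the dashboard serves;
    // on failure the previous valid snapshot stays in place.
    current: result.ok ? { fetchedAt: now, data: result.data } : store.current,
  };
}

// Usage: a success followed by a failed refresh.
let store = { snapshots: [], current: null };
store = applyFetchResult(store, { ok: true, data: { Verified: [] } }, 1000);
store = applyFetchResult(store, { ok: false, error: 'ETIMEDOUT' }, 2000);
// store.current still holds the data fetched at t=1000,
// and store.snapshots records both attempts.
```

The point of the design is that a failed fetch only ever adds a log row. It never overwrites or synthesizes leaderboard data.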

Endpoints

All endpoints are mounted under BASE_PATH (default /swe-pulse) and are public. All are read-only except POST /api/refresh, which is also public but rate-limited to one call per minute.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | service health + latest snapshot age |
| GET | /api/leaderboards | summary card data for all variants |
| GET | /api/leaderboard/:name?oss=1&limit=N | full entries for one leaderboard |
| GET | /api/cross | model-family × leaderboard matrix |
| GET | /api/pareto?lb=Verified | cost-vs-score Pareto frontier |
| GET | /api/oss-gap?lb=Verified | best proprietary vs best OSS comparison |
| GET | /api/orgs | aggregate by submitting org |
| GET | /api/velocity | submissions per month, stacked by leaderboard |
| GET | /api/movement?lb=Verified&top=20 | 24h rank deltas |
| GET | /api/snapshots | last 20 fetch results (successes and failures) |
| POST | /api/refresh | force an upstream re-pull (rate-limited 1/min) |
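
To make the cost-vs-score Pareto frontier served by /api/pareto concrete, here is a minimal sketch of one common way to compute such a frontier. It assumes each entry has numeric `cost` (lower is better) and `score` (higher is better) fields. Those field names and the `paretoFrontier` helper are hypothetical, and the service's actual implementation and upstream schema may differ.

```javascript
// An entry is on the frontier if no other entry is at least as cheap
// and at least as high-scoring, with one of the two strictly better.
function paretoFrontier(entries) {
  // Sort by cost ascending; at equal cost, put the higher score first.
  const sorted = [...entries].sort((a, b) => a.cost - b.cost || b.score - a.score);
  const frontier = [];
  let bestScore = -Infinity;
  for (const entry of sorted) {
    // Keep an entry only if it beats every cheaper (or equally cheap) entry on score.
    if (entry.score > bestScore) {
      frontier.push(entry);
      bestScore = entry.score;
    }
  }
  return frontier;
}

// Usage with made-up numbers (not real leaderboard data):
const sample = [
  { name: 'a', cost: 1, score: 40 },
  { name: 'b', cost: 2, score: 35 }, // dominated by a
  { name: 'c', cost: 3, score: 60 },
  { name: 'd', cost: 3, score: 55 }, // dominated by c
  { name: 'e', cost: 10, score: 60 }, // dominated by c (same score, higher cost)
];
const frontierNames = paretoFrontier(sample).map((e) => e.name);
// frontierNames is ['a', 'c']
```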

No auth

Every endpoint — reads and the refresh trigger — is public. There is no ADMIN_PASS, no Basic Auth, no admin login page. That is intentional.
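
Because the refresh trigger is public, the one-call-per-minute limit is the only guard on forced upstream re-pulls. A minimal sketch of such a limiter, assuming a single global window rather than per-client tracking, is shown here. `createRateLimiter` and `tryAcquire` are hypothetical names, not the service's actual code.

```javascript
// Allows at most one accepted call per windowMs (default: one minute).
function createRateLimiter(windowMs = 60_000) {
  let lastAccepted = -Infinity; // no call accepted yet
  return function tryAcquire(now = Date.now()) {
    if (now - lastAccepted < windowMs) {
      return false; // still inside the window: reject
    }
    lastAccepted = now;
    return true;
  };
}

// Usage with explicit timestamps (milliseconds):
const tryRefresh = createRateLimiter();
const first = tryRefresh(0);       // true: first call is accepted
const tooSoon = tryRefresh(30_000); // false: only 30 s have passed
const later = tryRefresh(60_000);   // true: a full minute has passed
```

A route handler could call `tryAcquire()` and reply with HTTP 429 when it returns false. That wiring is left out here because the real handler is not shown in this README.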

Stack

Run locally

npm install
PORT=4776 BASE_PATH=/swe-pulse node server.js
# → http://localhost:4776/swe-pulse/
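
Once the server is running, the endpoints listed earlier can be addressed by joining the origin, BASE_PATH, and the route. The helper below is a small sketch of that URL construction, using the port and base path from the command above as defaults. `endpointUrl` is a hypothetical helper and not part of the project.

```javascript
// Builds a full URL for a swe-pulse endpoint using Node's built-in URL class.
function endpointUrl(
  path,
  params = {},
  origin = 'http://localhost:4776', // PORT from the run command above
  basePath = '/swe-pulse'           // default BASE_PATH
) {
  // Drop a trailing slash on the base path so the join has exactly one separator.
  const url = new URL(basePath.replace(/\/$/, '') + path, origin);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const paretoUrl = endpointUrl('/api/pareto', { lb: 'Verified' });
// paretoUrl is 'http://localhost:4776/swe-pulse/api/pareto?lb=Verified'
```

The resulting string can be passed to `fetch` or `curl` to query the running instance.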

Environment variables are listed in .env.example; the command above sets two of them, PORT and BASE_PATH.

Layout

swe-pulse/
├── server.js                  # express bootstrap + cron + bootstrap fetch
├── db.js                      # better-sqlite3 schema + helpers
├── fetchers/leaderboards.js   # the one and only upstream call
├── routes/
│   ├── api.js
│   └── health.js
├── lib/
│   ├── log.js
│   ├── tags.js                # parse upstream `tags` array
│   └── normalize.js           # canonical model-family key
└── public/
    ├── index.html
    ├── app.js
    └── style.css

License

MIT.