swe-pulse
Live SWE-bench cross-variant leaderboard tracker. Real upstream data, no mocks, no auth.
SWE-bench is the de facto coding-agent benchmark in 2026. There are now six variants (Verified, Lite, Full/Test, Multimodal, Multilingual, bash-only) and the same model often ranks wildly differently on each. swe-pulse pulls the official leaderboards.json every 6 hours and surfaces:
- a six-card overview of the top score on each variant,
- a sortable per-leaderboard table with logos, costs, OSS flags, attempt counts,
- a cross-variant matrix of normalized model families,
- a cost-vs-score Pareto chart with optimal points highlighted,
- the proprietary-vs-OSS frontier gap on every variant,
- submissions-per-month velocity (stacked by variant),
- and a 24h movement view showing which submissions climbed, fell, debuted or dropped off.
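The Pareto view in the list above can be illustrated with a short sketch. It treats an entry as optimal when no cheaper (or equally cheap) entry scores at least as high. The entry shape `{ name, cost, score }`, the function name `paretoFrontier`, and the sample values are assumptions for illustration, not the fields or numbers swe-pulse actually uses.

```javascript
// Hypothetical entry shape: { name, cost, score }. Field names are assumptions.
function paretoFrontier(entries) {
  // Entries without a numeric cost or score cannot be placed on the chart.
  const usable = entries.filter(
    (e) => Number.isFinite(e.cost) && Number.isFinite(e.score)
  );
  // Sort by cost ascending; on equal cost, put the higher score first.
  const sorted = [...usable].sort(
    (a, b) => a.cost - b.cost || b.score - a.score
  );
  const frontier = [];
  let bestScore = -Infinity;
  for (const e of sorted) {
    // Keep an entry only if it beats every entry that costs the same or less.
    if (e.score > bestScore) {
      frontier.push(e);
      bestScore = e.score;
    }
  }
  return frontier;
}

// Made-up sample data, not real leaderboard results.
const demoEntries = [
  { name: "a", cost: 0.5, score: 40 },
  { name: "b", cost: 1.0, score: 38 }, // dominated by "a"
  { name: "c", cost: 2.0, score: 55 },
  { name: "d", cost: 2.0, score: 50 }, // dominated by "c"
  { name: "e", cost: null, score: 70 }, // no cost reported, skipped
];
console.log(paretoFrontier(demoEntries).map((e) => e.name)); // [ 'a', 'c' ]
```

Sorting by cost first means a single pass with a running best score is enough, so the sketch stays O(n log n) even for the larger leaderboards.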
Data source
| Source | URL | Frequency |
| --- | --- | --- |
| SWE-bench leaderboards | https://raw.githubusercontent.com/SWE-bench/swe-bench.github.io/master/data/leaderboards.json | every 6 hours (cron `0 */6 * * *`) plus immediate bootstrap on cold start |
Every datapoint on the dashboard comes from this single upstream. No mock arrays, no Math.random jitter, no preset fallbacks. When the upstream is unreachable the failure is recorded in the snapshots table and the dashboard keeps showing the previous valid snapshot instead of inventing data.
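A minimal sketch of that fallback rule, assuming each fetch attempt is stored as one row with a success flag. An in-memory array stands in for the snapshots table here; the real schema in db.js is not shown in this README, so `createSnapshotLog` and the row fields `ok`, `fetchedAt`, `data`, and `error` are hypothetical.

```javascript
// In-memory stand-in for the snapshots table (assumed row shape, not the real schema).
function createSnapshotLog() {
  const rows = [];
  return {
    // Record one fetch attempt, whether it succeeded or failed.
    record({ ok, fetchedAt, data = null, error = null }) {
      rows.push({ ok, fetchedAt, data, error });
    },
    // Most recent successful snapshot, or null if nothing has succeeded yet.
    latestValid() {
      for (let i = rows.length - 1; i >= 0; i--) {
        if (rows[i].ok) return rows[i];
      }
      return null;
    },
    // Last n attempts, newest first, successes and failures alike.
    recent(n = 20) {
      return rows.slice(-n).reverse();
    },
  };
}

const demoLog = createSnapshotLog();
demoLog.record({ ok: true, fetchedAt: 1, data: { Verified: [] } });
demoLog.record({ ok: false, fetchedAt: 2, error: "upstream unreachable" });

// The failed attempt is logged, but the dashboard would still read snapshot 1.
console.log(demoLog.latestValid().fetchedAt); // 1
console.log(demoLog.recent().length); // 2
```

Keeping failures in the same log as successes is what lets a history view report both, while readers of the data only ever see the last row that succeeded.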
Endpoints
All endpoints live under BASE_PATH (default /swe-pulse). All are public and read-only except /api/refresh, which is also public but rate-limited to one call per minute.
| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | service health + latest snapshot age |
| GET | /api/leaderboards | summary card data for all variants |
| GET | /api/leaderboard/:name?oss=1&limit=N | full entries for one leaderboard |
| GET | /api/cross | model-family × leaderboard matrix |
| GET | /api/pareto?lb=Verified | cost-vs-score Pareto frontier |
| GET | /api/oss-gap?lb=Verified | best proprietary vs best OSS comparison |
| GET | /api/orgs | aggregate by submitting org |
| GET | /api/velocity | submissions per month, stacked by leaderboard |
| GET | /api/movement?lb=Verified&top=20 | 24h rank deltas |
| GET | /api/snapshots | last 20 fetch results (success and failures) |
| POST | /api/refresh | force an upstream re-pull (rate-limited 1/min) |
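To make the /api/movement row more concrete, here is one way rank deltas between two snapshots could be computed. This is a sketch under assumptions: `rankMovement`, the status labels, and the use of submission names as identifiers are illustrative and may not match what the endpoint returns.

```javascript
// prev and curr are ordered arrays of submission names, rank 1 first.
function rankMovement(prev, curr) {
  const prevRank = new Map(prev.map((name, i) => [name, i + 1]));
  const currRank = new Map(curr.map((name, i) => [name, i + 1]));
  const moves = [];
  for (const [name, rank] of currRank) {
    if (!prevRank.has(name)) {
      // Present now but not in the earlier snapshot.
      moves.push({ name, status: "debuted", rank, delta: null });
    } else {
      const delta = prevRank.get(name) - rank; // positive means moved up
      const status = delta > 0 ? "climbed" : delta < 0 ? "fell" : "unchanged";
      moves.push({ name, status, rank, delta });
    }
  }
  for (const name of prevRank.keys()) {
    // Present earlier but missing from the current snapshot.
    if (!currRank.has(name)) {
      moves.push({ name, status: "dropped", rank: null, delta: null });
    }
  }
  return moves;
}

const demoMoves = rankMovement(["x", "y", "z"], ["y", "x", "w"]);
console.log(demoMoves.map((m) => `${m.name}:${m.status}`).join(" "));
// y:climbed x:fell w:debuted z:dropped
```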
No auth
Every endpoint — reads and the refresh trigger — is public. There is no ADMIN_PASS, no Basic Auth, no admin login page. That is intentional.
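With no auth in front of it, the refresh trigger relies on its one-call-per-minute limit. The sketch below shows one way such a limit could work; it assumes the limit is global rather than per client, which this README does not specify, and `createRefreshLimiter` is a hypothetical helper, not the project's code.

```javascript
// Fixed-interval limiter: allows at most one call per intervalMs across all callers.
// The clock is injectable so the behaviour can be checked without waiting.
function createRefreshLimiter(intervalMs = 60_000, now = () => Date.now()) {
  let lastAllowed = -Infinity;
  return function tryAcquire() {
    const t = now();
    const elapsed = t - lastAllowed;
    if (elapsed < intervalMs) {
      // Too soon: report how long the caller should wait.
      return { allowed: false, retryAfterMs: intervalMs - elapsed };
    }
    lastAllowed = t;
    return { allowed: true, retryAfterMs: 0 };
  };
}

// Usage with a fake clock.
let fakeNow = 0;
const tryRefresh = createRefreshLimiter(60_000, () => fakeNow);
console.log(tryRefresh().allowed); // true
fakeNow = 30_000;
console.log(tryRefresh().retryAfterMs); // 30000
fakeNow = 60_000;
console.log(tryRefresh().allowed); // true
```

In an Express handler, a rejected call would typically map to a 429 response with a Retry-After header derived from `retryAfterMs`.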
Stack
- Node.js 20+
- Express 4 with helmet + compression
- better-sqlite3 (WAL mode) for snapshots and per-snapshot entry rows
- node-cron for the 6-hourly refresh
- Vanilla JS SPA + Chart.js from CDN (no build step)
Run locally
npm install
PORT=4776 BASE_PATH=/swe-pulse node server.js
# → http://localhost:4776/swe-pulse/
Environment variables (see .env.example):
- PORT: default 4776
- BASE_PATH: default /swe-pulse
- DB_PATH: default ./data.db
- NODE_ENV: production in deploy
Layout
swe-pulse/
├── server.js                  # express bootstrap + cron + bootstrap fetch
├── db.js                      # better-sqlite3 schema + helpers
├── fetchers/leaderboards.js   # the one and only upstream call
├── routes/
│   ├── api.js
│   └── health.js
├── lib/
│   ├── log.js
│   ├── tags.js                # parse upstream `tags` array
│   └── normalize.js           # canonical model-family key
└── public/
    ├── index.html
    ├── app.js
    └── style.css
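As a rough illustration of how a canonical model-family key can feed the cross-variant matrix, the sketch below groups rows by a normalized key and keeps the best score per leaderboard. The normalization rule (lowercase, collapse punctuation) is a guess for illustration only; the actual rules in lib/normalize.js are not described in this README, and `familyKey`, `crossMatrix`, and the row fields are hypothetical.

```javascript
// Hypothetical normalizer: lowercase, collapse runs of non-alphanumerics to "-".
// The real rules in lib/normalize.js may differ.
function familyKey(modelName) {
  return modelName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// rows: [{ model, leaderboard, score }]
// returns: { [familyKey]: { [leaderboard]: bestScore } }
function crossMatrix(rows) {
  const matrix = {};
  for (const { model, leaderboard, score } of rows) {
    const key = familyKey(model);
    const cell = (matrix[key] ??= {});
    // Keep the highest score seen for this family on this leaderboard.
    cell[leaderboard] = Math.max(cell[leaderboard] ?? -Infinity, score);
  }
  return matrix;
}

// Made-up rows: three spellings of one placeholder model name.
const demoRows = [
  { model: "Example Model 1", leaderboard: "Verified", score: 60 },
  { model: "example-model-1", leaderboard: "Verified", score: 64 },
  { model: "Example_Model 1", leaderboard: "Lite", score: 45 },
];
// All three spellings collapse to the single key "example-model-1".
console.log(crossMatrix(demoRows));
```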
License
MIT.