headroom
How saturated is each AI benchmark? Live dashboard ranking every major AI benchmark by headroom, velocity, and solve-ETA.
Built from Epoch AI's public eci_benchmarks dataset (1500+ rows, ~40 benchmarks, CC-BY).
No mocks, no seed data — every number is computed at runtime from a real CSV.
What it answers
- Which benchmarks are still discriminating (real headroom above SOTA)?
- Which are saturated or dead (no improvement in 12 months)?
- How fast is each benchmark being eaten (points-per-month velocity)?
- How many months until each benchmark is "solved" (SOTA ≥ 95% of ceiling)?
- Which capability is moving fastest right now?
Install & run
cp .env.example .env
npm install
node server.js
# open http://localhost:4785/headroom/
Node 20+ required (uses native fetch). The first boot will fetch the Epoch CSV and compute all stats; subsequent refreshes are scheduled every 6 hours via node-cron.
Endpoints (all public, no auth)
| Method | Path | Description |
|---|---|---|
| GET | /headroom/ | SPA dashboard |
| GET | /headroom/health | health JSON |
| GET | /headroom/api/benchmarks | all benchmark stats |
| GET | /headroom/api/benchmarks/:slug | one benchmark + timeline + top scores |
| GET | /headroom/api/timeline/:slug | SOTA-over-time series |
| GET | /headroom/api/summary | counts + last refresh |
| GET | /headroom/api/refresh_log | last 20 refresh attempts |
| POST | /headroom/api/refresh | trigger an async refresh (202) |
| GET | /headroom/api/badge/:slug | embeddable SVG badge |
| GET | /headroom/api/export.json | full stats as JSON |
| GET | /headroom/api/export.csv | full stats as CSV |
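As a rough illustration of embedding the badge endpoint, the sketch below builds a Markdown image link for a given slug. The helper name `badgeMarkdown`, the default base URL (taken from the local dev address above), and the example slug are all assumptions; real slugs come from /headroom/api/benchmarks.

```javascript
// Hypothetical helper: not part of the project, just a sketch of badge embedding.
const DEFAULT_BASE = 'http://localhost:4785/headroom';

function badgeMarkdown(slug, base = DEFAULT_BASE) {
  // Encode the slug so unusual characters cannot break the URL.
  const safeSlug = encodeURIComponent(slug);
  // Image points at the SVG badge; the link points back at the dashboard.
  return `[![headroom: ${slug}](${base}/api/badge/${safeSlug})](${base}/)`;
}

// 'mmlu' is an assumed slug used only for illustration.
console.log(badgeMarkdown('mmlu'));
```

For a deployed instance, pass the public base URL as the second argument instead of relying on the localhost default.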
Data source
Single upstream: https://epoch.ai/data/eci_benchmarks.csv (Epoch AI, CC-BY).
Refresh frequency: every 6 hours, plus on-boot if the DB is empty.
Schema validated on every refresh; on failure, the previous snapshot stays live and the error is logged to refresh_log.
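The validate-then-swap behaviour described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the required column names, the `state` shape, and the function names are assumptions, and the header parsing is deliberately naive (it does not handle quoted commas).

```javascript
// Assumed column names; the real Epoch CSV header may differ.
const REQUIRED_COLUMNS = ['benchmark', 'date', 'performance'];

function validateHeader(csvText) {
  // Take the first line only and split it into trimmed column names.
  const header = csvText.split(/\r?\n/, 1)[0].split(',').map((c) => c.trim());
  const missing = REQUIRED_COLUMNS.filter((c) => !header.includes(c));
  if (missing.length > 0) {
    throw new Error(`missing columns: ${missing.join(', ')}`);
  }
  return header;
}

function applyRefresh(state, csvText, at = new Date().toISOString()) {
  try {
    validateHeader(csvText);
    // Replace the live snapshot only after validation has passed.
    state.snapshot = csvText;
    state.refreshLog.push({ at, ok: true, error: null });
  } catch (err) {
    // The previous snapshot stays live; the failure is only recorded.
    state.refreshLog.push({ at, ok: false, error: err.message });
  }
  return state;
}
```

The point of the design is that a malformed upstream file can never replace good data: the swap happens after validation, and a failure only adds a log entry.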
No auth — by design
This project has no admin password, no Basic Auth, and no /admin route. Every endpoint, including POST /headroom/api/refresh, is public and idempotent. The dashboard is meant to be inspected instantly and embedded freely.
Computation
For each benchmark in raw_rows:
- All (date, performance) pairs sorted ascending.
- Running-max SOTA curve stored in sota_timeline.
- Velocity (12 mo) = (SOTA_now − SOTA_12mo_ago) / 12, points-per-month.
- Headroom = ceiling − SOTA_now (ceiling defaults to 1.0, overridable per benchmark in lib/ceilings.js).
- ETA = headroom / velocity_12mo, months until SOTA ≥ 0.95 × ceiling.
- Status: SATURATED (≥95%), NEAR_SOLVED (≥85%), ACTIVE (50–85%), FRONTIER (<50%), DEAD (no improvement in 12 mo), INSUFFICIENT_DATA (<3 rows).
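The steps above can be sketched in a few lines of JavaScript. Treat this as an illustration under stated assumptions rather than the project's implementation: `computeStats` and its return shape are hypothetical, percentages are taken as SOTA / ceiling, velocity is left as null when no score is at least 12 months old, and the order in which statuses are checked (SATURATED before DEAD) is a guess. ETA follows the formula exactly as written.

```javascript
// Sketch only: computeStats and its return shape are hypothetical.
function computeStats(rows, { ceiling = 1.0, now = new Date() } = {}) {
  if (rows.length < 3) return { status: 'INSUFFICIENT_DATA' };

  // Sort (date, performance) pairs ascending by date.
  const sorted = [...rows].sort((a, b) => new Date(a.date) - new Date(b.date));

  // Running-max SOTA curve: keep only points that raise the best score so far.
  const sotaTimeline = [];
  let best = -Infinity;
  for (const { date, performance } of sorted) {
    if (performance > best) {
      best = performance;
      sotaTimeline.push({ date, sota: best });
    }
  }
  const sotaNow = best;

  // SOTA as of 12 months ago: last timeline point at or before the cutoff.
  const cutoff = new Date(now);
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 1);
  const older = sotaTimeline.filter((p) => new Date(p.date) <= cutoff);
  const sota12moAgo = older.length > 0 ? older[older.length - 1].sota : null;

  // Assumption: velocity is unknown (null) when no score is 12 months old.
  const velocity = sota12moAgo === null ? null : (sotaNow - sota12moAgo) / 12;
  const headroom = ceiling - sotaNow;
  const etaMonths = velocity !== null && velocity > 0 ? headroom / velocity : null;

  // Assumption: thresholds apply to SOTA / ceiling, checked in this order.
  const ratio = sotaNow / ceiling;
  let status;
  if (ratio >= 0.95) status = 'SATURATED';
  else if (velocity === 0) status = 'DEAD';
  else if (ratio >= 0.85) status = 'NEAR_SOLVED';
  else if (ratio >= 0.5) status = 'ACTIVE';
  else status = 'FRONTIER';

  return { sotaTimeline, sotaNow, velocity, headroom, etaMonths, status };
}
```

With four rows scoring 0.40, 0.35, 0.52 and 0.64, where 0.52 was the SOTA twelve months before `now`, this gives a velocity of about 0.01 points per month, a headroom of 0.36, and an ETA of roughly 36 months.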
License
MIT for the code. Data from Epoch AI is CC-BY — please attribute when reusing.