Benchmark Headroom

Saturation, velocity, and solve-ETA for every major AI benchmark

headroom

How saturated is each AI benchmark? Live dashboard ranking every major AI benchmark by headroom, velocity, and solve-ETA.

Built from Epoch AI's public eci_benchmarks dataset (1500+ rows, ~40 benchmarks, CC-BY).
No mocks, no seed data — every number is computed at runtime from a real CSV.

What it answers

Install & run

cp .env.example .env
npm install
node server.js
# open http://localhost:4785/headroom/

Node 20+ required (uses native fetch). On first boot the server fetches the Epoch CSV and computes all stats; after that, node-cron schedules a refresh every 6 hours.

Endpoints (all public, no auth)

| Method | Path | Description |
|---|---|---|
| GET | /headroom/ | SPA dashboard |
| GET | /headroom/health | health JSON |
| GET | /headroom/api/benchmarks | all benchmark stats |
| GET | /headroom/api/benchmarks/:slug | one benchmark + timeline + top scores |
| GET | /headroom/api/timeline/:slug | SOTA-over-time series |
| GET | /headroom/api/summary | counts + last refresh |
| GET | /headroom/api/refresh_log | last 20 refresh attempts |
| POST | /headroom/api/refresh | trigger an async refresh (202) |
| GET | /headroom/api/badge/:slug | embeddable SVG badge |
| GET | /headroom/api/export.json | full stats as JSON |
| GET | /headroom/api/export.csv | full stats as CSV |
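A minimal sketch of calling these endpoints from Node, assuming the server runs locally on the port shown in the install notes. The helper names (`endpointUrl`, `getJson`, `triggerRefresh`) are hypothetical and not part of the project, and the response bodies are returned as-is because their shape is not documented here.

```javascript
// Host and port come from the install notes; '/headroom' is the prefix used in the table.
const DEFAULT_BASE = 'http://localhost:4785/headroom';

// Build an absolute URL from a path template such as '/api/benchmarks/:slug'.
// Path parameters are URL-encoded; a missing parameter is an error.
function endpointUrl(path, params = {}, base = DEFAULT_BASE) {
  const filled = path.replace(/:([a-z_]+)/gi, (_, name) => {
    if (!(name in params)) throw new Error(`missing path parameter: ${name}`);
    return encodeURIComponent(params[name]);
  });
  return base.replace(/\/+$/, '') + filled;
}

// GET a JSON endpoint and return the parsed body untouched.
async function getJson(path, params, base) {
  const res = await fetch(endpointUrl(path, params, base));
  if (!res.ok) throw new Error(`GET ${path} failed with HTTP ${res.status}`);
  return res.json();
}

// POST to the refresh endpoint; the table says it answers 202 for an async refresh.
async function triggerRefresh(base) {
  const res = await fetch(endpointUrl('/api/refresh', {}, base), { method: 'POST' });
  return res.status === 202;
}
```

For example, `getJson('/api/timeline/:slug', { slug: 'some-benchmark' })` would request the SOTA series for one benchmark; `some-benchmark` is a placeholder, since valid slugs come from `/headroom/api/benchmarks`.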

Data source

Single upstream: https://epoch.ai/data/eci_benchmarks.csv (Epoch AI, CC-BY).
Refresh frequency: every 6 hours, plus on-boot if the DB is empty.
Schema validated on every refresh; on failure, the previous snapshot stays live and the error is logged to refresh_log.
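The validate-then-swap behaviour described above can be sketched as follows. This is an illustration under assumptions, not the project's code: the column names (`benchmark`, `date`, `performance`), the `state` shape, and both function names are hypothetical, and only the "keep the previous snapshot, log the error, keep 20 log entries" behaviour comes from this README.

```javascript
// Assumed column names; the real CSV header is not listed in this README.
const REQUIRED_COLUMNS = ['benchmark', 'date', 'performance'];

// Parse a CSV cell as a number; empty or non-numeric cells become NaN.
function toNumber(value) {
  if (typeof value === 'number') return value;
  if (typeof value === 'string' && value.trim() !== '') return Number(value);
  return NaN;
}

// Return a list of schema problems; an empty list means the rows are usable.
function validateRows(rows) {
  if (!Array.isArray(rows) || rows.length === 0) return ['no rows'];
  const errors = [];
  for (const column of REQUIRED_COLUMNS) {
    if (!(column in rows[0])) errors.push(`missing column: ${column}`);
  }
  if (errors.length > 0) return errors;
  rows.forEach((row, i) => {
    if (Number.isNaN(Date.parse(row.date))) errors.push(`row ${i}: bad date`);
    if (!Number.isFinite(toNumber(row.performance))) errors.push(`row ${i}: bad performance`);
  });
  return errors;
}

// Validate first, swap second: a failed refresh leaves state.snapshot untouched
// and only adds an entry to the refresh log.
function applyRefresh(state, rows, now = new Date()) {
  const errors = validateRows(rows);
  const ok = errors.length === 0;
  const entry = {
    at: now.toISOString(),
    ok,
    rowCount: ok ? rows.length : 0,
    error: ok ? null : errors.join('; '),
  };
  return {
    snapshot: ok ? rows : state.snapshot,
    refreshLog: [entry, ...state.refreshLog].slice(0, 20), // newest first, last 20 attempts
  };
}
```

Returning a new state object instead of mutating the old one means readers of the live snapshot never see a half-applied refresh.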

No auth — by design

This project has no admin password, no Basic Auth, and no /admin route. Every endpoint, including POST /headroom/api/refresh, is public and idempotent. The dashboard is meant to be inspected instantly and embedded freely.

Computation

For each benchmark in raw_rows:

  1. Collect all (date, performance) pairs, sorted by date ascending.
  2. Running-max SOTA curve stored in sota_timeline.
  3. Velocity (12 mo) = (SOTA_now − SOTA_12mo_ago) / 12, points-per-month.
  4. Headroom = ceiling − SOTA_now (ceiling defaults to 1.0, overridable per benchmark in lib/ceilings.js).
  5. ETA = headroom / velocity_12mo, in months: a linear extrapolation of the time until SOTA ≥ 0.95 × ceiling. Undefined when velocity_12mo is 0.
  6. Status, by SOTA as a share of the ceiling: SATURATED (≥95%), NEAR_SOLVED (≥85%), ACTIVE (50–85%), FRONTIER (<50%). Two statuses do not depend on that share: DEAD (no improvement in 12 mo) and INSUFFICIENT_DATA (<3 rows).
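The six steps above can be sketched for a single benchmark as follows. This is a hedged illustration, not the project's implementation. The function names, the row shape, and the order in which statuses take precedence (INSUFFICIENT_DATA, then SATURATED, then DEAD, then the rest) are assumptions. ETA uses headroom / velocity exactly as written in step 5; if the intended target is 0.95 × ceiling rather than the ceiling itself, the numerator would need adjusting.

```javascript
const DEFAULT_CEILING = 1.0; // step 4: default ceiling, overridable per benchmark

// Steps 1-2: sort by date ascending and keep the running maximum.
function sotaTimeline(rows) {
  const sorted = [...rows].sort((a, b) => new Date(a.date) - new Date(b.date));
  let best = -Infinity;
  return sorted.map(({ date, performance }) => {
    best = Math.max(best, performance);
    return { date, sota: best };
  });
}

// SOTA as of a given instant: last timeline entry dated at or before it, else null.
function sotaAt(timeline, when) {
  let value = null;
  for (const point of timeline) {
    if (new Date(point.date) <= when) value = point.sota;
  }
  return value;
}

// Step 6, with an assumed precedence between the share-based statuses and DEAD.
function classify(ratio, velocity) {
  if (ratio >= 0.95) return 'SATURATED';
  if (velocity === 0) return 'DEAD';
  if (ratio >= 0.85) return 'NEAR_SOLVED';
  if (ratio >= 0.5) return 'ACTIVE';
  return 'FRONTIER';
}

function computeStats(rows, { ceiling = DEFAULT_CEILING, now = new Date() } = {}) {
  if (rows.length < 3) return { status: 'INSUFFICIENT_DATA' };
  const timeline = sotaTimeline(rows);
  const sotaNow = timeline[timeline.length - 1].sota;

  const yearAgo = new Date(now);
  yearAgo.setUTCFullYear(yearAgo.getUTCFullYear() - 1);
  const sotaYearAgo = sotaAt(timeline, yearAgo);

  // Step 3. Assumption: with no data older than 12 months, velocity stays null.
  const velocity = sotaYearAgo === null ? null : (sotaNow - sotaYearAgo) / 12;
  const headroom = ceiling - sotaNow; // step 4
  // Step 5: only defined for a positive velocity.
  const etaMonths = velocity !== null && velocity > 0 ? headroom / velocity : null;

  return {
    timeline, sotaNow, velocity, headroom, etaMonths,
    status: classify(sotaNow / ceiling, velocity),
  };
}

// Made-up rows for illustration only (not Epoch data).
const example = computeStats(
  [
    { date: '2023-01-01', performance: 0.40 },
    { date: '2023-06-01', performance: 0.52 },
    { date: '2024-01-01', performance: 0.50 }, // below SOTA, so the curve stays flat
    { date: '2024-06-01', performance: 0.76 },
  ],
  { now: new Date('2024-07-01T00:00:00Z') }
);
console.log(example.status, example.etaMonths); // status is 'ACTIVE', ETA is roughly 12 months
```

In the example, SOTA rises from 0.52 to 0.76 over the trailing year, so velocity is about 0.02 per month; with headroom of about 0.24, the linear extrapolation gives roughly 12 months.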

License

MIT for the code. Data from Epoch AI is CC-BY — please attribute when reusing.