headroom

How saturated is each AI benchmark? A live dashboard that ranks major AI benchmarks by headroom, velocity, and solve-ETA.

Built from Epoch AI's public eci_benchmarks dataset (1500+ rows, ~40 benchmarks, CC-BY).
No mocks, no seed data — every number is computed at runtime from a real CSV.

What it answers

Which benchmarks are still discriminating (real headroom above SOTA)?
Which are saturated (SOTA ≥ 95% of ceiling) or dead (no improvement in 12 months)?
How fast is each benchmark being eaten (points-per-month velocity)?
How many months until each benchmark is "solved" (SOTA ≥ 95% of ceiling)?
Which capability is moving fastest right now?

Install & run

cp .env.example .env
npm install
node server.js
# open http://localhost:4785/headroom/

Requires Node 20+ (uses native fetch). On first boot, when the DB is empty, the server fetches the Epoch CSV and computes all stats; after that, node-cron schedules a refresh every 6 hours.

Endpoints (all public, no auth)

| Method | Path | Description |
|---|---|---|
| GET | /headroom/ | SPA dashboard |
| GET | /headroom/health | health JSON |
| GET | /headroom/api/benchmarks | all benchmark stats |
| GET | /headroom/api/benchmarks/:slug | one benchmark + timeline + top scores |
| GET | /headroom/api/timeline/:slug | SOTA-over-time series |
| GET | /headroom/api/summary | counts + last refresh |
| GET | /headroom/api/refresh_log | last 20 refresh attempts |
| POST | /headroom/api/refresh | trigger an async refresh (202) |
| GET | /headroom/api/badge/:slug | embeddable SVG badge |
| GET | /headroom/api/export.json | full stats as JSON |
| GET | /headroom/api/export.csv | full stats as CSV |
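As a rough illustration of calling these endpoints from Node, here is a minimal client sketch. The name `createClient` and its method names are hypothetical, the base URL is taken from the install instructions, and nothing is assumed about response bodies beyond the table stating they are JSON where noted.

```javascript
// Minimal client sketch for the public endpoints listed above.
// createClient and its method names are illustrative, not part of the project.
function createClient(baseUrl = "http://localhost:4785/headroom", fetchImpl = fetch) {
  const url = (path) => `${baseUrl}${path}`;

  // GET a JSON endpoint and fail loudly on non-2xx responses.
  async function getJson(path) {
    const res = await fetchImpl(url(path));
    if (!res.ok) throw new Error(`GET ${path} failed with ${res.status}`);
    return res.json();
  }

  return {
    benchmarks: () => getJson("/api/benchmarks"),
    benchmark: (slug) => getJson(`/api/benchmarks/${encodeURIComponent(slug)}`),
    timeline: (slug) => getJson(`/api/timeline/${encodeURIComponent(slug)}`),
    summary: () => getJson("/api/summary"),
    // POST answers 202 when the refresh is accepted; the work runs asynchronously.
    triggerRefresh: async () => {
      const res = await fetchImpl(url("/api/refresh"), { method: "POST" });
      return res.status === 202;
    },
    // URL of the embeddable SVG badge for one benchmark.
    badgeUrl: (slug) => url(`/api/badge/${encodeURIComponent(slug)}`),
  };
}
```

Passing `fetchImpl` explicitly makes the client easy to exercise against a stub; in normal use, `createClient()` falls back to Node's global `fetch`.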

Data source

Single upstream: https://epoch.ai/data/eci_benchmarks.csv (Epoch AI, CC-BY).
Refresh frequency: every 6 hours, plus on-boot if the DB is empty.
The schema is validated on every refresh; on failure, the previous snapshot stays live and the error is logged to refresh_log.
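The validate-then-swap behavior can be sketched as follows. This is not the project's code: the column names in `REQUIRED_COLUMNS`, the shape of `state`, and the function names are assumptions made for illustration, and the header check is deliberately naive (it does not handle quoted fields).

```javascript
// Assumed column names; the real eci_benchmarks header may differ.
const REQUIRED_COLUMNS = ["benchmark", "date", "performance"];

// Throws if the header lacks a required column or the file has no data rows.
function validateCsv(csvText) {
  const lines = csvText.trim().split(/\r?\n/);
  const header = lines[0].split(",").map((h) => h.trim());
  const missing = REQUIRED_COLUMNS.filter((c) => !header.includes(c));
  if (missing.length > 0) throw new Error(`missing columns: ${missing.join(", ")}`);
  if (lines.length < 2) throw new Error("no data rows");
  return { header, rowCount: lines.length - 1 };
}

// state = { snapshot, refreshLog }; fetchCsv is any async function returning CSV text.
// The snapshot is replaced only after validation succeeds.
async function refresh(state, fetchCsv, now = new Date()) {
  const entry = { at: now.toISOString(), ok: false, error: null };
  try {
    const csvText = await fetchCsv();
    const { rowCount } = validateCsv(csvText);
    state.snapshot = { csvText, rowCount, fetchedAt: entry.at };
    entry.ok = true;
  } catch (err) {
    entry.error = err.message; // previous snapshot is left untouched
  }
  // Newest entry first; this sketch caps the log at 20 entries.
  state.refreshLog.unshift(entry);
  state.refreshLog.length = Math.min(state.refreshLog.length, 20);
  return entry;
}
```

Because the snapshot is assigned only after `validateCsv` returns, a bad upstream file or a network error leaves readers on the last good data.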

No auth — by design

This project has no admin password, no Basic Auth, and no /admin route. Every endpoint, including POST /headroom/api/refresh, is public and idempotent. The dashboard is meant to be inspected instantly and embedded freely.
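The README does not say how repeated POSTs to the refresh endpoint are handled. One common way to keep a public trigger safe is a single-flight guard, where overlapping calls share one in-flight run. The sketch below shows that pattern under that assumption; `singleFlight` is a hypothetical helper, not something the project is known to contain.

```javascript
// Wraps an async task so that overlapping calls share one in-flight run.
// Once the run settles, the next call starts a fresh one.
function singleFlight(task) {
  let inFlight = null;
  return function run() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(task)
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  };
}
```

A route handler could call the wrapped function, attach a `.catch` that records the failure, and answer 202 immediately, so ten rapid POSTs would cost one upstream fetch.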

Computation

For each benchmark in raw_rows:

Sort all (date, performance) pairs by date, ascending.
Compute the running-max SOTA curve and store it in sota_timeline.
Velocity (12 mo) = (SOTA_now − SOTA_12mo_ago) / 12, points-per-month.
Headroom = ceiling − SOTA_now (ceiling defaults to 1.0, overridable per benchmark in lib/ceilings.js).
ETA = headroom / velocity_12mo, the estimated number of months until the benchmark is solved (SOTA ≥ 0.95 × ceiling), assuming the 12-month velocity holds.
Status, by SOTA as a share of ceiling: SATURATED (≥95%), NEAR_SOLVED (85–95%), ACTIVE (50–85%), FRONTIER (<50%). Two further statuses: DEAD (no improvement in 12 mo) and INSUFFICIENT_DATA (<3 rows).
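The steps above can be sketched in a few functions. This is an illustration, not the project's implementation. All function and field names are hypothetical. "12 months" is approximated as 365 days. When a benchmark has no result 12 months back, the first SOTA value is used. The order in which statuses are checked (INSUFFICIENT_DATA, then SATURATED, then DEAD, then the share bands) is a guess. Note that the ETA formula as written divides the full headroom by velocity, so under linear extrapolation it gives months to the ceiling itself; the 0.95 × ceiling threshold would be crossed somewhat earlier.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Running-max SOTA curve: keep a point only when it beats the best so far.
function sotaTimeline(rows) {
  const sorted = [...rows].sort((a, b) => new Date(a.date) - new Date(b.date));
  const timeline = [];
  let best = -Infinity;
  for (const { date, performance } of sorted) {
    if (performance > best) {
      best = performance;
      timeline.push({ date, sota: best });
    }
  }
  return timeline;
}

// SOTA as of a given instant: the last timeline point at or before it, else null.
function sotaAt(timeline, when) {
  let value = null;
  for (const point of timeline) {
    if (new Date(point.date) <= when) value = point.sota;
  }
  return value;
}

// rows: [{ date: "YYYY-MM-DD", performance: number }] for one benchmark.
function computeStats(rows, { ceiling = 1.0, now = new Date() } = {}) {
  if (rows.length < 3) return { status: "INSUFFICIENT_DATA" };

  const timeline = sotaTimeline(rows);
  const sotaNow = timeline[timeline.length - 1].sota;
  const yearAgo = new Date(now.getTime() - 365 * MS_PER_DAY);
  // Assumption: fall back to the first SOTA if nothing is 12 months old.
  const sotaThen = sotaAt(timeline, yearAgo) ?? timeline[0].sota;

  const velocity = (sotaNow - sotaThen) / 12; // points per month
  const headroom = ceiling - sotaNow;
  const eta = velocity > 0 ? headroom / velocity : null; // months; null if stalled

  // Assumed precedence between DEAD and the share-of-ceiling bands.
  const share = sotaNow / ceiling;
  let status;
  if (share >= 0.95) status = "SATURATED";
  else if (velocity <= 0) status = "DEAD";
  else if (share >= 0.85) status = "NEAR_SOLVED";
  else if (share >= 0.5) status = "ACTIVE";
  else status = "FRONTIER";

  return { timeline, sotaNow, velocity, headroom, eta, status };
}
```

For example, a benchmark whose SOTA moved from 0.40 to 0.64 over the past year has a velocity of about 0.02 per month and a headroom of 0.36, which gives an ETA of roughly 18 months and a status of ACTIVE.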

License

MIT for the code. Data from Epoch AI is CC-BY — please attribute when reusing.