crossover-clock
Every AI benchmark, every human baseline, the gap, and the clock.
A live, public dashboard that tracks every major AI benchmark and answers one
question for each: has SOTA crossed the human baseline yet — and if not, when
will it?
- 🔗 Live: https://holyai.me/crossover-clock/
- 🔌 Port: 4891 (mounted at `/crossover-clock`)
- 🔐 Auth: none. Every endpoint is public.
What it tracks
Ten benchmarks at launch, across five categories:
| Benchmark | Category | Human baseline | Source |
|----------------------|---------------|----------------|------------------------------------------------------------------|
| OSWorld | computer-use | 72.36% | Xie et al. NeurIPS 2024 §4.2 |
| SWE-Bench Verified | code | 50.0% | OpenAI Verified release (Aug 2024) |
| SWE-Bench Pro | code | 50.0% | Scale AI Pro release (Sep 2025) |
| ARC-AGI-1 | reasoning | 80.0% | arcprize.org panel |
| ARC-AGI-2 | reasoning | 66.0% | ARC-AGI-2 paper (May 2025) |
| Humanity's Last Exam | knowledge | 65.0% | Phan et al. (CAIS/Scale, Jan 2025) |
| GPQA Diamond | knowledge | 65.0% | Rein et al. 2023 |
| MMLU | knowledge | 89.8% | Hendrycks et al. 2021 |
| FrontierMath | math | 75.0% | Epoch AI (Nov 2024) |
| AIME 2025 | math | ~5/15 (~33%) | AoPS distribution / USAMO qualifying cut |
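The README does not say how the "when will it?" estimate is computed. As a minimal sketch, assuming a plain least-squares linear trend over past snapshots, the gap and an ETA could be derived as below. The function names, the `{ t, score }` snapshot shape (with `t` in days), and the sample numbers are all hypothetical and not taken from the project source.

```javascript
// Hypothetical helpers: names, snapshot shape, and the linear-trend
// assumption are illustrative only.

// Gap in percentage points; positive means humans are still ahead.
function gap(baseline, sota) {
  return baseline - sota;
}

// True once the SOTA score has reached or passed the human baseline.
function hasCrossed(baseline, sota) {
  return sota >= baseline;
}

// Fit a least-squares line through (t, score) points and solve for the time
// at which that line reaches the baseline. Returns null when there are fewer
// than two points or no upward trend to extrapolate from.
function estimateCrossover(snapshots, baseline) {
  if (snapshots.length < 2) return null;
  const n = snapshots.length;
  const meanT = snapshots.reduce((sum, p) => sum + p.t, 0) / n;
  const meanY = snapshots.reduce((sum, p) => sum + p.score, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of snapshots) {
    num += (p.t - meanT) * (p.score - meanY);
    den += (p.t - meanT) ** 2;
  }
  if (den === 0) return null; // all snapshots share one timestamp
  const slope = num / den;
  if (slope <= 0) return null; // flat or falling trend: no ETA
  const intercept = meanY - slope * meanT;
  return (baseline - intercept) / slope;
}

// Made-up snapshots, 30 days apart, against the OSWorld baseline of 72.36%.
const sampleSnapshots = [
  { t: 0, score: 40 },
  { t: 30, score: 50 },
  { t: 60, score: 60 },
];
const etaDay = estimateCrossover(sampleSnapshots, 72.36);
```

A linear fit is the simplest possible choice. Benchmark progress is often step-like, so any ETA produced this way is a rough indicator rather than a forecast.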
Honest data policy
There are exactly two classes of numbers on this page, both labeled in the UI:
- Static, published human baselines (in `lib/baselines.js`). These are
  peer-reviewed or arXiv-paper values from the cited source. They never change
  at runtime. Each entry carries `value`, `source_url`, and `note`.
- Live SOTA scores (in `fetchers/*.js`). Every score is fetched from the
  canonical public leaderboard for that benchmark on a 6-hour cron and on
  startup. If a fetcher hits a 403, timeout, or parse error, it throws,
  the failure is logged to the `fetch_log` table, and **no snapshot row is
  written**. There is no synthesis, no `Math.random()` jitter, no seeded
  fallback. The corresponding card simply shows the previous snapshot until
  the next successful fetch.
No other data class exists in this project.
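The failure policy above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the two arrays stand in for the snapshot and `fetch_log` tables, and `runFetcher`, `latestSnapshot`, and the row fields are hypothetical names.

```javascript
// In-memory stand-ins for the snapshot and fetch_log tables (illustrative).
const snapshots = [];
const fetchLog = [];

// Run one fetcher. On success, store a snapshot and an "ok" log row.
// On any thrown error, store only an "error" log row: no snapshot is written
// and no substitute value is invented.
async function runFetcher(slug, fetcher) {
  const at = new Date().toISOString();
  try {
    const score = await fetcher();
    if (!Number.isFinite(score)) {
      throw new Error('parse error: score is not a finite number');
    }
    snapshots.push({ slug, score, at });
    fetchLog.push({ slug, status: 'ok', at });
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    fetchLog.push({ slug, status: 'error', message, at });
    return false;
  }
}

// Most recent stored snapshot for a benchmark, or undefined if none exists.
// After a failed fetch this still returns the previous successful snapshot.
function latestSnapshot(slug) {
  return snapshots.filter((s) => s.slug === slug).at(-1);
}
```

Because the error path never touches `snapshots`, a card backed by `latestSnapshot` keeps showing the last good value until a later fetch succeeds.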
Live data sources (refreshed every 6h)
- OSWorld → `https://raw.githubusercontent.com/xlang-ai/OSWorld/main/README.md` (markdown table parse)
- SWE-Bench Verified → `https://api.github.com/repos/swe-bench/experiments/contents/evaluation/verified` + per-submission `results/results.json` + `metadata.yaml`
- SWE-Bench Pro → `https://labs.scale.com/leaderboard/swe_bench_pro_public` (`__NEXT_DATA__` JSON or cheerio fallback)
- ARC-AGI 1 & 2 → `https://arcprize.org/leaderboard` (cheerio, classify table by sibling heading)
- Humanity's Last Exam → `https://artificialanalysis.ai/evaluations/humanitys-last-exam` (cheerio)
- GPQA Diamond → `https://raw.githubusercontent.com/idavidrein/gpqa/main/README.md` (markdown Diamond column)
- MMLU → `https://epoch.ai/data/benchmarks?benchmark=mmlu` (cheerio)
- FrontierMath → `https://epoch.ai/benchmarks/frontiermath` (cheerio)
- AIME 2025 → `https://matharena.ai/` (cheerio AIME 2025 column)
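Two of the sources above are parsed as markdown tables. A minimal sketch of that kind of parse is shown here, using only plain string handling. The helper names and the sample table are made up for illustration; the real READMEs have their own column names and layout, and the project's fetchers may work differently.

```javascript
// Parse the first pipe table in a markdown string into row objects keyed by
// header text. Throws if no table is found, rather than returning a guess.
function parseMarkdownTable(markdown) {
  const lines = markdown.split('\n').map((line) => line.trim());
  // A table starts at a pipe row that is followed by a |---|---| separator.
  const start = lines.findIndex(
    (line, i) => line.startsWith('|') && /^\|[\s:|-]+\|$/.test(lines[i + 1] || '')
  );
  if (start === -1) throw new Error('no markdown table found');

  // Strip the outer pipes, then split the row into trimmed cells.
  const cells = (line) =>
    line.replace(/^\||\|$/g, '').split('|').map((cell) => cell.trim());

  const headers = cells(lines[start]);
  const rows = [];
  for (let i = start + 2; i < lines.length && lines[i].startsWith('|'); i++) {
    const values = cells(lines[i]);
    rows.push(Object.fromEntries(headers.map((h, j) => [h, values[j] ?? ''])));
  }
  return rows;
}

// Highest numeric value in one column. Throws when nothing parses, so a
// layout change surfaces as a fetch error instead of a silent wrong score.
function topScore(rows, column) {
  const scores = rows.map((row) => parseFloat(row[column])).filter(Number.isFinite);
  if (scores.length === 0) {
    throw new Error(`no numeric values in column "${column}"`);
  }
  return Math.max(...scores);
}

// Made-up sample input; not real leaderboard data.
const sampleReadme = [
  '# Example README',
  '',
  '| Model   | Score |',
  '|---------|-------|',
  '| agent-a | 38.1% |',
  '| agent-b | 42.9% |',
  '| agent-c | n/a   |',
  '',
  'Trailing prose.',
].join('\n');

const best = topScore(parseMarkdownTable(sampleReadme), 'Score'); // 42.9
```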
API
All endpoints are public, mounted at /crossover-clock:
| Method | Path | Description |
|--------|-------------------------------|------------------------------------------------------|
| GET | / | SPA |
| GET | /health | { status, uptime, benchmarks_count, last_fetch_at } |
| GET | /api/benchmarks | Array of every benchmark + current snapshot + ETA |
| GET | /api/benchmark/:slug | Single benchmark + 200-snapshot history |
| GET | /api/history/:slug?limit=N | Latest N snapshots |
| GET | /api/crossovers | All crossover_events |
| GET | /api/fetch-log?limit=N | Last N fetcher attempts (ok / error) |
| POST | /api/refresh | Trigger all fetchers (rate-limited to one in-flight) |
| POST | /api/refresh/:slug | Trigger one fetcher |
No body or auth required on any of these.
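The "one in-flight" limit on `POST /api/refresh` could be implemented in several ways. The sketch below shows one possible reading, a single-flight wrapper that hands concurrent callers the same pending promise. The name `singleFlight` is hypothetical, and the real handler might instead reject overlapping requests.

```javascript
// Wrap an async task so that at most one run is in flight at a time. Calls
// made while a run is active receive the same promise instead of starting a
// second run. Once the run settles, the next call starts a fresh one.
function singleFlight(task) {
  let inFlight = null;
  return function run() {
    if (inFlight) return inFlight;
    inFlight = Promise.resolve()
      .then(task)
      .finally(() => {
        inFlight = null; // reset on success and on failure alike
      });
    return inFlight;
  };
}
```

A route handler could then call a wrapped "run all fetchers" function on every request. Overlapping requests would share one round of upstream fetches, which keeps the scraped leaderboards from being hit in bursts.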
Local dev
    npm install
    cp .env.example .env
    node server.js
    # → http://localhost:4891/crossover-clock/
`better-sqlite3` needs a native build step. The Cowork sandbox can't reach
nodejs.org for prebuilt binaries, so `npm install` may warn there. On Linux
arm64 and macOS arm64 production hosts, the prebuilt binary downloads
successfully.
Stack
Node 20+, Express 4, better-sqlite3 (WAL), node-cron, helmet, compression,
cheerio. Vanilla JS SPA. Dark theme.
License
MIT