crossover-clock

Every AI benchmark, every human baseline, the gap, and the clock.

A live, public dashboard that tracks every major AI benchmark and answers one
question for each: has SOTA crossed the human baseline yet — and if not, when
will it?

🔗 Live: https://holyai.me/crossover-clock/
🔌 Port: 4891 (mounted at /crossover-clock)
🔐 Auth: none. Every endpoint is public.

What it tracks

Ten benchmarks at launch, across five categories:

| Benchmark | Category | Human baseline | Source |
|----------------------|---------------|----------------|------------------------------------------------------------------|
| OSWorld | computer-use | 72.36% | Xie et al. NeurIPS 2024 §4.2 |
| SWE-Bench Verified | code | 50.0% | OpenAI Verified release (Aug 2024) |
| SWE-Bench Pro | code | 50.0% | Scale AI Pro release (Sep 2025) |
| ARC-AGI-1 | reasoning | 80.0% | arcprize.org panel |
| ARC-AGI-2 | reasoning | 66.0% | ARC-AGI-2 paper (May 2025) |
| Humanity's Last Exam | knowledge | 65.0% | Phan et al. (CAIS/Scale, Jan 2025) |
| GPQA Diamond | knowledge | 65.0% | Rein et al. 2023 |
| MMLU | knowledge | 89.8% | Hendrycks et al. 2021 |
| FrontierMath | math | 75.0% | Epoch AI (Nov 2024) |
| AIME 2025 | math | ~5/15 (~33%) | AoPS distribution / USAMO qualifying cut |
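
The gap and the projected crossover date for any row above can be derived from a benchmark's snapshot history. The README does not say how the project computes its ETA, so what follows is only a minimal sketch under one assumption: a least-squares straight line through past scores, extended until it reaches the baseline. The names `gapToBaseline` and `estimateCrossover` and the snapshot shape `{ t, score }` are hypothetical.

```javascript
// Hypothetical sketch: gap and crossover ETA from a list of snapshots.
// Assumed snapshot shape: { t: epoch milliseconds, score: percent }, oldest first.

// Gap in percentage points; zero or negative means SOTA has crossed.
function gapToBaseline(baseline, sota) {
  return baseline - sota;
}

// Fits an ordinary least-squares line through (t, score) and solves for the
// time at which the line reaches `baseline`. Returns:
//   { crossed: true,  eta: null } when the latest score meets the baseline,
//   { crossed: false, eta: Date } when the fitted trend is rising,
//   { crossed: false, eta: null } when there is no usable upward trend.
function estimateCrossover(baseline, snapshots) {
  if (snapshots.length === 0) return { crossed: false, eta: null };
  const latest = snapshots[snapshots.length - 1];
  if (latest.score >= baseline) return { crossed: true, eta: null };
  if (snapshots.length < 2) return { crossed: false, eta: null };

  const n = snapshots.length;
  const t0 = snapshots[0].t; // shift the time origin to keep the sums small
  let sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  for (const { t, score } of snapshots) {
    const x = t - t0;
    sumX += x;
    sumY += score;
    sumXY += x * score;
    sumXX += x * x;
  }
  const denom = n * sumXX - sumX * sumX;
  if (denom === 0) return { crossed: false, eta: null }; // identical timestamps
  const slope = (n * sumXY - sumX * sumY) / denom;
  if (slope <= 0) return { crossed: false, eta: null }; // flat or falling trend
  const intercept = (sumY - slope * sumX) / n;
  const crossing = t0 + (baseline - intercept) / slope;
  // A noisy fit can place the crossing before the latest snapshot; clamp it.
  return { crossed: false, eta: new Date(Math.max(crossing, latest.t)) };
}
```

A linear fit is the simplest possible choice and tends to be optimistic or pessimistic depending on how bursty progress is; a real implementation might weight recent snapshots more heavily or report a range instead of a single date.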

Honest data policy

There are exactly two classes of numbers on this page, both labeled in the UI:

1. Static, published human baselines (in lib/baselines.js). These are
   peer-reviewed or arXiv-paper values from the cited source. They never change
   at runtime. Each entry carries value, source_url, and note.
2. Live SOTA scores (in fetchers/*.js). Every score is fetched from the
   canonical public leaderboard for that benchmark on a 6-hour cron and on
   startup. If a fetcher hits a 403, timeout, or parse error, it throws,
   the failure is logged to the fetch_log table, and **no snapshot row is
   written**. There is no synthesis, no Math.random() jitter, no seeded
   fallback. The corresponding card simply shows the previous snapshot until
   the next successful fetch.

No other data class exists in this project.
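
The failure policy for live scores can be illustrated with a small runner. This is a sketch, not the project's code: `createMemoryStore`, `runFetcher`, and `latestSnapshot` are hypothetical names, the row fields are assumed, and an in-memory object stands in for the SQLite tables so the example runs without dependencies.

```javascript
// Hypothetical sketch of the no-fallback policy. `store` stands in for the
// snapshot and fetch_log tables; its shape is an assumption.
function createMemoryStore() {
  return { snapshots: [], fetchLog: [] };
}

// Runs one fetcher. `fetchScore` is any async function that resolves to a
// numeric score or throws (403, timeout, parse error, ...).
// Resolves to true on success and false on failure.
async function runFetcher(slug, fetchScore, store, now = () => new Date()) {
  const at = now().toISOString();
  try {
    const score = await fetchScore();
    if (typeof score !== 'number' || !Number.isFinite(score)) {
      throw new Error(`parse error: non-numeric score for ${slug}`);
    }
    store.snapshots.push({ slug, score, fetched_at: at });
    store.fetchLog.push({ slug, status: 'ok', error: null, at });
    return true;
  } catch (err) {
    // Failure path: log only. No snapshot row, no synthesized value.
    const message = String(err && err.message ? err.message : err);
    store.fetchLog.push({ slug, status: 'error', error: message, at });
    return false;
  }
}

// What a card would display: the most recent successful snapshot, however old.
function latestSnapshot(store, slug) {
  for (let i = store.snapshots.length - 1; i >= 0; i--) {
    if (store.snapshots[i].slug === slug) return store.snapshots[i];
  }
  return null;
}
```

Because the error branch never touches `store.snapshots`, a run of failed fetches leaves `latestSnapshot` pointing at the last good value, which matches the behavior described above.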

Live data sources (refreshed every 6h)

OSWorld → https://raw.githubusercontent.com/xlang-ai/OSWorld/main/README.md (markdown table parse)
SWE-Bench Verified → https://api.github.com/repos/swe-bench/experiments/contents/evaluation/verified + per-submission results/results.json + metadata.yaml
SWE-Bench Pro → https://labs.scale.com/leaderboard/swe_bench_pro_public (__NEXT_DATA__ json or cheerio fallback)
ARC-AGI-1 & ARC-AGI-2 → https://arcprize.org/leaderboard (cheerio, classify table by sibling heading)
Humanity's Last Exam → https://artificialanalysis.ai/evaluations/humanitys-last-exam (cheerio)
GPQA Diamond → https://raw.githubusercontent.com/idavidrein/gpqa/main/README.md (markdown Diamond column)
MMLU → https://epoch.ai/data/benchmarks?benchmark=mmlu (cheerio)
FrontierMath → https://epoch.ai/benchmarks/frontiermath (cheerio)
AIME 2025 → https://matharena.ai/ (cheerio AIME 2025 column)
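
Two of the parsing strategies named above, the markdown table parse and the __NEXT_DATA__ JSON extraction, can be sketched without any dependencies. The helpers below are hypothetical and make assumptions the README does not confirm: that the best score is the largest number in a named column, and that the page embeds its data in a standard Next.js `<script id="__NEXT_DATA__">` tag. The real fetchers may select rows and columns differently.

```javascript
// Hypothetical helpers; the project's real fetchers may parse differently.

// Splits one markdown table row into trimmed cells, dropping the outer pipes.
function splitRow(line) {
  return line.trim().replace(/^\|/, '').replace(/\|$/, '').split('|').map((c) => c.trim());
}

// Finds the first pipe table whose header has a cell equal to `columnName`
// (case-insensitive) and returns the largest number found in that column.
// Throws when no such column or no numeric cell exists, so the caller can
// treat it as a parse error.
function maxFromMarkdownTable(markdown, columnName) {
  const lines = markdown.split(/\r?\n/);
  const separator = /^\s*\|?\s*:?-+:?\s*(\||$)/; // e.g. |---|---| or :---|---:
  for (let i = 0; i + 1 < lines.length; i++) {
    if (!lines[i].includes('|') || !separator.test(lines[i + 1])) continue;
    const col = splitRow(lines[i]).findIndex(
      (h) => h.toLowerCase() === columnName.toLowerCase()
    );
    if (col === -1) continue;
    let best = null;
    for (let j = i + 2; j < lines.length && lines[j].includes('|'); j++) {
      // Take the first number in the cell, ignoring markup such as ** or %.
      const match = (splitRow(lines[j])[col] || '').match(/-?\d+(\.\d+)?/);
      if (match) best = Math.max(best ?? -Infinity, parseFloat(match[0]));
    }
    if (best !== null) return best;
  }
  throw new Error(`no numeric "${columnName}" column found`);
}

// Extracts the JSON payload that Next.js pages embed in a
// <script id="__NEXT_DATA__"> tag. Throws if the tag is missing or the
// payload is not valid JSON.
function extractNextData(html) {
  const m = html.match(/<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/);
  if (!m) throw new Error('__NEXT_DATA__ script tag not found');
  return JSON.parse(m[1]);
}
```

Both helpers throw instead of returning a default, which fits the policy that a parse error produces a fetch_log entry and no snapshot.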

API

All endpoints are public, mounted at /crossover-clock:

| Method | Path | Description |
|--------|-------------------------------|------------------------------------------------------|
| GET | / | SPA |
| GET | /health | { status, uptime, benchmarks_count, last_fetch_at } |
| GET | /api/benchmarks | Array of every benchmark + current snapshot + ETA |
| GET | /api/benchmark/:slug | Single benchmark + 200-snapshot history |
| GET | /api/history/:slug?limit=N | Latest N snapshots |
| GET | /api/crossovers | All crossover_events |
| GET | /api/fetch-log?limit=N | Last N fetcher attempts (ok / error) |
| POST | /api/refresh | Trigger all fetchers (rate-limited to one in-flight) |
| POST | /api/refresh/:slug | Trigger one fetcher |

No body or auth required on any of these.
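
A minimal client sketch for the endpoints above, using the global fetch available in the Node versions listed under Stack. The helper names are hypothetical, and the sketch assumes only that the GET endpoints return JSON; it makes no assumption about individual response fields or about the body returned by the refresh endpoints.

```javascript
// Minimal client sketch. Paths come from the endpoint table; helper names
// are hypothetical and response fields are deliberately not assumed.
const BASE_URL = 'https://holyai.me/crossover-clock';

// Builds an absolute URL for an API path plus optional query parameters.
function apiUrl(path, query = {}) {
  const url = new URL(BASE_URL + path);
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// GETs a path and parses the JSON body; throws on any non-2xx status.
async function getJson(path, query) {
  const res = await fetch(apiUrl(path, query));
  if (!res.ok) throw new Error(`GET ${path} failed with HTTP ${res.status}`);
  return res.json();
}

// POSTs to /api/refresh, or to /api/refresh/:slug when a slug is given.
// Returns only the HTTP status, since the response body is not documented here.
async function triggerRefresh(slug) {
  const path = slug ? `/api/refresh/${encodeURIComponent(slug)}` : '/api/refresh';
  const res = await fetch(apiUrl(path), { method: 'POST' });
  return res.status;
}
```

For example, `await getJson('/api/benchmarks')` would fetch the full list, and `await getJson('/api/fetch-log', { limit: 20 })` the most recent fetcher attempts. Slugs are not listed in this README, so take them from the /api/benchmarks response instead of guessing.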

Local dev

npm install
cp .env.example .env
node server.js
# → http://localhost:4891/crossover-clock/

better-sqlite3 is a native module and needs a build step. The Cowork sandbox
can't reach nodejs.org for prebuilt binaries, so npm install may emit warnings
there. On Linux arm64 and macOS arm64 production hosts, the prebuilt binary
downloads successfully.

Stack

Node 20+, Express 4, better-sqlite3 (WAL), node-cron, helmet, compression,
cheerio. Vanilla JS SPA. Dark theme.

License

MIT