proof-pending
A public ledger of AI-lab scientific/math "breakthrough" announcements with their independent-verification status. Aimed at science journalists, ML researchers, and skeptics who are tired of "AI solves X" headlines that turn out to be re-derivations of published results.
We do not do the verification ourselves. We track it.
What it does
- Ingest lab blog feeds (OpenAI, Anthropic, DeepMind) and the cs.AI section of arXiv filtered to those labs.
- Classify each post into "research claim" vs. "everything else" (LLM via OpenRouter, with a regex heuristic fallback when no key is present).
- Track evidence — HN discussion threads (automated lookup), arXiv citation probes (automated), peer papers / retractions (anyone can submit, no login).
- Derive a status per claim: announced → unverified → community-checking → peer-verified → retracted.
- Surface it: list view, filterable leaderboard, weekly digest, embeddable SVG badges, iframe widgets, status-change RSS for journalists.
Data sources
Every "live" number in the UI comes from one of these. There is no seeded data.
| Source | URL | Refresh |
| --- | --- | --- |
| OpenAI blog RSS | https://openai.com/blog/rss.xml (→ /news/rss.xml) | hourly @ :07 |
| Anthropic sitemap* | https://www.anthropic.com/sitemap.xml (filtered to /news, /research, /engineering) | hourly @ :12 |
| DeepMind blog RSS | https://deepmind.google/blog/rss.xml (with fallbacks) | hourly @ :17 |
| arXiv API | http://export.arxiv.org/api/query?...cs.AI...au:OpenAI/Anthropic/DeepMind... | daily @ 06:00 |
| HN Algolia | https://hn.algolia.com/api/v1/search?... | per-claim, every 30 min until matched or 60d old |
| arXiv citation probe | http://export.arxiv.org/api/query?search_query=all:"<paper_id>" | every 6h for claims with paper_url |
| OpenRouter (anthropic/claude-haiku-4.5) | https://openrouter.ai/api/v1/chat/completions | per new announcement, once |
\* The spec originally listed https://www.anthropic.com/news/rss.xml. That endpoint now returns HTTP 404 (Anthropic stopped serving RSS for /news). The sitemap path was adopted as the fallback during build — see the inline note at the top of SPEC.md.
If OPENROUTER_API_KEY is missing, classification falls back to a regex heuristic (positive markers like "theorem", "SOTA", "benchmark", "arxiv"; negative markers like "hiring", "launching", "policy"). Heuristic-classified announcements still produce real claims for the obvious cases (e.g. the OpenAI Erdős-style math posts) — they're just less precise on edge cases.
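A rough sketch of how such a regex fallback might look. The marker lists contain only the examples named above, so the real lists in lib/classify.js are presumably longer. The function name `heuristicClassify` and the "more positive than negative hits" rule are assumptions made for illustration.

```javascript
// Hypothetical sketch of the regex fallback; not the actual lib/classify.js.
// Marker lists contain only the examples named in this README.
const POSITIVE = /\b(theorem|sota|benchmark|arxiv)\b/gi;
const NEGATIVE = /\b(hiring|launching|policy)\b/gi;

// Returns true when the text looks like a research claim.
// Assumed rule: at least one positive marker, and more positive than negative hits.
function heuristicClassify(title, summary = '') {
  const text = `${title} ${summary}`;
  const pos = (text.match(POSITIVE) || []).length;
  const neg = (text.match(NEGATIVE) || []).length;
  return pos > 0 && pos > neg;
}
```

A post that mixes both kinds of marker, such as a product launch that mentions a benchmark, is the kind of edge case where a rule this simple loses precision.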
Status engine
A pure function in lib/statusEngine.js. Order matters:
any evidence.type == 'retraction' -> 'retracted'
any evidence.type == 'peer_paper' -> 'peer-verified'
any evidence.type == 'arxiv_cite' AND weight >= 2 -> 'peer-verified'
any evidence.type in {hn_thread, expert_thread, arxiv_cite} -> 'community-checking'
now - announced_at < 7 days -> 'announced'
otherwise -> 'unverified'
Recompute fires synchronously on every evidence insert or delete. A cron aging pass every 30 minutes (*/30) handles announced → unverified transitions. Every change writes a status_changes row recording from_status, to_status, and reason.
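The rule order above can be sketched as a pure function. This is a minimal illustration, not the contents of lib/statusEngine.js: the name `deriveStatus`, the shape of the `claim` and `evidence` arguments, and the injected `now` parameter are assumptions.

```javascript
// Hypothetical sketch of the rule order above; not the actual lib/statusEngine.js.
const DAY_MS = 24 * 60 * 60 * 1000;
const COMMUNITY_TYPES = new Set(['hn_thread', 'expert_thread', 'arxiv_cite']);

// claim: { announced_at }, evidence: [{ type, weight }] (assumed shapes).
// `now` is passed in so the function stays pure and easy to test.
function deriveStatus(claim, evidence, now = Date.now()) {
  // Rules are checked top to bottom; the first match wins.
  if (evidence.some((e) => e.type === 'retraction')) return 'retracted';
  if (evidence.some((e) => e.type === 'peer_paper')) return 'peer-verified';
  if (evidence.some((e) => e.type === 'arxiv_cite' && (e.weight ?? 0) >= 2)) return 'peer-verified';
  if (evidence.some((e) => COMMUNITY_TYPES.has(e.type))) return 'community-checking';
  const ageMs = now - new Date(claim.announced_at).getTime();
  return ageMs < 7 * DAY_MS ? 'announced' : 'unverified';
}
```

Because a retraction is checked first, it overrides any amount of supporting evidence, and a low-weight arXiv citation only moves a claim to community-checking.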
HTTP API
All endpoints are mounted under /proof-pending and are public (no auth, per RNDLAB house rules). The only protection on write endpoints is an in-memory per-IP token bucket.
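For readers unfamiliar with the technique, a per-IP token bucket can be sketched in a few lines. This is an illustration only, not the contents of lib/rateLimit.js: the names `createRateLimiter` and `allow` and the continuous-refill behaviour are assumptions.

```javascript
// Hypothetical per-IP token bucket; not the actual lib/rateLimit.js.
// `capacity` tokens refill evenly over `windowMs` (e.g. 20 per hour).
function createRateLimiter({ capacity, windowMs }) {
  const buckets = new Map(); // ip -> { tokens, last }
  const refillPerMs = capacity / windowMs;

  // Returns true if the request may proceed, false if the bucket is empty.
  return function allow(ip, now = Date.now()) {
    const b = buckets.get(ip) || { tokens: capacity, last: now };
    // Top up for the time elapsed since the last request, capped at capacity.
    b.tokens = Math.min(capacity, b.tokens + (now - b.last) * refillPerMs);
    b.last = now;
    const ok = b.tokens >= 1;
    if (ok) b.tokens -= 1;
    buckets.set(ip, b);
    return ok;
  };
}
```

Since the state lives in process memory, a restart resets every bucket, and multiple server instances would not share limits.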
Reads
| Method | Path | Description |
| --- | --- | --- |
| GET | /proof-pending/health | service + per-source last_ok_at / last_err_at |
| GET | /proof-pending/api/claims | list claims; query: lab, status, domain, q, limit, offset, sort (newest / oldest_unverified / longest_gap) |
| GET | /proof-pending/api/claims/:slug | claim detail + evidence + status_changes |
| GET | /proof-pending/api/labs | per-lab counts and avg/median days-to-verify |
| GET | /proof-pending/api/digest | claims unverified AND announced > 30 days ago, oldest first |
| GET | /proof-pending/api/announcements/raw | every ingested row incl. non-claims (debug/transparency) |
| GET | /proof-pending/api/status-changes | last N status changes (JSON) |
| GET | /proof-pending/api/health-detail | DB counts + source health + last ingest/classify timestamps |
| GET | /proof-pending/badge/:slug.svg | dynamic SVG badge, ?size=sm\|md\|lg, Cache-Control: public, max-age=300 |
| GET | /proof-pending/embed/:slug | 250×90 iframe-friendly HTML widget |
| GET | /proof-pending/c/:slug | server-rendered claim page with OG/Twitter meta tags |
| GET | /proof-pending/feed/changes.rss | last 50 status changes as RSS 2.0 |
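As a usage illustration, the claims list can be queried by building the URL from the documented query parameters. The helper name `claimsUrl` is hypothetical, the base URL is the local dev address from "Running locally", and the lab value `openai` is an assumed spelling. The response shape is not documented in this README.

```javascript
// Hypothetical client helper; parameter names come from the reads table above.
function claimsUrl(base, params = {}) {
  const url = new URL('/proof-pending/api/claims', base);
  for (const [key, value] of Object.entries(params)) {
    // Skip unset filters so they do not appear as empty query parameters.
    if (value !== undefined && value !== null) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Example: oldest unverified claims for one lab, 10 per page.
// The lab value 'openai' is an assumption about how labs are keyed.
const example = claimsUrl('http://localhost:4866', {
  lab: 'openai',
  status: 'unverified',
  sort: 'oldest_unverified',
  limit: 10,
});
// On Node 18+: const claims = await (await fetch(example)).json();
```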
Writes
| Method | Path | Description |
| --- | --- | --- |
| POST | /proof-pending/api/claims/:slug/evidence | body: {evidence_type, url, description}. Inserts evidence and recomputes status synchronously. 20/hour/IP. |
| DELETE | /proof-pending/api/evidence/:id | remove an evidence row and recompute. Same rate limit bucket. |
| POST | /proof-pending/api/refresh | force-run all scrapers + classify pass. 1/min/IP. |
evidence_type ∈ {hn_thread, arxiv_cite, expert_thread, peer_paper, retraction, other}.
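A minimal sketch of submitting evidence from a script. The helper name `buildEvidenceRequest` is hypothetical, and the JSON `Content-Type` header is an assumption about what the server expects. The body fields and the allowed types are taken from the table and list above.

```javascript
// Hypothetical client-side helper; field names and allowed types come from this README.
const EVIDENCE_TYPES = ['hn_thread', 'arxiv_cite', 'expert_thread', 'peer_paper', 'retraction', 'other'];

// Builds the [url, init] arguments for fetch(); throws early on an unknown evidence_type.
function buildEvidenceRequest(base, slug, { evidence_type, url, description }) {
  if (!EVIDENCE_TYPES.includes(evidence_type)) {
    throw new Error(`unknown evidence_type: ${evidence_type}`);
  }
  return [
    `${base}/proof-pending/api/claims/${encodeURIComponent(slug)}/evidence`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' }, // assumed: server parses JSON bodies
      body: JSON.stringify({ evidence_type, url, description }),
    },
  ];
}

// Usage on Node 18+ (global fetch):
// const [endpoint, init] = buildEvidenceRequest('http://localhost:4866', '<slug>', { ... });
// const res = await fetch(endpoint, init);
```

Validating the type on the client avoids spending one of the 20 hourly writes on a request the server would reject.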
Embeddable badge
Drop into a news article or a tweet preview:
<iframe src="https://holyai.me/proof-pending/embed/<slug>" width="250" height="90"></iframe>

The SVG badge at /proof-pending/badge/<slug>.svg renders the live status plus days since announcement and is cached for 5 minutes. Three sizes are available via ?size=sm|md|lg.
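To make the badge idea concrete, here is one way a status badge could be rendered as an SVG string. Everything visual in this sketch (colours, dimensions, scale factors, label text) and the function name `renderBadge` are assumptions. It does not reproduce the output of routes/badge.js.

```javascript
// Hypothetical badge renderer; colours, dimensions and label text are assumptions.
const STATUS_COLORS = {
  announced: '#6b7280',
  unverified: '#b45309',
  'community-checking': '#1d4ed8',
  'peer-verified': '#15803d',
  retracted: '#b91c1c',
};
const SIZES = { sm: 0.8, md: 1, lg: 1.3 }; // assumed scale factors for ?size=sm|md|lg

// `status` is expected to be one of the five known values, so it is not escaped here.
function renderBadge({ status, announced_at }, size = 'md', now = Date.now()) {
  const scale = SIZES[size] ?? SIZES.md;
  const days = Math.max(0, Math.floor((now - new Date(announced_at).getTime()) / 86400000));
  const w = Math.round(220 * scale);
  const h = Math.round(24 * scale);
  const fill = STATUS_COLORS[status] ?? '#6b7280';
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${w}" height="${h}" role="img">` +
    `<rect width="${w}" height="${h}" rx="4" fill="${fill}"/>` +
    `<text x="${w / 2}" y="${h * 0.68}" font-size="${Math.round(12 * scale)}" ` +
    `font-family="sans-serif" fill="#fff" text-anchor="middle">${status}, day ${days}</text>` +
    `</svg>`
  );
}
```

A route serving this string would also set `Content-Type: image/svg+xml` and the `Cache-Control: public, max-age=300` header listed in the reads table.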
Running locally
npm install
cp .env.example .env # OPENROUTER_API_KEY is optional in dev
PORT=4866 npm start
Then open http://localhost:4866/proof-pending/. The first ingest cycle runs about 250 ms after boot and takes 5–30 s to populate 1,000 or more raw announcements; classification then runs in chunks of 200 until the backlog is empty. The leaderboard and stats strip fill in as claims are created.
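The chunked classification pass might look roughly like the loop below. `drainBacklog`, `fetchUnclassified`, and `classifyOne` are hypothetical names standing in for whatever db.js and lib/classify.js actually expose. Only the chunk size of 200 comes from the text above.

```javascript
// Hypothetical drain loop; the two callbacks are placeholders, not real project APIs.
async function drainBacklog({ fetchUnclassified, classifyOne, chunkSize = 200 }) {
  let total = 0;
  for (;;) {
    const rows = await fetchUnclassified(chunkSize); // up to chunkSize pending rows
    if (rows.length === 0) return total; // backlog is empty
    for (const row of rows) {
      // classifyOne must mark the row as classified, or the loop never terminates.
      await classifyOne(row);
      total += 1;
    }
  }
}
```

Processing in bounded chunks keeps memory flat and lets the UI fill in progressively while the backlog shrinks.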
Layout
proof-pending/
  server.js          express + helmet + compression, mounts /proof-pending
  db.js              better-sqlite3 (WAL) + prepared statements
  cron.js            node-cron schedules + refresh orchestration
  lib/
    classify.js      OpenRouter call + JSON parsing + heuristic fallback
    statusEngine.js  pure status function + apply-and-log helper
    rssParse.js      RSS 2.0 / Atom parser (fast-xml-parser)
    slug.js          stable lab+title+date slug, with uniqueness guard
    rateLimit.js     per-IP token bucket
    log.js           tagged JSON-ish logger
  scrapers/
    rssOpenAI.js / rssAnthropic.js / rssDeepMind.js   feed pulls
    arxiv.js         arXiv ingest + per-claim citation probe
    hnSearch.js      Algolia HN thread discovery
  routes/
    api.js           JSON CRUD + digest + raw + refresh
    page.js          server-rendered claim page (for OG unfurl)
    badge.js         dynamic SVG badge
    embed.js         iframe widget
    feed.js          status-change RSS 2.0
  public/
    index.html       SPA shell
    app.js           vanilla JS, URL-hash filter state
    style.css        dark theme
  data/
    proof-pending.db SQLite (WAL), gitignored
What's deliberately not built
No auth, login, admin pages, or API keys for writes. No email or Twitter posting (RSS is provided instead). No quality scoring, only verification status. No papers from labs outside the three tracked ones, whether academic or industry (Stanford, MIT, FAIR, xAI). No real-time websockets. Nothing beyond what the spec lists.
License
MIT.