proof-pending

A public ledger of AI-lab scientific/math "breakthrough" announcements with their independent-verification status. Aimed at science journalists, ML researchers, and skeptics who are tired of "AI solves X" headlines that turn out to be re-derivations of published results.

We do not do the verification ourselves. We track it.

What it does

- Ingest lab blog feeds (OpenAI, Anthropic, DeepMind) and the cs.AI section of arXiv, filtered to those labs.
- Classify each post as "research claim" or "everything else" (an LLM via OpenRouter, with a regex heuristic fallback when no key is present).
- Track evidence: HN discussion threads (automated lookup), arXiv citation probes (automated), and peer papers / retractions (anyone can submit, no login).
- Derive a status for each claim (announced, unverified, community-checking, peer-verified, or retracted) from its evidence and age.
- Surface it: list view, filterable leaderboard, weekly digest, embeddable SVG badges, iframe widgets, and a status-change RSS feed for journalists.

Data sources

Every "live" number in the UI comes from one of these. There is no seeded data.

| Source | URL | Refresh |
| --- | --- | --- |
| OpenAI blog RSS | https://openai.com/blog/rss.xml (→ /news/rss.xml) | hourly @ :07 |
| Anthropic sitemap* | https://www.anthropic.com/sitemap.xml (filtered to /news, /research, /engineering) | hourly @ :12 |
| DeepMind blog RSS | https://deepmind.google/blog/rss.xml (with fallbacks) | hourly @ :17 |
| arXiv API | http://export.arxiv.org/api/query?...cs.AI...au:OpenAI/Anthropic/DeepMind... | daily @ 06:00 |
| HN Algolia | https://hn.algolia.com/api/v1/search?... | per-claim, every 30 min until matched or 60d old |
| arXiv citation probe | http://export.arxiv.org/api/query?search_query=all:"<paper_id>" | every 6h for claims with paper_url |
| OpenRouter (anthropic/claude-haiku-4.5) | https://openrouter.ai/api/v1/chat/completions | per new announcement, once |

\* The spec originally listed https://www.anthropic.com/news/rss.xml. That endpoint now returns HTTP 404 (Anthropic stopped serving RSS for /news), so the sitemap was adopted as a replacement during the build; see the inline note at the top of SPEC.md.
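The per-claim lookups in the table reduce to two small helpers: one deciding whether the HN search is still due on a given 30-minute tick, and one building the citation-probe URL from a claim's paper_url. This is a minimal sketch, assuming hypothetical claim fields hnMatched and announcedAt (epoch milliseconds) and a paper_url pointing at an arxiv.org /abs or /pdf page with a new-style ID; shouldPollHn and arxivProbeUrl are illustrative names, not functions from the codebase.

```javascript
// Hypothetical helpers for the per-claim schedules; field names are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;
const HN_MAX_AGE_DAYS = 60;

// HN Algolia lookup: keep polling until a thread is matched or the claim is 60 days old.
function shouldPollHn(claim, now = Date.now()) {
  if (claim.hnMatched) return false;
  return (now - claim.announcedAt) / DAY_MS < HN_MAX_AGE_DAYS;
}

// arXiv citation probe: pull the new-style ID (e.g. 2401.01234, version suffix dropped)
// out of the paper URL and build the all:"<paper_id>" search query.
function arxivProbeUrl(paperUrl) {
  const m = /arxiv\.org\/(?:abs|pdf)\/(\d{4}\.\d{4,5})/.exec(paperUrl);
  if (!m) return null; // not an arXiv URL (or an old-style ID this sketch ignores)
  const query = `all:"${m[1]}"`;
  return "http://export.arxiv.org/api/query?search_query=" + encodeURIComponent(query);
}
```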

If OPENROUTER_API_KEY is missing, classification falls back to a regex heuristic (positive markers such as "theorem", "SOTA", "benchmark", "arxiv"; negative markers such as "hiring", "launching", "policy"). Heuristic-classified announcements still produce real claims in the obvious cases (e.g. the OpenAI Erdős-style math posts); the heuristic is just less precise on edge cases.
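The fallback can be pictured as a keyword score. This is a minimal sketch, assuming a "positive markers minus negative markers" score with a threshold of zero; only the example markers quoted above are included, and heuristicClassify and its return labels are hypothetical.

```javascript
// Sketch of the keyword fallback. The marker lists hold only the README's examples;
// the real lists and the decision threshold are assumptions.
const POSITIVE = [/\btheorem\b/i, /\bSOTA\b/i, /\bbenchmark/i, /\barxiv\b/i];
const NEGATIVE = [/\bhiring\b/i, /\blaunching\b/i, /\bpolicy\b/i];

function heuristicClassify(title, body = "") {
  const text = `${title}\n${body}`;
  // Count how many distinct markers appear (no /g flag, so .test() is stateless).
  const count = (patterns) => patterns.filter((re) => re.test(text)).length;
  const score = count(POSITIVE) - count(NEGATIVE);
  // Assumed rule: a research claim only if positive markers outnumber negative ones.
  return score > 0 ? "research_claim" : "other";
}
```

A post like "Launching a new benchmark" scores zero under this rule and lands in "other", which is the kind of edge case where the LLM path does better.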

Status engine

A pure function in lib/statusEngine.js computes each claim's status. Rules are checked top to bottom and the first match wins:

```
any evidence.type == 'retraction'                              -> 'retracted'
any evidence.type == 'peer_paper'                              -> 'peer-verified'
any evidence.type == 'arxiv_cite' AND weight >= 2              -> 'peer-verified'
any evidence.type in {hn_thread, expert_thread, arxiv_cite}    -> 'community-checking'
now - announced_at < 7 days                                    -> 'announced'
otherwise                                                       -> 'unverified'
```

Status is recomputed synchronously whenever evidence is inserted or deleted, and a cron aging pass every 30 minutes (*/30) handles announced → unverified transitions. Every change writes a status_changes row recording from_status, to_status, and reason.
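The rule table translates directly into a first-match function. This is a minimal sketch, not the actual lib/statusEngine.js: the names computeStatus and applyStatus, the evidence shape { type, weight }, and millisecond timestamps are assumptions. Evidence of type other matches no rule, so it never moves a claim's status.

```javascript
// Sketch of the precedence rules; names and data shapes are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;
const COMMUNITY_TYPES = new Set(["hn_thread", "expert_thread", "arxiv_cite"]);

function computeStatus(announcedAt, evidence, now = Date.now()) {
  const has = (pred) => evidence.some(pred);
  if (has((e) => e.type === "retraction")) return "retracted";
  if (has((e) => e.type === "peer_paper")) return "peer-verified";
  if (has((e) => e.type === "arxiv_cite" && (e.weight ?? 0) >= 2)) return "peer-verified";
  if (has((e) => COMMUNITY_TYPES.has(e.type))) return "community-checking";
  if (now - announcedAt < 7 * DAY_MS) return "announced";
  return "unverified";
}

// Hypothetical apply-and-log helper: recompute, and append a status_changes-style row
// only when the status actually changes.
function applyStatus(claim, evidence, changes, reason, now = Date.now()) {
  const next = computeStatus(claim.announcedAt, evidence, now);
  if (next !== claim.status) {
    changes.push({ from_status: claim.status, to_status: next, reason });
    claim.status = next;
  }
  return claim.status;
}
```

Keeping computeStatus free of I/O means the evidence-insert path and the aging cron can share it; only applyStatus touches the change log.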

HTTP API

All endpoints are mounted under /proof-pending and are public (no auth, per RNDLAB house rules). Write endpoints are protected only by an in-memory, per-IP token bucket, so limits reset when the process restarts and are not shared between processes.
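A per-IP token bucket of the kind lib/rateLimit.js provides can be sketched in a few lines. The class and method names, the continuous refill, and the bucket parameters are assumptions; the 20/hour and 1/min limits come from the Writes table.

```javascript
// Minimal per-IP token bucket sketch; names and refill strategy are assumptions.
class TokenBucketLimiter {
  constructor(capacity, refillPerMs) {
    this.capacity = capacity;       // maximum burst size
    this.refillPerMs = refillPerMs; // tokens added per elapsed millisecond
    this.buckets = new Map();       // ip -> { tokens, last }, lost on restart
  }

  // Returns true and spends a token if the request is allowed.
  take(ip, now = Date.now()) {
    const b = this.buckets.get(ip) ?? { tokens: this.capacity, last: now };
    // Refill in proportion to elapsed time, capped at capacity.
    b.tokens = Math.min(this.capacity, b.tokens + (now - b.last) * this.refillPerMs);
    b.last = now;
    const allowed = b.tokens >= 1;
    if (allowed) b.tokens -= 1;
    this.buckets.set(ip, b);
    return allowed;
  }
}

const HOUR_MS = 60 * 60 * 1000;
// Evidence POST/DELETE: 20 per hour per IP (one token back every 3 minutes).
const evidenceLimiter = new TokenBucketLimiter(20, 20 / HOUR_MS);
// Refresh: 1 per minute per IP.
const refreshLimiter = new TokenBucketLimiter(1, 1 / 60000);
```

In an Express handler, a call like evidenceLimiter.take(req.ip) returning false would map to a 429 response. Behind a reverse proxy, req.ip only reflects the real client if Express's trust proxy setting is configured, and this sketch never evicts idle IPs, so a long-running version would need periodic pruning.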

Reads

| Method | Path | Description |
| --- | --- | --- |
| GET | /proof-pending/health | service + per-source last_ok_at / last_err_at |
| GET | /proof-pending/api/claims | list claims; query: lab, status, domain, q, limit, offset, sort (newest / oldest_unverified / longest_gap) |
| GET | /proof-pending/api/claims/:slug | claim detail + evidence + status_changes |
| GET | /proof-pending/api/labs | per-lab counts and avg/median days-to-verify |
| GET | /proof-pending/api/digest | claims still unverified and announced more than 30 days ago, oldest first |
| GET | /proof-pending/api/announcements/raw | every ingested row incl. non-claims (debug/transparency) |
| GET | /proof-pending/api/status-changes | last N status changes (JSON) |
| GET | /proof-pending/api/health-detail | DB counts + source health + last ingest/classify timestamps |
| GET | /proof-pending/badge/:slug.svg | dynamic SVG badge, ?size=sm\|md\|lg, Cache-Control: public, max-age=300 |
| GET | /proof-pending/embed/:slug | 250×90 iframe-friendly HTML widget |
| GET | /proof-pending/c/:slug | server-rendered claim page with OG/Twitter meta tags |
| GET | /proof-pending/feed/changes.rss | last 50 status changes as RSS 2.0 |
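The digest rule reads as a filter plus a sort. A minimal in-memory sketch: the service presumably does this in SQL, the field names status and announcedAt are assumptions, and "unverified" is taken to mean the literal unverified status.

```javascript
// Sketch of the /api/digest selection; field names are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

function digest(claims, now = Date.now()) {
  return claims
    .filter((c) => c.status === "unverified" && now - c.announcedAt > 30 * DAY_MS)
    .sort((a, b) => a.announcedAt - b.announcedAt); // oldest first; filter() already copied
}
```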

Writes

| Method | Path | Description |
| --- | --- | --- |
| POST | /proof-pending/api/claims/:slug/evidence | body: {evidence_type, url, description}. Inserts evidence and recomputes status synchronously. 20/hour/IP. |
| DELETE | /proof-pending/api/evidence/:id | remove an evidence row and recompute status. Shares the 20/hour/IP bucket with evidence submission. |
| POST | /proof-pending/api/refresh | force-run all scrapers + classify pass. 1/min/IP. |

evidence_type ∈ {hn_thread, arxiv_cite, expert_thread, peer_paper, retraction, other}.
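Before inserting, the POST body has to be checked against that set. A hypothetical validator, as a minimal sketch: the evidence types come from the line above, while the http(s)-only URL rule, the optional description, and the error messages are assumptions.

```javascript
// Hypothetical body validator for POST .../evidence; returns a list of error strings.
const EVIDENCE_TYPES = new Set([
  "hn_thread", "arxiv_cite", "expert_thread", "peer_paper", "retraction", "other",
]);

function validateEvidence(body) {
  if (body === null || typeof body !== "object") return ["body must be a JSON object"];
  const errors = [];
  if (!EVIDENCE_TYPES.has(body.evidence_type)) errors.push("unknown evidence_type");
  let url = null;
  try {
    url = new URL(body.url); // throws on missing, relative, or malformed URLs
  } catch {
    errors.push("url must be an absolute URL");
  }
  // Assumed rule: reject javascript:, data:, ftp: and similar schemes.
  if (url && url.protocol !== "http:" && url.protocol !== "https:") {
    errors.push("url must use http or https");
  }
  if (body.description !== undefined && typeof body.description !== "string") {
    errors.push("description must be a string");
  }
  return errors;
}
```

A non-empty result would become a 400 response before the insert and synchronous recompute run.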

Embeddable badge

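Drop either snippet into a news article, blog post, or README (for link previews on Twitter/X, share the /proof-pending/c/<slug> page, which carries OG/Twitter meta tags):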

```html
<iframe src="https://holyai.me/proof-pending/embed/<slug>" width="250" height="90"></iframe>
```

```markdown
![status](https://holyai.me/proof-pending/badge/<slug>.svg)
```

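The SVG shows the live status and the number of days since announcement. Responses carry Cache-Control: public, max-age=300, so a status change can take up to 5 minutes to appear in an embedded badge. Three sizes are available via ?size=sm|md|lg.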
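A badge like this is a short string template. This is a sketch only: the size presets, colours, layout, and the renderBadge name are assumptions; just the inputs (status, days since announcement, ?size=sm|md|lg) come from this README.

```javascript
// Sketch of a status badge renderer; dimensions and colours are assumptions.
const SIZES = { sm: { w: 160, h: 20, font: 11 }, md: { w: 200, h: 26, font: 13 }, lg: { w: 260, h: 34, font: 16 } };
const COLORS = {
  announced: "#6b7280",
  unverified: "#d97706",
  "community-checking": "#2563eb",
  "peer-verified": "#16a34a",
  retracted: "#dc2626",
};

// Own-property lookup so query values like "constructor" fall back safely.
const pick = (obj, key, fallback) =>
  Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : fallback;

const escapeXml = (s) =>
  String(s).replace(/[&<>"']/g, (ch) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" })[ch]);

function renderBadge(status, daysSinceAnnounced, size = "md") {
  const { w, h, font } = pick(SIZES, size, SIZES.md); // unknown ?size= values fall back to md
  const fill = pick(COLORS, status, "#6b7280");
  const label = escapeXml(`${status} · ${daysSinceAnnounced}d`);
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${w}" height="${h}" role="img" aria-label="${label}">`,
    `<rect width="${w}" height="${h}" rx="3" fill="${fill}"/>`,
    `<text x="${w / 2}" y="${h / 2}" fill="#fff" font-family="sans-serif" font-size="${font}" text-anchor="middle" dominant-baseline="central">${label}</text>`,
    `</svg>`,
  ].join("");
}
```

In an Express route, the string could be sent with res.type("image/svg+xml").set("Cache-Control", "public, max-age=300").send(svg) to match the caching described above.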

Running locally

```sh
npm install
cp .env.example .env       # OPENROUTER_API_KEY is optional in dev
PORT=4866 npm start
```

Then open http://localhost:4866/proof-pending/. The first ingest cycle starts about 250 ms after boot and takes 5–30 s to pull in 1000+ raw announcements. Classification then works through the backlog in chunks of 200 until it is empty, and the leaderboard and stats strip fill in as claims are created.
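The startup backlog drain can be sketched as a loop that keeps pulling chunks of 200 unclassified rows until none remain. fetchPending and classifyOne are hypothetical stand-ins for the database query and the OpenRouter/heuristic call, and classifying rows one at a time is an assumption (it keeps OpenRouter traffic serial).

```javascript
// Sketch of draining the classification backlog; callbacks are hypothetical.
const CHUNK_SIZE = 200;

async function drainBacklog(fetchPending, classifyOne) {
  let total = 0;
  while (true) {
    const batch = await fetchPending(CHUNK_SIZE); // up to 200 unclassified rows
    if (batch.length === 0) break;                // backlog empty
    for (const row of batch) {
      // classifyOne must mark the row as classified, or the same rows come back forever.
      await classifyOne(row);
    }
    total += batch.length;
  }
  return total;
}
```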

Layout

```
proof-pending/
  server.js                  express + helmet + compression, mounts /proof-pending
  db.js                      better-sqlite3 (WAL) + prepared statements
  cron.js                    node-cron schedules + refresh orchestration
  lib/
    classify.js              OpenRouter call + JSON parsing + heuristic fallback
    statusEngine.js          pure status function + apply-and-log helper
    rssParse.js              RSS 2.0 / Atom parser (fast-xml-parser)
    slug.js                  stable lab+title+date slug, with uniqueness guard
    rateLimit.js             per-IP token bucket
    log.js                   tagged JSON-ish logger
  scrapers/
    rssOpenAI.js / rssAnthropic.js / rssDeepMind.js   feed pulls
    arxiv.js                 arXiv ingest + per-claim citation probe
    hnSearch.js              Algolia HN thread discovery
  routes/
    api.js                   JSON CRUD + digest + raw + refresh
    page.js                  server-rendered claim page (for OG unfurl)
    badge.js                 dynamic SVG badge
    embed.js                 iframe widget
    feed.js                  status-change RSS 2.0
  public/
    index.html               SPA shell
    app.js                   vanilla JS, URL-hash filter state
    style.css                dark theme
  data/
    proof-pending.db         SQLite (WAL), gitignored
```
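slug.js is described only as a "stable lab+title+date slug, with uniqueness guard". One way that could look, as a minimal sketch: the format (lowercased ASCII, hyphen-separated, title capped at 60 characters, UTC YYYY-MM-DD date), the numeric -2, -3 suffix, and the function names are assumptions, and taken stands in for a lookup against existing slugs.

```javascript
// Sketch of a stable lab+title+date slug; the exact format is an assumption.
function slugify(s) {
  return s
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents: "Erdős" -> "Erdos"
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // collapse every non-alphanumeric run into one hyphen
    .replace(/^-+|-+$/g, "");
}

function baseSlug(lab, title, date) {
  const day = new Date(date).toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const titlePart = slugify(title).slice(0, 60).replace(/-+$/, ""); // cap, no trailing hyphen
  return [slugify(lab), titlePart, day].filter(Boolean).join("-");
}

// Uniqueness guard: append -2, -3, ... until the slug is free.
function uniqueSlug(lab, title, date, taken) {
  const base = baseSlug(lab, title, date);
  let slug = base;
  for (let n = 2; taken.has(slug); n++) slug = `${base}-${n}`;
  return slug;
}
```

Deriving the slug only from lab, title, and date keeps it stable across re-ingests, so badge and embed URLs do not change when the scrapers run again.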

What's deliberately not built

No auth, login, admin pages, or API keys for writes. No email or Twitter posting (an RSS feed is provided instead). No quality scoring, only verification status. No papers from labs other than the three tracked (e.g. Stanford, MIT, FAIR, xAI). No real-time websockets. Nothing beyond what the spec lists.

License

MIT.