policy-diff

Live diff tracker for the acceptable-use policies, usage policies and model specs of major AI labs. Each document is pulled on a cron schedule, diffed against its previous snapshot, classified by Claude Haiku, and surfaced in a single dashboard so compliance, legal and product teams catch silent wording changes before they break things.

What it does

Crawls 16 policy/spec/terms documents across 10 labs (OpenAI, Anthropic, Google, Meta, xAI, Mistral, DeepSeek, Cohere, Microsoft, Perplexity) every 12 hours.
Snapshots the extracted plaintext into SQLite (WAL) — deduping by SHA-256, gzip-storing the raw HTML for forensic replay.
Diffs each new snapshot against the previous version with the npm diff package (Myers algorithm), producing a unified diff.
Classifies each diff via OpenRouter anthropic/claude-haiku-4.5 into one of {added_restriction, removed_restriction, clarified_existing, definition_change, enforcement_change, cosmetic, unknown} with a severity (high/medium/low/noise) and a single-sentence neutral summary.
Serves a public dark-theme SPA at /policy-diff/ with filters by lab / severity / window / full-text, lab roll-up, source detail timelines, per-change diff view, shareable SVG OG cards, and a stats dashboard.
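
The snapshot step's SHA-256 dedup can be sketched as follows. This is a minimal illustration, not the project's code: the in-memory `snapshotsBySlug` map stands in for the SQLite snapshot table, `recordSnapshot` and `sha256Hex` are hypothetical names, and it assumes dedup compares against the latest snapshot of the same source.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical in-memory stand-in for the snapshot table: slug -> [{ hash, text }].
const snapshotsBySlug = new Map();

// Hex-encoded SHA-256 of the extracted plaintext.
function sha256Hex(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Store the text only when its hash differs from the latest snapshot of that source.
// `hasPrevious` tells the caller whether there is an older snapshot to diff against.
function recordSnapshot(slug, text) {
  const history = snapshotsBySlug.get(slug) ?? [];
  const latest = history[history.length - 1];
  const hash = sha256Hex(text);
  if (latest && latest.hash === hash) {
    return { stored: false, hasPrevious: true, hash };
  }
  history.push({ hash, text });
  snapshotsBySlug.set(slug, history);
  return { stored: true, hasPrevious: Boolean(latest), hash };
}
```

Under these assumptions the first fetch of a source is stored with nothing to diff against, an identical re-fetch is dropped, and only a fetch whose text changed yields a new snapshot that can be diffed.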

There are zero mocks. Every snapshot is a real HTTP fetch. If a fetch fails, the source is reported as error under /api/sources and skipped for that cycle; the next cron cycle retries it.

Tech stack

Node.js 20 + Express
better-sqlite3 (WAL mode)
node-cron (queue drain + sweep + classifier passes)
helmet + compression
axios + cheerio (HTML → plaintext)
npm diff (unified-diff)
Vanilla JS SPA, no build step

Endpoints (all public, no auth)

| Method | Path | Purpose |
|---|---|---|
| GET | /policy-diff/ | SPA shell |
| GET | /policy-diff/health | {status:"ok", uptime_seconds, version} |
| GET | /policy-diff/api/sources | All tracked sources w/ snapshot + change counts |
| GET | /policy-diff/api/source/:slug | Source detail: snapshots, changes, latest text |
| GET | /policy-diff/api/changes | Paginated change feed. Filters: lab, severity, days, q, cursor, limit |
| GET | /policy-diff/api/change/:id | Full change record + classifier output + diff |
| GET | /policy-diff/api/snapshot/:id/text | Raw extracted plaintext |
| GET | /policy-diff/api/snapshot/:id/raw | Raw upstream HTML (gzip-encoded if client accepts) |
| GET | /policy-diff/api/stats | Aggregates: counts, top-labs, severity breakdown |
| POST | /policy-diff/api/refresh | Queue a refresh (1 req/IP/60s). Body: {slug?} |
| GET | /policy-diff/api/refresh/status | Queue depth + recent refresh log |
| POST | /policy-diff/api/source/:slug/notify | Subscribe an email (stub; no SMTP send v1) |
| GET | /policy-diff/api/change/:id/og.svg | 1200×630 shareable SVG card |
| GET | /policy-diff/api/change/:id/og.png | PNG card (requires node-canvas + ENABLE_PNG_OG=true) |
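
The refresh endpoint's limit of one request per IP per 60 seconds could be enforced with a small minimum-interval limiter. The sketch below is an assumption about how such a limit might work, not the project's middleware; `createRateLimiter` and the shape of its return value are hypothetical.

```javascript
// Hypothetical limiter: accepts a request from a key only if at least
// `windowMs` has passed since the last accepted request from that key.
function createRateLimiter(windowMs = 60_000) {
  const lastAccepted = new Map(); // ip -> timestamp (ms) of the last accepted request

  // `now` is injectable so the behaviour can be checked without waiting.
  return function allow(ip, now = Date.now()) {
    const previous = lastAccepted.get(ip);
    if (previous !== undefined && now - previous < windowMs) {
      const remainingMs = windowMs - (now - previous);
      return { allowed: false, retryAfterSeconds: Math.ceil(remainingMs / 1000) };
    }
    lastAccepted.set(ip, now);
    return { allowed: true, retryAfterSeconds: 0 };
  };
}
```

A handler would typically answer a rejected call with HTTP 429 and a Retry-After header set from `retryAfterSeconds`. The map grows with the number of distinct IPs, so a long-running process would also need to evict stale entries.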

Data sources & refresh cadence

Every source is fetched from a public URL with no auth. A full sweep runs every REFRESH_INTERVAL_HOURS hours (default 12), and any source can also be refreshed on demand via POST /api/refresh.

| Slug | Lab | Document | URL |
|---|---|---|---|
| openai-usage-policies | OpenAI | Usage Policies | https://openai.com/policies/usage-policies/ |
| openai-model-spec | OpenAI | Model Spec | https://model-spec.openai.com/2025-09-12.html |
| openai-service-terms | OpenAI | Service Terms | https://openai.com/policies/service-terms/ |
| anthropic-usage-policy | Anthropic | Usage Policy | https://www.anthropic.com/legal/aup |
| anthropic-commercial-tos | Anthropic | Commercial Terms | https://www.anthropic.com/legal/commercial-terms |
| anthropic-consumer-tos | Anthropic | Consumer Terms | https://www.anthropic.com/legal/consumer-terms |
| google-generative-ai-use-policy | Google | GenAI Prohibited Use | https://policies.google.com/terms/generative-ai/use-policy |
| google-gemini-terms | Google | Gemini Additional Terms | https://policies.google.com/terms/generative-ai |
| meta-llama-use-policy | Meta | Llama Acceptable Use | https://www.llama.com/llama3/use-policy/ |
| xai-acceptable-use | xAI | Acceptable Use | https://x.ai/legal/acceptable-use-policy |
| mistral-terms | Mistral | Terms of Use | https://mistral.ai/terms/ |
| deepseek-terms | DeepSeek | Terms of Use | https://www.deepseek.com/en/terms_of_use |
| cohere-saas-agreement | Cohere | SaaS Agreement | https://cohere.com/terms-of-use |
| cohere-usage-guidelines | Cohere | Usage Guidelines | https://docs.cohere.com/docs/usage-guidelines |
| microsoft-services-agreement | Microsoft | Services Agreement | https://www.microsoft.com/en-us/servicesagreement/ |
| perplexity-terms | Perplexity | Terms of Service | https://www.perplexity.ai/hub/legal/terms-of-service |
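
One way to turn REFRESH_INTERVAL_HOURS into a schedule is to map it to a standard five-field cron expression, which node-cron accepts. The helper below is a hypothetical sketch; how the project actually derives its schedule is not stated here, and the restriction to divisors of 24 is a choice made for this example.

```javascript
// Hypothetical helper: map an interval in hours to a five-field cron expression.
// Only intervals that divide 24 evenly give a uniform schedule, so others are rejected.
function sweepCronExpression(intervalHours) {
  const hours = Number(intervalHours);
  if (!Number.isInteger(hours) || hours < 1 || hours > 24 || 24 % hours !== 0) {
    throw new RangeError(`interval must be a divisor of 24, got ${intervalHours}`);
  }
  // Minute 0 of every Nth hour; 24 collapses to once a day at midnight.
  return hours === 24 ? "0 0 * * *" : `0 */${hours} * * *`;
}
```

With the default of 12 this yields `0 */12 * * *`, which fires at 00:00 and 12:00 in the scheduler's time zone.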

Run locally

cp .env.example .env       # then set OPENROUTER_API_KEY for classification
npm install
npm start                  # listens on :4822
open http://localhost:4822/policy-diff/

On boot, the cold-start cron enqueues every source after 5 seconds; you should see the snapshot count climb to 16 within a minute. A diff needs two snapshots of the same source, so the change feed stays empty until at least the second refresh cycle, and then fills only for sources whose text actually changed.

If OPENROUTER_API_KEY is unset, snapshots and diffs are still recorded; the change feed simply shows unclassified cards until a key is configured.
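
A classifier reply is only useful if it lands inside the fixed label sets, so a defensive normalizer is a natural companion to the step above. The sketch below assumes the model answers with a JSON object carrying `category`, `severity` and `summary` fields; those field names, the `normalizeClassification` function, and the fallback values are assumptions for illustration.

```javascript
const CATEGORIES = new Set([
  "added_restriction", "removed_restriction", "clarified_existing",
  "definition_change", "enforcement_change", "cosmetic", "unknown",
]);
const SEVERITIES = new Set(["high", "medium", "low", "noise"]);

// Returns null when no classifier call was made (for example, no API key),
// which corresponds to an unclassified card. Otherwise coerces the reply
// into the allowed label sets, falling back to "unknown" / "low".
function normalizeClassification(rawReply) {
  if (rawReply == null) return null;
  let parsed;
  try {
    parsed = JSON.parse(rawReply);
  } catch {
    parsed = {}; // non-JSON reply: treat every field as missing
  }
  if (parsed === null || typeof parsed !== "object") parsed = {};
  return {
    category: CATEGORIES.has(parsed.category) ? parsed.category : "unknown",
    severity: SEVERITIES.has(parsed.severity) ? parsed.severity : "low",
    summary: typeof parsed.summary === "string" ? parsed.summary.trim() : "",
  };
}
```

The fallback severity is arbitrary; a deployment that prefers to over-alert could fall back to a higher level instead.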

Database

data/policy-diff.db — better-sqlite3 in WAL mode. Tables: source, snapshot, change, refresh_log, notify_subscription, meta. See db.js for the schema. The raw HTML of every snapshot is gzip-stored as a BLOB so you can replay the original page from /api/snapshot/:id/raw any time.
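
Storing the raw HTML as a gzip BLOB and replaying it can be sketched with Node's built-in zlib. This is an illustration under stated assumptions rather than the code in db.js: `packRawHtml` and `replayRawHtml` are hypothetical names, the HTML is assumed to be UTF-8, and the Accept-Encoding check is a simple substring match that ignores q-values.

```javascript
const { gzipSync, gunzipSync } = require("node:zlib");

// Compress the upstream HTML into a Buffer suitable for a BLOB column.
function packRawHtml(html) {
  return gzipSync(Buffer.from(html, "utf8"));
}

// Decide how to send a stored BLOB back, based on the request's Accept-Encoding
// header. Returns the headers and body a handler would write; no HTTP framework
// is assumed.
function replayRawHtml(blob, acceptEncoding = "") {
  const headers = { "Content-Type": "text/html; charset=utf-8" };
  if (/\bgzip\b/i.test(acceptEncoding)) {
    // Client accepts gzip: pass the stored bytes through untouched.
    return { headers: { ...headers, "Content-Encoding": "gzip" }, body: blob };
  }
  // Otherwise decompress before sending.
  return { headers, body: gunzipSync(blob) };
}
```

Passing the stored bytes straight through when the client accepts gzip avoids a decompress and recompress round trip on every replay.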

Deployment

This product ships a DEPLOY_MANIFEST.json and is picked up by the RNDLAB orchestrator (which rsyncs to a Mac mini, installs a systemd unit, configures nginx, posts to the showcase API, and takes a Playwright thumbnail). Secrets are injected from the RNDLAB key vault at deploy time — __INJECT_FROM_VAULT__ placeholders in .env.example get replaced.

License

MIT.