stainless-watch

A public scoreboard tracking how well each major LLM provider's official SDKs cover their own API. One row per provider × language (Python + TypeScript), scored with a composite Lag Index from real data fetched from GitHub.

The Stainless acquisition by Anthropic just made "how good is your SDK?" a public, comparable, ship-velocity metric. This is that leaderboard.

Live at <https://holyai.me/stainless-watch>.

What it tracks

12 SDK cells (6 providers × {Python, TypeScript}):

| Provider | Python | TypeScript |
|---|---|---|
| OpenAI | openai/openai-python | openai/openai-node |
| Anthropic | anthropics/anthropic-sdk-python | anthropics/anthropic-sdk-typescript |
| Google | googleapis/python-genai | googleapis/js-genai |
| Mistral | mistralai/client-python | mistralai/client-ts |
| Cohere | cohere-ai/cohere-python | cohere-ai/cohere-typescript |
| Groq | groq/groq-python | groq/groq-typescript |

For each cell, every refresh pulls:

Latest release version and date (GitHub Releases API).
Stainless .stats.yml — configured_endpoints count + openapi_spec_url. Missing file → marked stainless: false.
CHANGELOG.md parsed for breaking changes in the last 90 days (heading scan + BREAKING keywords + major-version bumps).
Repo metadata — stars and default_branch.
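
The two parsing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the function names `parseStats` and `countBreaking` are hypothetical, and the CHANGELOG heading shapes (`## 2.0.0 (2025-01-10)` or `## [2.0.0] - 2025-01-10`, newest entry first) are assumptions about how the upstream files are laid out.

```javascript
// Hypothetical helper: read the two .stats.yml fields the scoreboard uses.
// Pass null when the file is missing (for example, a 404 from the raw URL).
function parseStats(text) {
  if (text == null) {
    return { stainless: false, configuredEndpoints: null, openapiSpecUrl: null };
  }
  const endpoints = /^configured_endpoints:\s*(\d+)\s*$/m.exec(text);
  const specUrl = /^openapi_spec_url:\s*(\S+)\s*$/m.exec(text);
  return {
    stainless: true,
    configuredEndpoints: endpoints ? Number(endpoints[1]) : null,
    openapiSpecUrl: specUrl ? specUrl[1] : null,
  };
}

// Assumed heading shapes: "## 2.0.0 (2025-01-10)" or "## [2.0.0] - 2025-01-10".
const HEADING = /^##\s+\[?v?(\d+)\.(\d+)\.(\d+)[^\s\]]*\]?.*?(\d{4}-\d{2}-\d{2})/;

// Hypothetical helper: count entries inside the window that either mention
// BREAKING in their body or bump the major version over the next-older entry.
function countBreaking(markdown, now = new Date(), windowDays = 90) {
  const entries = [];
  let current = null;
  for (const line of markdown.split("\n")) {
    const m = HEADING.exec(line);
    if (m) {
      current = { major: Number(m[1]), date: new Date(m[4] + "T00:00:00Z"), body: "" };
      entries.push(current);
    } else if (current) {
      current.body += line + "\n";
    }
  }
  const cutoff = now.getTime() - windowDays * 24 * 60 * 60 * 1000;
  let count = 0;
  entries.forEach((entry, i) => {
    if (entry.date.getTime() < cutoff) return; // outside the window
    const older = entries[i + 1]; // assumes newest-first ordering
    const majorBump = older !== undefined && entry.major > older.major;
    const keyword = /\bBREAKING\b/i.test(entry.body);
    if (majorBump || keyword) count += 1; // one release counts at most once
  });
  return count;
}
```

A release that both bumps the major version and carries a BREAKING note is counted once here; whether the real scraper does the same is not stated above.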

Lag Index (0–100)

recency_pts   = clamp(40 - lag_days * 1.5, 0, 40)        # 0d = 40, ≥27d = 0
cadence_pts   = min(releases_30d * 5, 30)                # 6+ releases = 30
surface_pts   = 20 * (configured_endpoints / peer_max)   # 0 if null; peer = same language
stability_pts = clamp(10 - breaking_90d * 2, 0, 10)
lag_score     = recency_pts + cadence_pts + surface_pts + stability_pts   # 0–100
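
As a worked example, the formula above translates directly into a small function. The name `lagScore` and the field names are illustrative only, and the sketch leaves the result unrounded because the formula does not specify rounding.

```javascript
const clamp = (x, lo, hi) => Math.min(Math.max(x, lo), hi);

// Hypothetical helper mirroring the four components of the Lag Index.
function lagScore({ lagDays, releases30d, configuredEndpoints, peerMax, breaking90d }) {
  const recencyPts = clamp(40 - lagDays * 1.5, 0, 40);
  const cadencePts = Math.min(releases30d * 5, 30);
  // 0 when the SDK has no .stats.yml (null) or no peer maximum is known.
  const surfacePts =
    configuredEndpoints == null || !peerMax ? 0 : 20 * (configuredEndpoints / peerMax);
  const stabilityPts = clamp(10 - breaking90d * 2, 0, 10);
  const total = recencyPts + cadencePts + surfacePts + stabilityPts;
  return { recencyPts, cadencePts, surfacePts, stabilityPts, total };
}

// Released 4 days ago, 3 releases in 30 days, 60 of a peer max of 120
// endpoints, 1 breaking change in 90 days: 34 + 15 + 10 + 8 = 67.
const example = lagScore({
  lagDays: 4,
  releases30d: 3,
  configuredEndpoints: 60,
  peerMax: 120,
  breaking90d: 1,
});
```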

Data sources & refresh schedule

| # | Source | URL | Refresh |
|---|---|---|---|
| 1 | GitHub Releases API | https://api.github.com/repos/{org}/{repo}/releases?per_page=30 | every 6 hours |
| 2 | Stainless .stats.yml | https://raw.githubusercontent.com/{org}/{repo}/{branch}/.stats.yml | every 6 hours |
| 3 | SDK CHANGELOG.md | https://raw.githubusercontent.com/{org}/{repo}/{branch}/CHANGELOG.md | daily, 04:00 UTC |
| 4 | GitHub repo metadata (stars, default_branch) | https://api.github.com/repos/{org}/{repo} | daily, 04:00 UTC |

A weekly snapshot of all 12 cells is persisted to metrics_snapshot every Monday at 06:00 UTC (this powers the sparkline).
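
For reference, the schedule above could be written as standard five-field cron expressions like the ones below. This is only a sketch: the `SCHEDULES` object and its `snapshot` key are hypothetical, and the minute offset of the six-hour jobs is an assumption, since the table only says "every 6 hours". The expressions are meant to be evaluated in UTC.

```javascript
// Hypothetical schedule map; keys reuse the scraper names from the API table.
const SCHEDULES = {
  releases: "0 */6 * * *",  // every 6 hours, on the hour (offset assumed)
  stats: "0 */6 * * *",     // every 6 hours, on the hour (offset assumed)
  changelog: "0 4 * * *",   // daily at 04:00
  "repo-meta": "0 4 * * *", // daily at 04:00
  snapshot: "0 6 * * 1",    // Mondays at 06:00
};
```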

Requests send User-Agent: stainless-watch/1.0 (+https://holyai.me/stainless-watch). If GITHUB_TOKEN is set in the environment, an Authorization: Bearer … header is added so requests count against the 5000 req/h authenticated limit instead of the 60 req/h unauthenticated limit. The token is optional.
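
A minimal sketch of how those headers might be assembled is shown below. The function name `buildHeaders` is hypothetical; the header values come from the paragraph above.

```javascript
// Hypothetical helper: build request headers, adding auth only when a token exists.
function buildHeaders(env = process.env) {
  const headers = {
    "User-Agent": "stainless-watch/1.0 (+https://holyai.me/stainless-watch)",
  };
  if (env.GITHUB_TOKEN) {
    headers.Authorization = `Bearer ${env.GITHUB_TOKEN}`;
  }
  return headers;
}
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real `process.env`.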

Failure handling

No mocks. If a fetch fails, the cell keeps its last known value and stale_since is set. After 24h of continuous failures the row is rendered with a stale badge. If a cell has never successfully fetched, it shows —. All failures land in the fetch log surfaced at /api/health/diagnostics.
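
One way to read these rules as code is sketched below. It assumes that stale_since records the first failure in a run of failures and is cleared on the next success; the names `applyFetchResult` and `renderState` and the cell shape are hypothetical.

```javascript
const STALE_AFTER_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: fold one fetch outcome into a cell.
// On failure the last known value is kept and staleSince is set once.
function applyFetchResult(cell, result, now = Date.now()) {
  if (result.ok) {
    return { ...cell, value: result.value, staleSince: null };
  }
  return { ...cell, staleSince: cell.staleSince ?? now };
}

// Hypothetical helper: decide how the row should be rendered.
function renderState(cell, now = Date.now()) {
  if (cell.value == null) return "never-fetched"; // rendered as a dash
  if (cell.staleSince != null && now - cell.staleSince >= STALE_AFTER_MS) {
    return "stale"; // rendered with the stale badge
  }
  return "ok";
}
```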

API

All routes are mounted under /stainless-watch.

| Method | Path | Purpose |
|---|---|---|
| GET | /health | { "ok": true } HTTP 200 — liveness |
| GET | /api/health/diagnostics | Last successful fetch per source + recent errors |
| GET | /api/leaderboard | Full scorecard — array of 12 cells |
| GET | /api/sdk/:provider/:language | Per-SDK detail: latest 30 releases, 30 changelog entries, snapshot history |
| GET | /api/changelog/:provider/:language?limit=N | Parsed CHANGELOG entries (default 50, max 200) |
| GET | /api/history?provider=&language= | Weekly score snapshots |
| GET | /api/refresh/:source | Manually trigger a scraper — releases, stats, changelog, repo-meta, or all. Rate-limited to 1/min per IP |
| GET | /badge/:provider/:language.svg | Shield-style SVG badge. Cache-Control: public, max-age=600 |
| GET | / | SPA shell |
| GET | /assets/* | Static files |
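
As an illustration of the `limit` parameter on the changelog route, the sketch below applies the documented default (50) and maximum (200). The helper name `parseLimit` is hypothetical, and clamping oversized or invalid values rather than rejecting them is an assumption.

```javascript
// Hypothetical helper: normalize ?limit=N for /api/changelog/:provider/:language.
function parseLimit(raw, fallback = 50, max = 200) {
  const n = Number.parseInt(raw, 10);
  if (!Number.isFinite(n) || n < 1) return fallback; // missing or invalid
  return Math.min(n, max);
}
```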

Example badge

https://holyai.me/stainless-watch/badge/openai/python.svg

Markdown:

[![openai-python](https://holyai.me/stainless-watch/badge/openai/python.svg)](https://holyai.me/stainless-watch/#/sdk/openai/python)

Color: red <40, yellow 40–70, green 70+.
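
The thresholds can be expressed as a small function. The name `badgeColor` is hypothetical, and treating a score of exactly 70 as green follows the "70+" wording.

```javascript
// Hypothetical helper: map a Lag Index score to a badge color.
function badgeColor(score) {
  if (score < 40) return "red";
  if (score < 70) return "yellow";
  return "green";
}
```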

Run locally

cp .env.example .env       # optional: set GITHUB_TOKEN to avoid 60 req/h limit
npm install
node server.js             # listens on PORT (default 4821) at /stainless-watch

Then:

<http://localhost:4821/stainless-watch/> — leaderboard
<http://localhost:4821/stainless-watch/health> — {ok:true}
<http://localhost:4821/stainless-watch/api/leaderboard> — JSON
<http://localhost:4821/stainless-watch/badge/openai/python.svg> — SVG badge

On first boot the scheduler kicks an immediate fetch of all four sources so the leaderboard has real numbers within ~30 seconds. After that, the cron schedule above takes over.

Stack

Node.js ≥ 22, Express 4
better-sqlite3 (WAL), schema in db.js
node-cron for the refresh schedule
helmet + compression
Vanilla JS SPA, no build step

Scope cuts (deliberate)

No auto-generated SDK patches, no PR creation against upstream repos.
No community / unofficial SDKs.
No languages beyond Python + TypeScript.
No per-endpoint correctness testing — we use Stainless's own configured_endpoints as ground truth for surface area.
No LLM-based summarization of CHANGELOG entries — raw markdown only.
No auth, no payments, no analytics, no email/Slack/webhook outputs.