stainless-watch

Live SDK lag leaderboard for OpenAI, Anthropic, Google, Mistral, Cohere, and Groq — Python + TypeScript

A public scoreboard tracking how well each major LLM provider's official SDKs cover their own API. One row per provider × language (Python + TypeScript), scored with a composite Lag Index from real data fetched from GitHub.

The Stainless acquisition by Anthropic just made "how good is your SDK?" a public, comparable, ship-velocity metric. This is that leaderboard.

Live at <https://holyai.me/stainless-watch>.

What it tracks

12 SDK cells (6 providers × {Python, TypeScript}):

| Provider | Python | TypeScript |
|---|---|---|
| OpenAI | openai/openai-python | openai/openai-node |
| Anthropic | anthropics/anthropic-sdk-python | anthropics/anthropic-sdk-typescript |
| Google | googleapis/python-genai | googleapis/js-genai |
| Mistral | mistralai/client-python | mistralai/client-ts |
| Cohere | cohere-ai/cohere-python | cohere-ai/cohere-typescript |
| Groq | groq/groq-python | groq/groq-typescript |

For each cell, every refresh pulls:

  1. Latest release version and date (GitHub Releases API).
  2. Stainless .stats.yml: configured_endpoints count + openapi_spec_url. Missing file → marked stainless: false.
  3. CHANGELOG.md parsed for breaking changes in the last 90 days (heading scan + BREAKING keywords + major-version bumps).
  4. Repo metadata — stars and default_branch.

Lag Index (0–100)

recency_pts   = clamp(40 - lag_days * 1.5, 0, 40)        # 0d = 40, 27d = 0
cadence_pts   = min(releases_30d * 5, 30)                # 6+ releases = 30
surface_pts   = 20 * (configured_endpoints / peer_max)   # 0 if null; peer = same language
stability_pts = clamp(10 - breaking_90d * 2, 0, 10)
lag_score     = recency_pts + cadence_pts + surface_pts + stability_pts
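
As a runnable illustration, the formula above translates fairly directly into Python. The function and parameter names are taken from the formula. Treating a missing or zero `peer_max` as 0 surface points is an assumption; the source only says the term is 0 when the endpoint count is null.

```python
from typing import Optional


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def lag_score(
    lag_days: float,
    releases_30d: int,
    configured_endpoints: Optional[int],
    peer_max: Optional[int],
    breaking_90d: int,
) -> float:
    """Composite Lag Index in the range 0 to 100."""
    recency_pts = clamp(40 - lag_days * 1.5, 0, 40)
    cadence_pts = min(releases_30d * 5, 30)
    if configured_endpoints is None or not peer_max:
        surface_pts = 0.0  # no .stats.yml data (or no peer data, an assumption)
    else:
        surface_pts = 20 * (configured_endpoints / peer_max)
    stability_pts = clamp(10 - breaking_90d * 2, 0, 10)
    return recency_pts + cadence_pts + surface_pts + stability_pts
```

For example, an SDK released 10 days ago with 2 releases in the last 30 days, half the endpoints of its best peer, and 1 breaking change scores 25 + 10 + 10 + 8 = 53.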

Data sources & refresh schedule

| # | Source | URL | Refresh |
|---|---|---|---|
| 1 | GitHub Releases API | https://api.github.com/repos/{org}/{repo}/releases?per_page=30 | every 6 hours |
| 2 | Stainless .stats.yml | https://raw.githubusercontent.com/{org}/{repo}/{branch}/.stats.yml | every 6 hours |
| 3 | SDK CHANGELOG.md | https://raw.githubusercontent.com/{org}/{repo}/{branch}/CHANGELOG.md | daily, 04:00 UTC |
| 4 | GitHub repo metadata (stars, default_branch) | https://api.github.com/repos/{org}/{repo} | daily, 04:00 UTC |
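
To make source 2 concrete, here is a minimal sketch of building the raw URL and reading the two fields the leaderboard uses. It assumes .stats.yml is a flat list of `key: value` lines, so it avoids a YAML dependency; the helper names `stats_url` and `parse_stats` are hypothetical, and passing `None` stands in for a missing file.

```python
from typing import Dict, Optional

STATS_URL = "https://raw.githubusercontent.com/{org}/{repo}/{branch}/.stats.yml"


def stats_url(org: str, repo: str, branch: str) -> str:
    """Fill in the URL template from the data sources table."""
    return STATS_URL.format(org=org, repo=repo, branch=branch)


def parse_stats(text: Optional[str]) -> Dict[str, object]:
    """Extract configured_endpoints and openapi_spec_url from flat YAML text."""
    if text is None:  # file missing: not a Stainless-generated SDK
        return {
            "stainless": False,
            "configured_endpoints": None,
            "openapi_spec_url": None,
        }
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split at the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("'\"")
    endpoints = fields.get("configured_endpoints")
    return {
        "stainless": True,
        "configured_endpoints": int(endpoints)
        if endpoints and endpoints.isdigit()
        else None,
        "openapi_spec_url": fields.get("openapi_spec_url"),
    }
```

A real file with nested YAML would need a proper parser; the line-based approach is only adequate under the flat-file assumption stated above.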

A weekly snapshot of all 12 cells is persisted to metrics_snapshot every Monday at 06:00 UTC (this powers the sparkline).

Requests send User-Agent: stainless-watch/1.0 (+https://holyai.me/stainless-watch). If GITHUB_TOKEN is set in the environment, an Authorization: Bearer … header is added so requests count against the 5,000 req/h authenticated limit instead of the 60 req/h unauthenticated limit. The token is optional.
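
The header logic described here could look roughly like the sketch below. The function name `build_headers` and the injectable `env` parameter (handy for testing) are illustrative choices, not taken from the project.

```python
import os
from typing import Dict, Mapping, Optional

USER_AGENT = "stainless-watch/1.0 (+https://holyai.me/stainless-watch)"


def build_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return request headers, adding auth only when GITHUB_TOKEN is set."""
    env = os.environ if env is None else env
    headers = {"User-Agent": USER_AGENT}
    token = env.get("GITHUB_TOKEN")
    if token:  # an empty string is treated the same as unset
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Calling `build_headers({})` yields only the User-Agent header, which matches the "token is optional" behavior.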

Failure handling

No mocks. If a fetch fails, the cell keeps its last known value and stale_since is set. After 24h of continuous failures the row is rendered with a stale badge. If a cell has never been fetched successfully, it shows a placeholder instead of a value. All failures land in the fetch log surfaced at /api/health/diagnostics.
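
One way to model these rules is a small state record per cell, sketched below. The `Cell` class, the `fetched_once` flag, and the state names are hypothetical. The sketch also assumes that `stale_since` marks the first failure in a run and is cleared by the next success, which the text implies but does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

STALE_BADGE_AFTER = timedelta(hours=24)


@dataclass
class Cell:
    value: Optional[Any] = None
    fetched_once: bool = False
    stale_since: Optional[datetime] = None


def record_fetch(cell: Cell, new_value: Any, ok: bool, now: datetime) -> None:
    """Apply one fetch result; failures never overwrite the last known value."""
    if ok:
        cell.value = new_value
        cell.fetched_once = True
        cell.stale_since = None
    elif cell.stale_since is None:
        cell.stale_since = now  # start of the current failure run


def render_state(cell: Cell, now: datetime) -> str:
    """Return 'never_fetched', 'stale', or 'fresh' for display purposes."""
    if not cell.fetched_once:
        return "never_fetched"
    if cell.stale_since is not None and now - cell.stale_since >= STALE_BADGE_AFTER:
        return "stale"
    return "fresh"
```

Because only the first failure sets `stale_since`, repeated failures do not reset the 24-hour clock.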

API

All routes are mounted under /stainless-watch.

| Method | Path | Purpose |
|---|---|---|
| GET | /health | { "ok": true } HTTP 200 — liveness |
| GET | /api/health/diagnostics | Last successful fetch per source + recent errors |
| GET | /api/leaderboard | Full scorecard — array of 12 cells |
| GET | /api/sdk/:provider/:language | Per-SDK detail: latest 30 releases, 30 changelog entries, snapshot history |
| GET | /api/changelog/:provider/:language?limit=N | Parsed CHANGELOG entries (default 50, max 200) |
| GET | /api/history?provider=&language= | Weekly score snapshots |
| GET | /api/refresh/:source | Manually trigger a scraper — releases, stats, changelog, repo-meta, or all. Rate-limited to 1/min per IP |
| GET | /badge/:provider/:language.svg | Shield-style SVG badge. Cache-Control: public, max-age=600 |
| GET | / | SPA shell |
| GET | /assets/* | Static files |
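
Two request-handling rules from the table, the `limit` bounds on the changelog route and the 1/min per-IP limit on refresh, can be sketched as below. The names `parse_limit` and `RefreshLimiter` are hypothetical. Falling back to the default on non-numeric input and raising values below 1 to 1 are assumptions; the source specifies only the default of 50 and the maximum of 200.

```python
from typing import Dict, Optional

DEFAULT_LIMIT = 50
MAX_LIMIT = 200
REFRESH_INTERVAL_S = 60.0


def parse_limit(raw: Optional[str]) -> int:
    """Turn the ?limit=N query value into an integer between 1 and MAX_LIMIT."""
    if raw is None:
        return DEFAULT_LIMIT
    try:
        n = int(raw)
    except ValueError:
        return DEFAULT_LIMIT  # assumed behavior for junk input
    return max(1, min(n, MAX_LIMIT))


class RefreshLimiter:
    """Allow at most one refresh per IP per REFRESH_INTERVAL_S seconds."""

    def __init__(self) -> None:
        self._last: Dict[str, float] = {}

    def allow(self, ip: str, now: float) -> bool:
        last = self._last.get(ip)
        if last is not None and now - last < REFRESH_INTERVAL_S:
            return False  # rejected requests do not extend the window
        self._last[ip] = now
        return True
```

In a server, `now` would typically come from a monotonic clock such as `time.monotonic()`; it is a parameter here so the behavior is easy to test.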

Example badge

https://holyai.me/stainless-watch/badge/openai/python.svg

Markdown:

[![openai-python](https://holyai.me/stainless-watch/badge/openai/python.svg)](https://holyai.me/stainless-watch/#/sdk/openai/python)

Color: red <40, yellow 40–70, green 70+.
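
The color thresholds map to a small function. Since "70+" is green, this sketch assumes a score of exactly 70 is green and that yellow covers 40 up to but not including 70; the function name `badge_color` is illustrative.

```python
def badge_color(score: float) -> str:
    """Map a Lag Index score (0 to 100) to a badge color."""
    if score < 40:
        return "red"
    if score < 70:
        return "yellow"
    return "green"
```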

Run locally

cp .env.example .env       # optional: set GITHUB_TOKEN to avoid 60 req/h limit
npm install
node server.js             # listens on PORT (default 4821) at /stainless-watch

Then open http://localhost:4821/stainless-watch (assuming the default PORT).

On first boot the scheduler kicks an immediate fetch of all four sources so the leaderboard has real numbers within ~30 seconds. After that, the cron schedule above takes over.

Stack

Scope cuts (deliberate)