tps-board
Live, measured tokens-per-second + time-to-first-token leaderboard for every major frontier LLM and inference provider on OpenRouter. Refreshed every 15 minutes. No mocks.
tps-board fires a small probe completion at each tracked model on OpenRouter on a cron, then calls GET /api/v1/generation to fetch the authoritative measured stats (tokens_per_second, latency, time_to_first_token, provider_name). Snapshots are persisted to SQLite. The UI shows a sortable leaderboard, per-model history charts, recent probes and recent failures. Per-model SVG badges are exposed for README embeds.
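The probe-then-fetch flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `probeModel`, the injectable `fetchImpl` and `sleep` options, and the `"ping"` prompt are hypothetical, and the stats field names are taken from the description above, assuming the generation response wraps them in a `data` object.

```javascript
// Hypothetical sketch of one probe: completion first, stats second.
const BASE = "https://openrouter.ai/api/v1";

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function probeModel(slug, apiKey, options = {}) {
  // fetchImpl and sleep are injectable so the flow can be exercised offline.
  const { fetchImpl = fetch, sleep = defaultSleep, delayMs = 1300 } = options;
  const headers = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };

  // Step 1: fire a tiny completion and keep the response id.
  const completion = await fetchImpl(`${BASE}/chat/completions`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      model: slug,
      messages: [{ role: "user", content: "ping" }], // assumed prompt
      max_tokens: 1,
    }),
  });
  if (!completion.ok) {
    throw new Error(`completion failed: HTTP ${completion.status}`);
  }
  const { id } = await completion.json();

  // Step 2: wait briefly, then ask for the measured stats of that generation.
  await sleep(delayMs);
  const stats = await fetchImpl(
    `${BASE}/generation?id=${encodeURIComponent(id)}`,
    { headers }
  );
  if (!stats.ok) {
    throw new Error(`generation lookup failed: HTTP ${stats.status}`);
  }
  const { data } = await stats.json(); // assumed response envelope

  return {
    slug,
    generation_id: id,
    provider_name: data.provider_name,
    tokens_per_second: data.tokens_per_second,
    time_to_first_token: data.time_to_first_token,
    latency: data.latency,
  };
}
```

Splitting the probe into two requests means the recorded numbers come from OpenRouter's own accounting, not from client-side timing, which would also include network jitter.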
This is the measured-speed axis of LLM intelligence. Sibling Cowork products track price (router-arena), catalogue drift (provider-drift), and free-tier availability (free-llm-radar).
---
Real data sources (no mocks, no seeds, no Math.random())
| Source | URL | Auth | Refresh | Used for |
|---|---|---|---|---|
| OpenRouter chat completions | POST https://openrouter.ai/api/v1/chat/completions | Bearer OPENROUTER_API_KEY | 15 min (free tier), 30 min (paid tier) | Fires a 1-token probe per model, captures the response id. |
| OpenRouter generation stats | GET https://openrouter.ai/api/v1/generation?id=<id> | Bearer OPENROUTER_API_KEY | once per probe (1.3 s after completion) | Pulls authoritative tokens_per_second, latency, time_to_first_token, provider_name. |
| OpenRouter model catalogue | GET https://openrouter.ai/api/v1/models | none | every 6 h | Refreshes context_length and auto-enrols newly-published :free models from known providers. |
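The auto-enrol step in the last row could look something like the filter below. It is a sketch only: `selectNewFreeModels` is a hypothetical name, "known provider" is assumed to mean the slug prefix before the `/`, and the catalogue entries are assumed to carry `id`, `name` and `context_length`.

```javascript
// Hypothetical helper: choose catalogue entries worth auto-enrolling.
function selectNewFreeModels(catalogue, trackedSlugs, knownProviders) {
  const tracked = new Set(trackedSlugs);
  const providers = new Set(knownProviders);
  return catalogue
    .filter((m) => m.id.endsWith(":free"))            // free variants only
    .filter((m) => providers.has(m.id.split("/")[0])) // assumed: provider = slug prefix
    .filter((m) => !tracked.has(m.id))                // skip models already tracked
    .map((m) => ({
      slug: m.id,
      label: m.name,
      tier: "free",
      cadence_minutes: 15, // free-tier cadence from the table above
      context_length: m.context_length,
    }));
}
```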
If OPENROUTER_API_KEY is missing or set to the placeholder __INJECT_FROM_VAULT__, the server starts in read-only mode: the cron is a no-op and the UI serves whatever historical snapshots are in SQLite. The UI never shows fabricated numbers.
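A minimal sketch of that read-only check, assuming the decision is made from the environment at startup (`isLive` is a hypothetical name; treating a whitespace-only key as missing is also an assumption):

```javascript
// Placeholder value named in the paragraph above.
const PLACEHOLDER_KEY = "__INJECT_FROM_VAULT__";

// Returns true only when a real-looking key is configured.
function isLive(env) {
  const key = (env.OPENROUTER_API_KEY || "").trim();
  return key !== "" && key !== PLACEHOLDER_KEY;
}
```

A cron handler could return early when `isLive(process.env)` is false, which keeps the scheduler wiring identical in both modes.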
---
Quick start
git clone https://github.com/holyai/tps-board
cd tps-board
cp .env.example .env # then set OPENROUTER_API_KEY
npm install
npm start
# → tps-board listening on :4883/tps-board
Open <http://localhost:4883/tps-board/>.
---
Configuration
Edit config/tracked_models.json to add/remove models. Each entry:
{ "slug": "anthropic/claude-3.5-sonnet", "label": "Claude 3.5 Sonnet", "tier": "paid", "cadence_minutes": 30 }
Tiers:
- free — probed every 15 min, ~$0 cost on OpenRouter's free tier.
- paid — probed every 30 min, < $0.50/day total at the default seed list.
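To make the entry shape and tier defaults concrete, here is a hedged sketch of how an entry might be validated on load. `normaliseEntry` and `DEFAULT_CADENCE` are hypothetical names, and falling back to the tier's cadence when `cadence_minutes` is omitted is an assumption, not documented behaviour.

```javascript
// Default cadence per tier, taken from the tier list above.
const DEFAULT_CADENCE = { free: 15, paid: 30 };

// Hypothetical loader step: validate one entry and fill in defaults.
function normaliseEntry(entry) {
  if (typeof entry.slug !== "string" || !entry.slug.includes("/")) {
    throw new Error(`invalid slug: ${JSON.stringify(entry.slug)}`);
  }
  if (!Object.hasOwn(DEFAULT_CADENCE, entry.tier)) {
    throw new Error(`unknown tier: ${entry.tier}`);
  }
  const cadence = entry.cadence_minutes ?? DEFAULT_CADENCE[entry.tier];
  if (!Number.isInteger(cadence) || cadence <= 0) {
    throw new Error(`invalid cadence_minutes: ${cadence}`);
  }
  return {
    slug: entry.slug,
    label: entry.label ?? entry.slug, // assumed fallback label
    tier: entry.tier,
    cadence_minutes: cadence,
  };
}
```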
---
HTTP endpoints (all public — no auth)
| Method | Path | What |
|---|---|---|
| GET | /tps-board/ | Web UI (SPA). |
| GET | /tps-board/health | { ok, models, snapshots, snapshots_today, last_snapshot_at, last_cron_finished, live }. |
| GET | /tps-board/api/models | All tracked models + latest snapshot + 24h aggregate. |
| GET | /tps-board/api/leaderboard?window=1h\|24h\|7d&metric=tps\|ttft | Sorted leaderboard. |
| GET | /tps-board/api/snapshots?slug=<slug>&window=24h | Time-series for chart. |
| GET | /tps-board/api/recent?limit=50 | Most recent snapshots, all models. |
| GET | /tps-board/api/failures?limit=50 | Most recent failure rows (real, not fabricated). |
| GET | /tps-board/api/stats | Counters, fastest right now, lowest TTFT right now. |
| GET | /tps-board/api/cron-runs?limit=20 | Recent cron run log. |
| GET | /tps-board/api/badge?slug=<slug>&metric=tps\|ttft | SVG badge (README-friendly). |
| POST | /tps-board/api/probe | Trigger an immediate probe cycle for every enabled model. Serialised. |
Every endpoint, including POST /tps-board/api/probe, is open: there is no Basic Auth, no admin password and no API key check, because Arda needs to inspect everything instantly. To limit abuse, POST /tps-board/api/probe takes a simple in-process lock, so only one probe cycle can be in flight at a time.
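An in-process lock of this kind can be as small as a boolean flag. The sketch below is one possible shape, not the project's code: `createCycleLock` is a hypothetical name, and it assumes a request arriving mid-cycle is turned away instead of being queued.

```javascript
// Hypothetical in-process lock: at most one probe cycle at a time.
function createCycleLock() {
  let inFlight = false;
  return {
    isBusy: () => inFlight,
    // Runs task() if the lock is free; otherwise reports that nothing started.
    async run(task) {
      if (inFlight) return { started: false };
      inFlight = true;
      try {
        return { started: true, result: await task() };
      } finally {
        inFlight = false; // always release, even if the cycle throws
      }
    },
  };
}
```

Because the flag lives in process memory, this only serialises cycles within a single Node process; running several instances would need a shared lock.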
---
Embeddable badge
https://holyai.me/tps-board/api/badge?slug=meta-llama/llama-3.3-70b-instruct:free&metric=tps
Renders something like meta-llama/llama-3.3-70b-instruct:free · 287 tok/s.
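For illustration, a badge like that could be rendered with a few lines of string templating. This is a rough sketch: `renderBadge`, the colours, the width estimate and the `ms TTFT` unit are all assumptions; only the `slug · N tok/s` text format comes from the example above.

```javascript
// Escape characters that are unsafe inside SVG text and attributes.
function escapeXml(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  return String(value).replace(/[&<>"']/g, (c) => map[c]);
}

// Hypothetical renderer: one flat badge with "slug · reading".
function renderBadge(slug, metric, value) {
  const reading =
    value == null
      ? "no data" // never invent a number when no snapshot exists
      : metric === "ttft"
        ? `${Math.round(value)} ms TTFT` // assumed unit
        : `${Math.round(value)} tok/s`;
  const text = `${slug} · ${reading}`;
  const width = Math.ceil(text.length * 6.5) + 20; // crude width estimate
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="20" role="img" aria-label="${escapeXml(text)}">` +
    `<rect width="${width}" height="20" rx="3" fill="#333"/>` +
    `<text x="10" y="14" fill="#fff" font-family="Verdana,sans-serif" font-size="11">${escapeXml(text)}</text>` +
    `</svg>`
  );
}
```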
---
Cron jobs
| Job | Cron | Action |
|---|---|---|
| probe-free | */15 * * * * | Probe every tier=free model. |
| probe-paid | */30 * * * * | Probe every tier=paid model. |
| refresh-catalogue | 0 */6 * * * | Pull /v1/models, refresh context_length, auto-enrol new :free slugs from known providers. |
| prune | 15 4 * * * | Drop snapshots > 30 d, failures > 14 d, cron runs > 7 d. |
Every job logs a cron_runs row, visible at /tps-board/api/cron-runs.
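The retention windows used by the prune job can be expressed as a small lookup. The sketch below assumes timestamps are stored as epoch milliseconds, which the schema does not state; `RETENTION_DAYS` and `pruneCutoffs` are hypothetical names.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Retention per table, from the prune row above.
const RETENTION_DAYS = { tps_snapshots: 30, tps_failures: 14, cron_runs: 7 };

// Returns, per table, the timestamp before which rows would be dropped.
function pruneCutoffs(nowMs = Date.now()) {
  return Object.fromEntries(
    Object.entries(RETENTION_DAYS).map(([table, days]) => [table, nowMs - days * DAY_MS])
  );
}
```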
---
Schema
models (slug PK, label, tier, cadence_minutes, context_length, enabled, first_seen_at, updated_at)
tps_snapshots (id, slug, provider_name, tokens_per_second, time_to_first_token, latency, tokens_completion, total_cost, generation_id, observed_at)
tps_failures (id, slug, stage, error_code, error_message, observed_at)
cron_runs (id, job, started_at, finished_at, ok_count, fail_count, notes)
Indexed on (slug, observed_at DESC) and observed_at DESC for fast leaderboard aggregation.
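As an illustration of the kind of query those indexes serve, the sketch below builds a leaderboard query for a given window and metric. It is not the project's actual SQL: `leaderboardQuery` is a hypothetical name, averaging per slug is an assumed aggregation, and `observed_at` is assumed to hold epoch milliseconds.

```javascript
const WINDOW_MS = { "1h": 3600e3, "24h": 24 * 3600e3, "7d": 7 * 24 * 3600e3 };

// Whitelisted metrics: column names are never taken from user input.
const METRICS = {
  tps: { column: "tokens_per_second", order: "DESC" },    // higher is better
  ttft: { column: "time_to_first_token", order: "ASC" },  // lower is better
};

// Builds a parameterised query; the caller runs it against SQLite.
function leaderboardQuery(window, metric, nowMs = Date.now()) {
  if (!Object.hasOwn(WINDOW_MS, window)) {
    throw new RangeError(`unsupported window: ${window}`);
  }
  if (!Object.hasOwn(METRICS, metric)) {
    throw new RangeError(`unsupported metric: ${metric}`);
  }
  const { column, order } = METRICS[metric];
  const sql = [
    `SELECT slug, AVG(${column}) AS value, COUNT(*) AS samples`,
    "FROM tps_snapshots",
    `WHERE observed_at >= ? AND ${column} IS NOT NULL`,
    "GROUP BY slug",
    `ORDER BY value ${order}`,
  ].join("\n");
  return { sql, params: [nowMs - WINDOW_MS[window]] };
}
```

With better-sqlite3 the result could be executed as `db.prepare(sql).all(...params)`.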
---
Deployment
Designed for the RNDLAB orchestrator pipeline:
- Builds with npm install --omit=dev. better-sqlite3 ships prebuilt arm64 binaries; the sandbox cannot compile native bindings (no nodejs.org access), but the deploy target (Mac mini or arm64 Linux) can.
- Runs under systemd as the unit named in DEPLOY_MANIFEST.json.
- Nginx upstream http://127.0.0.1:4883 at route /tps-board.
- Health probe at /tps-board/health.
---
Why this exists
As of May 2026, OpenRouter lists roughly 300 models served by 50+ inference providers. The same model can run at 30 tok/s on a long-tail provider and at 2,000+ tok/s on Cerebras. Production teams pick the wrong provider every day because the OpenRouter UI only shows latency for "right now": no history, no leaderboard, and no throughput axis. tps-board is the missing public ticker.
---
License
MIT