
TPS Board

Live measured tokens-per-second leaderboard across LLM providers, refreshed every 15 minutes


tps-board

Live, measured tokens-per-second + time-to-first-token leaderboard for every major frontier LLM and inference provider on OpenRouter. Refreshed every 15 minutes. No mocks.

tps-board fires a small probe completion at each tracked model on OpenRouter on a cron, then calls GET /api/v1/generation to fetch the authoritative measured stats (tokens_per_second, latency, time_to_first_token, provider_name). Snapshots are persisted to SQLite. The UI shows a sortable leaderboard, per-model history charts, recent probes and recent failures. Per-model SVG badges are exposed for README embeds.
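The two-step probe described above can be sketched as a pair of request builders plus a parser for the stats response. This is a minimal sketch, not the project's actual code: the function names, the prompt text, and the assumption that the stats sit under a top-level `data` object are all illustrative. The endpoint URLs and the four stat field names come from this README.

```javascript
// Sketch only: request shapes for one probe, plus stats extraction.
const BASE_URL = "https://openrouter.ai/api/v1";

// Build the chat-completion probe. The prompt text is an assumption;
// max_tokens: 1 mirrors the "1-token probe" this README mentions.
function buildProbeRequest(slug, apiKey) {
  return {
    url: `${BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: slug,
      messages: [{ role: "user", content: "Say hi." }],
      max_tokens: 1,
    }),
  };
}

// Build the follow-up stats lookup for the id returned by the probe.
function buildStatsRequest(generationId, apiKey) {
  return {
    url: `${BASE_URL}/generation?id=${encodeURIComponent(generationId)}`,
    method: "GET",
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Assumption: the stats are nested under a top-level `data` object.
// Missing or non-numeric values become null rather than a made-up number.
function extractStats(generationJson) {
  const d = (generationJson && generationJson.data) || {};
  const num = (v) => (typeof v === "number" && Number.isFinite(v) ? v : null);
  return {
    provider_name: typeof d.provider_name === "string" ? d.provider_name : null,
    tokens_per_second: num(d.tokens_per_second),
    latency: num(d.latency),
    time_to_first_token: num(d.time_to_first_token),
  };
}
```

Each request object can be handed to `fetch(req.url, req)`. Keeping the builders free of I/O makes them easy to check without touching the network.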

This is the measured-speed axis of LLM intelligence. Sibling Cowork products track price (router-arena), catalogue drift (provider-drift), and free-tier availability (free-llm-radar).

---

Real data sources (no mocks, no seeds, no Math.random())

| Source | URL | Auth | Refresh | Used for |
|---|---|---|---|---|
| OpenRouter chat completions | POST https://openrouter.ai/api/v1/chat/completions | Bearer OPENROUTER_API_KEY | 15 min (free tier), 30 min (paid tier) | Fires a 1-token probe per model, captures the response id. |
| OpenRouter generation stats | GET https://openrouter.ai/api/v1/generation?id=<id> | Bearer OPENROUTER_API_KEY | once per probe (1.3 s after completion) | Pulls authoritative tokens_per_second, latency, time_to_first_token, provider_name. |
| OpenRouter model catalogue | GET https://openrouter.ai/api/v1/models | none | every 6 h | Refreshes context_length and auto-enrols newly-published :free models from known providers. |
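The auto-enrol step in the last row could look roughly like the filter below. It is a sketch under stated assumptions: each catalogue entry is assumed to carry its slug in an `id` field, and "known provider" is interpreted here as the slug prefix before the `/`. The function name and both parameters other than the catalogue are hypothetical.

```javascript
// Sketch only: choose catalogue slugs that end in ":free", come from a
// known provider prefix, and are not tracked yet.
// Assumptions: entries expose the slug as `id`; "provider" means the part
// of the slug before "/".
function selectNewFreeSlugs(catalogue, trackedSlugs, knownProviders) {
  const tracked = new Set(trackedSlugs);
  const providers = new Set(knownProviders);
  return catalogue
    .map((model) => model.id)
    .filter((id) => typeof id === "string" && id.endsWith(":free"))
    .filter((id) => providers.has(id.split("/")[0]))
    .filter((id) => !tracked.has(id));
}
```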

If OPENROUTER_API_KEY is missing or set to the placeholder __INJECT_FROM_VAULT__, the server starts in read-only mode: the cron is a no-op and the UI serves whatever historical snapshots are in SQLite. The UI never shows fabricated numbers.
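The read-only check above amounts to a small predicate. A minimal sketch, assuming the environment is passed in as a plain object; the function name and the whitespace trimming are illustrative choices, not confirmed behaviour:

```javascript
// Sketch only: decide whether the server may fire live probes.
const PLACEHOLDER = "__INJECT_FROM_VAULT__";

function isLiveMode(env) {
  // Trimming is an assumption; a whitespace-only key is treated as missing.
  const key = (env.OPENROUTER_API_KEY || "").trim();
  return key !== "" && key !== PLACEHOLDER;
}
```

A cron handler could call `isLiveMode(process.env)` first and return immediately when it is `false`, which keeps the job a no-op in read-only mode.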

---

Quick start

git clone https://github.com/holyai/tps-board
cd tps-board
cp .env.example .env  # then set OPENROUTER_API_KEY
npm install
npm start
# → tps-board listening on :4883/tps-board

Open <http://localhost:4883/tps-board/>.

---

Configuration

Edit config/tracked_models.json to add/remove models. Each entry:

{ "slug": "anthropic/claude-3.5-sonnet", "label": "Claude 3.5 Sonnet", "tier": "paid", "cadence_minutes": 30 }

Tiers:
- free — probed every 15 min, ~$0 cost on OpenRouter's free tier.
- paid — probed every 30 min, < $0.50/day total at the default seed list.
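A 15-minute cadence works out to 96 probes per model per day, and a 30-minute cadence to 48. The sketch below shows that arithmetic alongside a basic shape check for one config entry. The validation rules and function names are assumptions for illustration; only the four field names and the two tier values come from the entry format above.

```javascript
// Sketch only: shape check for one tracked_models.json entry.
const TIERS = ["free", "paid"];

function validateTrackedModel(entry) {
  const errors = [];
  if (typeof entry.slug !== "string" || !entry.slug.includes("/")) {
    errors.push("slug must look like vendor/model");
  }
  if (typeof entry.label !== "string" || entry.label === "") {
    errors.push("label must be a non-empty string");
  }
  if (!TIERS.includes(entry.tier)) {
    errors.push("tier must be free or paid");
  }
  if (!Number.isInteger(entry.cadence_minutes) || entry.cadence_minutes <= 0) {
    errors.push("cadence_minutes must be a positive integer");
  }
  return errors;
}

// Number of probes one model receives per day at a given cadence.
function probesPerDay(cadenceMinutes) {
  return Math.floor((24 * 60) / cadenceMinutes);
}
```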

---

HTTP endpoints (all public — no auth)

| Method | Path | What |
|---|---|---|
| GET | /tps-board/ | Web UI (SPA). |
| GET | /tps-board/health | { ok, models, snapshots, snapshots_today, last_snapshot_at, last_cron_finished, live }. |
| GET | /tps-board/api/models | All tracked models + latest snapshot + 24h aggregate. |
| GET | /tps-board/api/leaderboard?window=1h\|24h\|7d&metric=tps\|ttft | Sorted leaderboard. |
| GET | /tps-board/api/snapshots?slug=<slug>&window=24h | Time-series for chart. |
| GET | /tps-board/api/recent?limit=50 | Most recent snapshots, all models. |
| GET | /tps-board/api/failures?limit=50 | Most recent failure rows (real, not fabricated). |
| GET | /tps-board/api/stats | Counters, fastest right now, lowest TTFT right now. |
| GET | /tps-board/api/cron-runs?limit=20 | Recent cron run log. |
| GET | /tps-board/api/badge?slug=<slug>&metric=tps\|ttft | SVG badge (README-friendly). |
| POST | /tps-board/api/probe | Trigger an immediate probe cycle for every enabled model. Serialised. |

Every endpoint, including POST /tps-board/api/probe, is open. There is no Basic Auth, no admin password, no API key check. Arda needs to inspect everything instantly. To limit abuse, POST /tps-board/api/probe uses a simple in-process lock: only one probe cycle can be in flight at a time.
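An in-process lock of this kind can be sketched with a single boolean flag. This is an illustration of the idea rather than the project's code; `createProbeLock` and the shape of its return value are hypothetical.

```javascript
// Sketch only: allow at most one probe cycle in flight per process.
function createProbeLock() {
  let inFlight = false;
  return {
    // Starts `cycle` unless another one is still running.
    // Returns { started, done }, where `done` settles when the cycle ends.
    run(cycle) {
      if (inFlight) {
        return { started: false, done: Promise.resolve() };
      }
      inFlight = true;
      const done = Promise.resolve()
        .then(cycle)
        .finally(() => {
          inFlight = false; // release on success or failure
        });
      return { started: true, done };
    },
    get busy() {
      return inFlight;
    },
  };
}
```

A route handler could answer with a "cycle already running" response whenever `started` is `false`; the exact status code is left open here. Because the flag lives in process memory, this only serialises requests within a single server process.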

---

Embeddable badge

https://holyai.me/tps-board/api/badge?slug=meta-llama/llama-3.3-70b-instruct:free&metric=tps

Renders something like meta-llama/llama-3.3-70b-instruct:free · 287 tok/s.
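For illustration, a badge like this can be produced with plain string templating. The sketch below is not the project's renderer: the function names, colours, width estimate, and the choice to show TTFT in milliseconds are all assumptions.

```javascript
// Sketch only: render a one-segment SVG badge for a model metric.
function escapeXml(s) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  return String(s).replace(/[&<>"']/g, (c) => map[c]);
}

// Assumption: tps is shown as whole tok/s and ttft as whole milliseconds.
function formatMetric(metric, value) {
  if (value == null) return "no data";
  const rounded = Math.round(value);
  return metric === "ttft" ? `${rounded} ms TTFT` : `${rounded} tok/s`;
}

function renderBadge(slug, metric, value) {
  const text = `${slug} · ${formatMetric(metric, value)}`;
  const width = 12 + text.length * 7; // rough width estimate, not measured
  const safe = escapeXml(text);
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="20" role="img" aria-label="${safe}">` +
    `<rect width="${width}" height="20" rx="3" fill="#555"/>` +
    `<text x="6" y="14" fill="#fff" font-family="Verdana,sans-serif" font-size="11">${safe}</text>` +
    `</svg>`
  );
}
```

Showing "no data" when no snapshot exists keeps the badge consistent with the rule that the UI never displays fabricated numbers.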

---

Cron jobs

| Job | Cron | Action |
|---|---|---|
| probe-free | */15 * * * * | Probe every tier=free model. |
| probe-paid | */30 * * * * | Probe every tier=paid model. |
| refresh-catalogue | 0 */6 * * * | Pull /v1/models, refresh context_length, auto-enrol new :free slugs from known providers. |
| prune | 15 4 * * * | Drop snapshots > 30 d, failures > 14 d, cron runs > 7 d. |

Every job logs a cron_runs row, visible at /tps-board/api/cron-runs.
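The retention windows used by the prune job translate into one cutoff timestamp per table. A minimal sketch, assuming timestamps are stored as ISO-8601 UTC strings and that the table names match the schema section; the function name is hypothetical:

```javascript
// Sketch only: compute the oldest timestamp each table should keep.
const DAY_MS = 24 * 60 * 60 * 1000;
const RETENTION_DAYS = { tps_snapshots: 30, tps_failures: 14, cron_runs: 7 };

function pruneCutoffs(now = new Date()) {
  const cutoffs = {};
  for (const [table, days] of Object.entries(RETENTION_DAYS)) {
    // Rows older than this ISO timestamp are candidates for deletion.
    cutoffs[table] = new Date(now.getTime() - days * DAY_MS).toISOString();
  }
  return cutoffs;
}
```

Each cutoff could then be bound into a parameterised `DELETE ... WHERE <timestamp column> < ?` statement for the matching table.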

---

Schema

models           (slug PK, label, tier, cadence_minutes, context_length, enabled, first_seen_at, updated_at)
tps_snapshots    (id, slug, provider_name, tokens_per_second, time_to_first_token, latency, tokens_completion, total_cost, generation_id, observed_at)
tps_failures     (id, slug, stage, error_code, error_message, observed_at)
cron_runs        (id, job, started_at, finished_at, ok_count, fail_count, notes)

Indexed on (slug, observed_at DESC) and observed_at DESC for fast leaderboard aggregation.
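To make the aggregation concrete, here is an in-memory sketch of the kind of query those indexes serve: filter snapshots by time window, group by slug, and sort by the chosen metric. Using the mean as the aggregate is an assumption (the README does not say which statistic the leaderboard uses), and the function name is hypothetical.

```javascript
// Sketch only: window + group-by-slug + sort over tps_snapshots-like rows.
const WINDOW_MS = new Map([
  ["1h", 60 * 60 * 1000],
  ["24h", 24 * 60 * 60 * 1000],
  ["7d", 7 * 24 * 60 * 60 * 1000],
]);
const METRIC_COLUMN = new Map([
  ["tps", "tokens_per_second"],
  ["ttft", "time_to_first_token"],
]);

function leaderboard(snapshots, { window = "24h", metric = "tps", now = Date.now() } = {}) {
  const span = WINDOW_MS.get(window);
  const column = METRIC_COLUMN.get(metric);
  if (!span || !column) throw new RangeError("unsupported window or metric");
  const since = now - span;
  const bySlug = new Map();
  for (const row of snapshots) {
    const observed = Date.parse(row.observed_at);
    const value = row[column];
    // Skip rows outside the window, unparsable dates, and missing values.
    if (!(observed >= since) || typeof value !== "number") continue;
    const agg = bySlug.get(row.slug) || { sum: 0, samples: 0 };
    agg.sum += value;
    agg.samples += 1;
    bySlug.set(row.slug, agg);
  }
  const rows = [...bySlug].map(([slug, { sum, samples }]) => ({ slug, samples, mean: sum / samples }));
  // Higher throughput ranks first; lower time-to-first-token ranks first.
  rows.sort((a, b) => (metric === "tps" ? b.mean - a.mean : a.mean - b.mean));
  return rows;
}
```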

---

Deployment

Designed for the RNDLAB orchestrator pipeline:

---

Why this exists

By May 2026 there are roughly 300 models on OpenRouter, served across more than 50 inference providers. The same model can run at 30 tok/s on a long-tail provider and 2,000+ tok/s on Cerebras. Production teams pick the wrong provider every day because OpenRouter's UI only shows latency for "right now": no history, no leaderboard, no throughput axis. tps-board is the missing public ticker.

License

MIT