embed-arena

Live MTEB embedding-model leaderboard with cost intelligence. Pick the best embedding for your RAG budget in 30 seconds.

embed-arena pulls every embedding model with public MTEB results from
HuggingFace, computes per-category mean scores, joins them with a curated
provider-pricing table, and serves a single-page leaderboard you can filter by
license, dimension, modality, and budget.

Live data, no mocks. Benchmark scores fetched from huggingface.co/datasets/mteb/results.
Cost intelligence. Each closed-source row carries a documented price + source URL + verification date.
Self-host friendly. Single Node.js process, single SQLite file, no other services to provision.
Public. No login, no signup, no API key.

Quickstart

git clone <repo>
cd embed-arena
cp .env.example .env
npm install
npm start
# open http://localhost:4751/embed-arena/

The first boot performs a small warm-up (~30 models, ~40 MTEB tasks each).
Subsequent boots reuse the cached SQLite file. Scheduled refreshes then run on
the cadence listed under Data sources.

Endpoints

All endpoints are public and unauthenticated.

| Endpoint | Method | Description |
| --- | --- | --- |
| /embed-arena/health | GET | Service status + DB row counts |
| /embed-arena/api/models | GET | Filterable, sortable model list |
| /embed-arena/api/models/:id | GET | Per-model details + per-task scores |
| /embed-arena/api/categories | GET | List of category aggregates |
| /embed-arena/api/stats | GET | Open-vs-closed gap, top model, cheapest, best value |
| /embed-arena/api/refresh-log | GET | Recent refresh activity |
| /embed-arena/api/refresh | POST | Trigger an MTEB refresh (rate-limited per IP, no auth) |
| /embed-arena/api/refresh/:jobId | GET | Poll a refresh job |
| /embed-arena/api/sources | GET | List of every external data source URL we hit |
| /embed-arena/api/admin/sync-providers | POST | Re-sync data/providers.json into the DB |
| /embed-arena/api/export.csv?category=overall_eng | GET | Download current ranking as CSV |

Query parameters for `/embed-arena/api/models`

category — overall_eng, overall_multi, retrieval, classification, sts, clustering, reranking, code, summarization
license — open or closed
min_dim, max_dim — vector dimension bounds
max_price — USD per million input tokens
min_score — minimum mean MTEB score (0–100 normalized)
modality — text, image, code, audio, video
search — substring match on id / display_name
sort — score (default), value, cheapest, downloads, recent, name
limit — max rows, default 100, cap 500
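
As an illustration of how these parameters combine, the sketch below assembles a request URL from a plain filter object. The helper name `buildModelsUrl` is hypothetical and not part of the project, and the base URL is simply the Quickstart default.

```javascript
// Build a /embed-arena/api/models URL from a plain filter object.
// Parameter names come from the list above; buildModelsUrl itself is a
// hypothetical helper, and the default base is the Quickstart address.
function buildModelsUrl(filters = {}, base = "http://localhost:4751") {
  const url = new URL("/embed-arena/api/models", base);
  for (const [key, value] of Object.entries(filters)) {
    // Skip unset filters so the server falls back to its own defaults.
    if (value === undefined || value === null || value === "") continue;
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Example: closed-source retrieval models under a price cap, best value first.
const modelsUrl = buildModelsUrl({
  category: "retrieval",
  license: "closed",
  max_price: 0.1,
  sort: "value",
  limit: 20,
});
console.log(modelsUrl);
```

The resulting URL can be passed to `fetch` or `curl`. The shape of the JSON response is not documented here, so inspect it before relying on specific fields.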

Data sources

| Source | Purpose | Refresh |
| --- | --- | --- |
| https://huggingface.co/api/datasets/mteb/results/tree/main/results | Discovery: every model with public MTEB evals | Daily 03:00 UTC |
| https://huggingface.co/datasets/mteb/results/resolve/main/results/{model}/{rev}/{task}.json | Real benchmark main_score per task | Daily 05:00 UTC |
| https://huggingface.co/api/models/{owner}/{model} | Live downloads, license, dimensions, parameter count | Every 6 hours |
| data/providers.json (file) | Documented proprietary pricing (OpenAI, Cohere, Voyage, Google, Mistral, Amazon, Jina, NVIDIA) | Weekly Monday 04:00 UTC |
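
The Refresh column maps naturally onto cron expressions. The following is a minimal sketch of that mapping, not the project's actual scheduler code: the job names and `registerRefreshJobs` are hypothetical, and running the 6-hourly job at minute 0 is an assumption. The `cron` argument is expected to expose node-cron's `schedule(expression, fn, options)` call.

```javascript
// Cron expressions derived from the Refresh column above
// (fields: minute hour day-of-month month day-of-week).
// Job names are hypothetical labels for illustration.
const REFRESH_SCHEDULES = {
  discovery: "0 3 * * *",    // daily at 03:00
  scores: "0 5 * * *",       // daily at 05:00
  hfMetadata: "0 */6 * * *", // every 6 hours (minute 0 is an assumption)
  providers: "0 4 * * 1",    // Mondays at 04:00
};

// Register one job per schedule that has a handler. `cron` is injected so the
// sketch runs without node-cron installed; an app would pass require("node-cron").
function registerRefreshJobs(cron, handlers) {
  const registered = [];
  for (const [name, expression] of Object.entries(REFRESH_SCHEDULES)) {
    const handler = handlers[name];
    if (typeof handler !== "function") continue; // nothing to schedule
    // The timezone option pins the schedule to UTC regardless of host settings.
    cron.schedule(expression, handler, { timezone: "UTC" });
    registered.push(name);
  }
  return registered;
}
```

Injecting the scheduler keeps the mapping easy to test with a stub, as the table is the only source of truth for the timings.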

Pricing is never fabricated: each row in data/providers.json carries a
pricing_source_url pointing at the provider's published docs and a
last_verified date. The UI links to the source on every model detail card.
The full source list is also exposed at /embed-arena/api/sources.

Schema

See db.js for the full DDL. The interesting tables:

models — one row per embedding model, joins HF metadata + provider pricing
benchmark_scores — per-task normalized scores from MTEB
category_aggregates — per-model means by category (recomputed after each score refresh)
refresh_log, refresh_jobs — refresh history
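
To make the relationship between benchmark_scores and category_aggregates concrete, here is a small sketch of the aggregation step. It assumes an unweighted arithmetic mean and an in-memory row shape of `{ modelId, category, score }`; the real column names live in db.js and may differ.

```javascript
// Group per-task scores by (model, category) and take the arithmetic mean.
// The row shape { modelId, category, score } is assumed for illustration.
function aggregateByCategory(rows) {
  const groups = new Map();
  for (const { modelId, category, score } of rows) {
    const key = `${modelId}::${category}`;
    const entry = groups.get(key) ?? { modelId, category, total: 0, count: 0 };
    entry.total += score;
    entry.count += 1;
    groups.set(key, entry);
  }
  return [...groups.values()].map(({ modelId, category, total, count }) => ({
    modelId,
    category,
    meanScore: total / count,
    taskCount: count,
  }));
}

// Example with made-up scores for a made-up model.
const aggregates = aggregateByCategory([
  { modelId: "example/model-a", category: "retrieval", score: 60 },
  { modelId: "example/model-a", category: "retrieval", score: 70 },
  { modelId: "example/model-a", category: "sts", score: 80 },
]);
console.log(aggregates);
```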

How "Value Score" is computed

For models with a known price-per-million-tokens, value_score = mean_overall_eng_score / price_per_million_tokens_usd.
Higher is better (quality per dollar). Self-hosted models display "—" for value
because their cost depends on your hardware, not on a per-token rate.
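
A worked sketch of the formula, using made-up numbers rather than real model data. The function name `valueScore` is hypothetical, and treating a zero or missing price as "no value score" is an assumption made here to avoid dividing by zero.

```javascript
// value_score = mean_overall_eng_score / price_per_million_tokens_usd
// Returns null when there is no usable per-token price (e.g. self-hosted models),
// which the UI would render as a dash.
function valueScore(meanOverallEngScore, pricePerMillionTokensUsd) {
  if (pricePerMillionTokensUsd == null || !(pricePerMillionTokensUsd > 0)) {
    return null;
  }
  return meanOverallEngScore / pricePerMillionTokensUsd;
}

// Illustrative inputs only: a score of 60 at $0.50 per million tokens.
console.log(valueScore(60, 0.5));  // 120
console.log(valueScore(60, null)); // null
```

Because the score is a plain ratio, halving the price doubles the value score, so very cheap models can outrank stronger but pricier ones when sorting by value.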

Adding a new closed-source model

Append an entry to data/providers.json with id, provider, price_per_million_tokens_usd, pricing_source_url, and last_verified, plus dimensions and modalities if known. Then re-sync the file into the DB:
curl -X POST http://localhost:4751/embed-arena/api/admin/sync-providers
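
Before syncing, an entry could be sanity-checked with something like the sketch below. The field names are the ones listed above; everything else is an assumption for illustration, including the helper name `checkProviderEntry`, the value types (numeric price, ISO-style date string, array of modalities), and the placeholder values, which do not describe a real provider.

```javascript
// Fields the README lists as required for a providers.json entry.
const REQUIRED_FIELDS = [
  "id",
  "provider",
  "price_per_million_tokens_usd",
  "pricing_source_url",
  "last_verified",
];

// Return a list of human-readable problems; an empty list means the entry
// passes these basic checks. This is a hypothetical helper, not project code.
function checkProviderEntry(entry) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    const value = entry[field];
    if (value === undefined || value === null || value === "") {
      problems.push(`missing ${field}`);
    }
  }
  const price = entry.price_per_million_tokens_usd;
  if (price != null && (typeof price !== "number" || !(price >= 0))) {
    problems.push("price_per_million_tokens_usd must be a non-negative number");
  }
  if (entry.pricing_source_url) {
    try {
      new URL(entry.pricing_source_url); // throws on malformed URLs
    } catch {
      problems.push("pricing_source_url is not a valid URL");
    }
  }
  return problems;
}

// Placeholder entry: every value here is invented for the example.
const exampleEntry = {
  id: "example-provider/example-embed-v1",
  provider: "ExampleProvider",
  price_per_million_tokens_usd: 0.1,
  pricing_source_url: "https://example.com/pricing",
  last_verified: "2025-01-01",
  dimensions: 1024,
  modalities: ["text"],
};
console.log(checkProviderEntry(exampleEntry)); // []
```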

License

MIT. Built with Express, better-sqlite3, node-cron, helmet, compression.

Built by holyai.me — part of the Cowork R&D pipeline.