needle-board

Sortable leaderboard of tiny (<1B) function-calling models with live BFCL scores

ai, leaderboard, function-calling, huggingface, bfcl, edge-ai, on-device, tool-use

The public leaderboard of sub-1B function-calling models — the slice
of the LLM universe that fits on phones, smart glasses, and background
agents. Berkeley's BFCL is dominated by frontier models; needle-board
filters down to the tiny models that builders shipping on-device AI
actually evaluate.

Every score is fetched live from a real source. No mocks. No synthesized
numbers. If a model has neither a published BFCL score nor a self-reported
score in its model card, the BFCL column renders a dash placeholder, not a guess.
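
The "no guesses" rule can be sketched as a small resolver. This is an illustration only, not the project's code: the names `resolveBfclScore` and `formatBfclCell`, the returned field names, and the placeholder glyph are all assumptions. The precedence (official leaderboard first, then self-reported, otherwise nothing) follows the join rules described later in this README.

```javascript
// Hypothetical helpers: decide which BFCL score, if any, a row should display.
// `official` comes from the Berkeley leaderboard, `selfReported` from the model card.
const PLACEHOLDER = "\u2014"; // assumed glyph for "no score available"

function resolveBfclScore(official, selfReported) {
  if (Number.isFinite(official)) return { value: official, source: "official" };
  if (Number.isFinite(selfReported)) return { value: selfReported, source: "self_reported" };
  return { value: null, source: null }; // never synthesize a number
}

function formatBfclCell(score) {
  return score.value === null ? PLACEHOLDER : score.value.toFixed(2);
}
```

Keeping the source alongside the value lets the UI label self-reported numbers differently from leaderboard numbers, should it choose to.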

What it does

Data sources

| Source | URL | Frequency |
|---|---|---|
| HuggingFace model metadata | GET https://huggingface.co/api/models/{repo_id}?full=true | every 6 h per model |
| HuggingFace README | GET https://huggingface.co/{repo_id}/raw/main/README.md (fallback to /master/) | once per HF refresh |
| Berkeley BFCL leaderboard | GET https://gorilla.cs.berkeley.edu/data_overall.csv (canonical CSV the leaderboard HTML page fetches at runtime) | daily 03:00 UTC |
| HF discovery search (log only) | GET https://huggingface.co/api/models?search=function-calling&limit=100 | every 24 h (does not auto-add to watchlist) |

The watchlist of which repos to fetch is hardcoded in
lib/watchlist.js. Everything about each repo is live.
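
As a rough sketch of how one watchlist entry maps onto the two HuggingFace sources in the table, the helper below builds the URLs and tries the README branches in order. The function names and the injectable `fetchImpl` parameter are assumptions made for illustration; the actual scraper in scrapers/huggingface.js may be structured differently.

```javascript
// Build the HuggingFace URLs from the table above for one repo id ("org/model").
function hfSourceUrls(repoId) {
  const base = "https://huggingface.co";
  return {
    metadata: `${base}/api/models/${repoId}?full=true`,
    // README is tried on main first, then on master.
    readme: ["main", "master"].map((branch) => `${base}/${repoId}/raw/${branch}/README.md`),
  };
}

// Return the first README body that loads, or null if neither branch has one.
// `fetchImpl` is injectable so the fallback can be exercised without network access.
async function fetchReadme(repoId, fetchImpl = globalThis.fetch) {
  for (const url of hfSourceUrls(repoId).readme) {
    const res = await fetchImpl(url);
    if (res.ok) return res.text();
  }
  return null;
}
```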

API

All endpoints are mounted at /needle-board and require no auth.

| Method | Path | Returns |
|---|---|---|
| GET | /needle-board/health | { ok: true, service: "needle-board" } |
| GET | /needle-board/api/models | Array of every tracked model. Query params: sort (bfcl_overall, params_m, downloads, likes, size_mb_q4), dir (asc/desc), on_device=1, has_bfcl=1, license=<slug>, max_params=<int>, search=<substring> |
| GET | /needle-board/api/models/:id | Single model with full detail + matched raw BFCL row. :id is the repo id with / replaced by --. |
| GET | /needle-board/api/stats | { tracked_count, with_bfcl, on_device_count, top_by_bfcl, top_by_downloads, last_hf_refresh, last_bfcl_refresh } |
| GET | /needle-board/api/bfcl-raw | Most recent BFCL snapshot rows (transparency endpoint). |
| GET | /needle-board/api/fetch-log?limit=N | Last N fetch attempts with {source, target, status, message, duration_ms, fetched_at}. |
| GET | /needle-board/api/licenses | License → model count groups, for the filter dropdown. |
| POST | /needle-board/api/refresh | Triggers an async re-scrape. Returns { queued: true } immediately or { already_running: true }. Idempotent. Optional ?mode=hf or ?mode=bfcl to refresh just one half. |
| GET | /needle-board/card/:id | Standalone shareable HTML card (no nav, screenshot-ready, OG tags set). |
| GET | /needle-board/ | The SPA. |
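
A small client-side sketch of the two conventions in the table: query parameters on /api/models and the `/` to `--` id encoding. The base URL is taken from the local-run instructions in this README, and the helper names and the repo id `some-org/tiny-fc` are hypothetical.

```javascript
// Assumed base URL: the local dev server described under "Running locally".
const BASE = "http://localhost:4766/needle-board";

// ":id is the repo id with / replaced by --"
function repoIdToPathId(repoId) {
  return repoId.replaceAll("/", "--");
}

// Build an /api/models URL from the documented query parameters.
function modelsUrl(params = {}) {
  const query = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null) query.set(key, String(value));
  }
  const qs = query.toString();
  return `${BASE}/api/models${qs ? `?${qs}` : ""}`;
}

function modelDetailUrl(repoId) {
  return `${BASE}/api/models/${repoIdToPathId(repoId)}`;
}
```

For example, `modelsUrl({ sort: "bfcl_overall", dir: "desc", on_device: 1, has_bfcl: 1 })` yields a URL that lists on-device models with a BFCL score, best first.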

How the BFCL ↔ HuggingFace join works

BFCL row names like "Hammer2.1-0.5B-Instruct (FC)" are normalized by
stripping parens, lowercasing, and removing non-alphanumeric characters
(hammer2105binstruct). Two candidate variants are tried per side: the
raw normalized form and the form with trailing suffixes like
-instruct, -fc, -it, -chat, -base stripped. If any candidate
intersects between a model's set and a row's set, they match. The
official leaderboard always beats a self-reported number.

This is intentionally a conservative fuzzy match: false positives are
worse than missed matches, since a wrong BFCL score is more harmful
than a missing one.
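
The join described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of lib/normalize.js: the function names are hypothetical, only the five suffixes named above are handled, and for an `org/model` repo id only the part after the last `/` is compared. Stripping suffixes on the hyphenated form (before collapsing punctuation) is a choice of this sketch, made so that a short suffix such as `it` only matches as a whole trailing token.

```javascript
// Suffixes named in this README; the real list may be longer.
const TRAILING_SUFFIX = /[-_ ](instruct|fc|it|chat|base)$/;

// Drop everything that is not a lowercase letter or digit.
const collapse = (s) => s.replace(/[^a-z0-9]/g, "");

// Candidate keys for a BFCL row name or a HuggingFace repo id:
// the raw normalized form and the form with trailing suffixes stripped.
function matchCandidates(name) {
  const tail = name.slice(name.lastIndexOf("/") + 1); // assumption: ignore the org prefix
  const base = tail.replace(/\([^)]*\)/g, "").trim().toLowerCase(); // strip "(FC)" etc.
  let stripped = base;
  while (TRAILING_SUFFIX.test(stripped)) stripped = stripped.replace(TRAILING_SUFFIX, "");
  return new Set([collapse(base), collapse(stripped)]);
}

// Two names match if any candidate key appears on both sides.
function isSameModel(repoId, bfclRowName) {
  const rowKeys = matchCandidates(bfclRowName);
  for (const key of matchCandidates(repoId)) {
    if (key && rowKeys.has(key)) return true;
  }
  return false;
}
```

With the example row from above, `matchCandidates("Hammer2.1-0.5B-Instruct (FC)")` contains both `hammer2105binstruct` and `hammer2105b`, so a repo named `Hammer2.1-0.5b` matches through the stripped form while `Hammer2.1-1.5b` does not.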

Running locally

npm install
PORT=4766 node server.js
# open http://localhost:4766/needle-board/

On first boot, when the database is empty, the server kicks off a full
refresh in the background. It takes ~30 s to pull all HuggingFace
metadata (4 parallel fetches with 250 ms jitter between batches) and
~1 s to pull the BFCL CSV. After that, refreshes are driven by
node-cron on the schedule above, or by a manual POST /needle-board/api/refresh.
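
The batching behaviour mentioned above (4 parallel fetches, jitter between batches) could look roughly like this. It is a sketch, not the orchestrator in scrapers/index.js: the name `runInBatches` and its options are hypothetical, and "250 ms jitter" is interpreted here as a random pause of up to 250 ms, which is an assumption.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Process `items` in batches of `batchSize`, pausing a random 0..jitterMs between batches.
// A failed item does not abort the run; its rejection is recorded in the results instead.
async function runInBatches(items, worker, { batchSize = 4, jitterMs = 250 } = {}) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const settled = await Promise.allSettled(batch.map((item) => worker(item)));
    results.push(...settled);
    const isLast = i + batchSize >= items.length;
    if (!isLast && jitterMs > 0) await sleep(Math.random() * jitterMs);
  }
  return results;
}
```

Using `Promise.allSettled` means one unreachable repo shows up as a `rejected` entry rather than failing the whole refresh, which fits a design where every attempt is written to a fetch log.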

Set SKIP_CRON=1 to disable the in-process cron (useful for tests).
Set DB_PATH=/some/path.db to override the SQLite location.

Stack

What's deliberately out of scope (v1)

Files

needle-board/
├── server.js               Express bootstrap, mounts /needle-board, runs first refresh
├── db.js                   better-sqlite3 + WAL + idempotent schema + fetch_log helper
├── cron.js                 node-cron schedules (HF every 6h, BFCL daily 03:00 UTC)
├── routes/
│   ├── health.js           GET /health
│   ├── models.js           /api/models, /api/models/:id, /api/stats, /api/bfcl-raw, /api/fetch-log, /api/licenses
│   ├── refresh.js          POST /api/refresh
│   └── card.js             GET /card/:id — shareable card HTML
├── scrapers/
│   ├── huggingface.js      Per-model metadata + README fetch
│   ├── bfcl.js             Leaderboard CSV (with HTML cheerio fallback)
│   └── index.js            Orchestrator, concurrency, retry, DB upsert
├── lib/
│   ├── watchlist.js        Curated repo ID list
│   ├── normalize.js        Name normalization for cross-referencing
│   ├── benchmarks.js       README regex extraction
│   └── derive.js           q4 size + on_device flag + quant tag parsing
└── public/                 Vanilla JS SPA (index.html, app.js, style.css, card.css)