hallu-board

A live hallucination leaderboard that mirrors the Vectara HHEM benchmark and adds a sortable UI, diff-over-time, and a biggest-movers feed. It answers "which LLM hallucinates least right now?" at a glance.

Why

Every model release in May 2026 ships with a hallucination number (GPT-5.5 Instant
shaved 52.5% off pro-domain hallucinations; Anthropic released NLA for
interpretability). The canonical public benchmark is Vectara's HHEM
(Hughes Hallucination Evaluation Model) leaderboard, but it lives in a single
GitHub README — no sort, no filter, no provider grouping, no time series.

hallu-board mirrors HHEM every 6 hours and adds:

- Sortable, filterable, searchable leaderboard.
- Provider grouping (inferred from model name).
- Biggest-movers diff vs the previous snapshot.
- Per-model time series across every refresh we've captured.
- All endpoints are public: no auth, no login, instant inspection.
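
The biggest-movers diff can be pictured as a join of two snapshots on model name. The sketch below is only an illustration under stated assumptions, not the project's actual code: the function name `computeMovers`, the row shape `{ model, rate }`, and the `limit` parameter are all hypothetical.

```javascript
// Hypothetical sketch: rank models by how much their hallucination rate
// changed between two snapshots. The row shape { model, rate } is assumed.
function computeMovers(previous, current, limit = 10) {
  const before = new Map(previous.map((row) => [row.model, row.rate]));
  return current
    .filter((row) => before.has(row.model)) // only models present in both snapshots
    .map((row) => ({
      model: row.model,
      from: before.get(row.model),
      to: row.rate,
      delta: row.rate - before.get(row.model), // negative = rate went down
    }))
    .filter((mover) => mover.delta !== 0) // unchanged models are not movers
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta)) // largest change first
    .slice(0, limit);
}
```

Models that appear in only one of the two snapshots are skipped here; a real implementation might instead report them as "new" or "removed".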

Data sources (live, no mock data)

| Source | URL | Refresh |
| --- | --- | --- |
| Vectara HHEM CSV | https://raw.githubusercontent.com/vectara/hallucination-leaderboard/main/leaderboard_summaries.csv | every 6h |
| Vectara HHEM README (fallback) | https://raw.githubusercontent.com/vectara/hallucination-leaderboard/main/README.md | every 6h |
| Repo HEAD commit | https://api.github.com/repos/vectara/hallucination-leaderboard/commits/main | every 6h |

CSV is preferred; if the CSV is missing or unparseable the fetcher falls back
to parsing the markdown table inside the README. If both fail in a cycle the
previous snapshot remains visible and the UI shows a "stale" badge. No seed
data, no mocks, no Math.random() anywhere.
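
The CSV-first, README-fallback behaviour described above could look roughly like the sketch below. Everything here is an assumption made for illustration: `loadLeaderboard`, `parseMarkdownTable`, and the injected `fetchText`, `parseCsv`, and `parseReadme` functions are hypothetical names, and the markdown parser assumes the text contains a single pipe-delimited table.

```javascript
// Hypothetical sketch. Assumes `markdown` contains one pipe-delimited table:
// every line starting with "|" is treated as part of that table.
function parseMarkdownTable(markdown) {
  const lines = markdown
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('|'));
  if (lines.length < 3) throw new Error('no markdown table found');
  const cells = (line) =>
    line.replace(/^\||\|$/g, '').split('|').map((cell) => cell.trim());
  const header = cells(lines[0]);
  // lines[1] is the "| --- | --- |" separator row, so data starts at index 2.
  return lines.slice(2).map((line) => {
    const values = cells(line);
    return Object.fromEntries(header.map((name, i) => [name, values[i] ?? '']));
  });
}

// Try the CSV first, then the README. fetchText and both parsers are injected
// so the flow can be exercised without network access.
async function loadLeaderboard({ fetchText, parseCsv, parseReadme, csvUrl, readmeUrl }) {
  const attempts = [
    { source: 'csv', url: csvUrl, parse: parseCsv },
    { source: 'readme', url: readmeUrl, parse: parseReadme },
  ];
  for (const { source, url, parse } of attempts) {
    try {
      const rows = parse(await fetchText(url));
      if (Array.isArray(rows) && rows.length > 0) {
        return { source, rows, stale: false };
      }
    } catch (err) {
      // Fall through to the next source; a real fetcher would log err here.
    }
  }
  // Both sources failed: the caller keeps the previous snapshot and flags it.
  return { source: null, rows: null, stale: true };
}
```

Returning `stale: true` rather than throwing lets the caller leave the last good snapshot in place and surface the badge in the UI.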

Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /hallu-board/health | service health |
| GET | /hallu-board/api/models | current leaderboard (?provider=&q=&sort=&order=) |
| GET | /hallu-board/api/history/:model | time-series for a single model |
| GET | /hallu-board/api/movers | diff vs the previous snapshot |
| GET | /hallu-board/api/stats | aggregate stats (median, best/worst, by provider) |
| GET | /hallu-board/api/snapshots | snapshot index |
| POST | /hallu-board/api/refresh | force a refetch (no auth) |
| GET | /hallu-board/ | SPA |
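
As an illustration of the `?provider=&q=&sort=&order=` parameters on `/hallu-board/api/models`, here is a minimal sketch of how they might be applied to an in-memory list. The function name `applyQuery`, the field names `model`, `provider`, and `rate`, and the defaults (`sort=rate`, `order=asc`) are assumptions; the actual route may behave differently.

```javascript
// Hypothetical sketch. Field names and defaults are assumed, not taken
// from the real API.
const SORTABLE = new Set(['model', 'provider', 'rate']);

function applyQuery(models, { provider, q, sort = 'rate', order = 'asc' } = {}) {
  const key = SORTABLE.has(sort) ? sort : 'rate'; // ignore unknown sort columns
  const direction = order === 'desc' ? -1 : 1;
  const needle = (q || '').toLowerCase();
  return models
    .filter((m) => !provider || m.provider.toLowerCase() === provider.toLowerCase())
    .filter((m) => !needle || m.model.toLowerCase().includes(needle))
    .sort((a, b) => {
      // filter() returned a fresh array, so the caller's list is not mutated
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
      return 0;
    });
}
```

Restricting `sort` to a fixed set of column names keeps an arbitrary query string from selecting fields the API does not intend to expose as sort keys.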

Stack

Node 18+, Express 4, better-sqlite3 (WAL), node-cron, helmet, compression,
morgan. Vanilla-JS dark-theme SPA. No frontend framework.

Run locally

cp .env.example .env
npm install
npm start
# open http://localhost:4781/hallu-board/

The first fetch happens on boot if the DB is empty. Subsequent fetches happen
on a cron schedule (`0 */6 * * *`, every 6 hours). You can force one any time with:

curl -X POST http://localhost:4781/hallu-board/api/refresh

Schedule

| Cron | Job |
| --- | --- |
| `0 */6 * * *` | Refetch HHEM, write new snapshot if changed |
| `5 0 * * *` | Prune fetch_log >30d, VACUUM the DB |
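
One way to implement "write new snapshot if changed" and the 30-day prune cutoff is sketched below. This is an assumption about the approach, not a description of the real jobs: `snapshotHash`, `hasChanged`, `pruneCutoff`, and the row shape `{ model, rate }` are hypothetical.

```javascript
const crypto = require('node:crypto');

// Hypothetical sketch: fingerprint a snapshot so a refetch only writes a
// new one when the content actually changed. The row shape is assumed.
function snapshotHash(rows) {
  const canonical = rows
    .map((row) => `${row.model}\t${row.rate}`)
    .sort() // order-independent: a reshuffled leaderboard hashes the same
    .join('\n');
  return crypto.createHash('sha256').update(canonical).digest('hex');
}

function hasChanged(previousHash, rows) {
  return snapshotHash(rows) !== previousHash;
}

// ISO timestamp before which fetch_log entries would be pruned.
function pruneCutoff(now = new Date(), days = 30) {
  return new Date(now.getTime() - days * 24 * 60 * 60 * 1000).toISOString();
}
```

Hashing a sorted, canonical form means a pure reordering of rows upstream does not create a spurious snapshot.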

Layout

hallu-board/
  server.js                Express app, helmet, cron, boot
  db.js                    better-sqlite3 + schema + prepared statements
  fetchers/
    vectara.js             HHEM CSV + README parser, persist snapshot
    providers.js           model-name → provider mapping
  routes/
    api.js                 JSON API
  public/
    index.html  app.js  style.css     vanilla SPA, dark theme
  data/                    sqlite DB (gitignored)
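
For a sense of what the model-name → provider mapping might involve, here is a minimal sketch. The patterns, the `Other` fallback, and the handling of an `org/` prefix are illustrative assumptions, not the contents of `providers.js`.

```javascript
// Hypothetical sketch of a model-name → provider mapping. The patterns
// below are illustrative assumptions only.
const PROVIDER_PATTERNS = [
  [/^(gpt|o\d)/i, 'OpenAI'],
  [/^claude/i, 'Anthropic'],
  [/^gemini/i, 'Google'],
  [/^(llama|meta-)/i, 'Meta'],
  [/^mi(s|x)tral/i, 'Mistral'],
];

function inferProvider(modelName) {
  // Assumption: names may carry an "org/" prefix; match on the last segment.
  const shortName = modelName.trim().split('/').pop();
  for (const [pattern, provider] of PROVIDER_PATTERNS) {
    if (pattern.test(shortName)) return provider;
  }
  return 'Other'; // unknown names are grouped rather than rejected
}
```

First match wins, so more specific patterns would need to appear before more general ones if the list grows.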

License & attribution

Source data: Vectara HHEM leaderboard (Apache-2.0).
hallu-board is an unaffiliated mirror.