| # | Model | Provider | Hallucination % | Factual Consistency % | Answer % | Words |
|---|---|---|---|---|---|---|
Biggest movers since the previous snapshot
Improved (lower hallu %)
Regressed (higher hallu %)
New & removed
By provider
Mean hallucination rate and best model per provider (lower is better).
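A minimal sketch of how a per-provider summary like this could be computed. The prefix map, the model names, and the function names are hypothetical and only illustrate the idea; the actual inference rules used by hallu-board are not documented on this page.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical prefix-to-provider map (an assumption, not the site's real rules).
PROVIDER_PREFIXES = {
    "gpt": "OpenAI",
    "claude": "Anthropic",
    "gemini": "Google",
}

def infer_provider(model: str) -> str:
    """Guess a provider from the model name; fall back to 'Other'."""
    name = model.lower()
    for prefix, provider in PROVIDER_PREFIXES.items():
        if name.startswith(prefix):
            return provider
    return "Other"

def summarize_by_provider(rows):
    """rows: iterable of (model, hallu_pct) pairs.

    Returns {provider: (mean_hallu_pct, best_model)}, where the best model
    is the one with the lowest hallucination rate.
    """
    groups = defaultdict(list)
    for model, hallu in rows:
        groups[infer_provider(model)].append((hallu, model))
    return {
        provider: (mean(h for h, _ in entries), min(entries)[1])
        for provider, entries in groups.items()
    }
```

Unknown names fall into an "Other" bucket rather than raising, so a newly added upstream model would not break the grouping.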
Model trend
Pick a model to plot its hallucination rate across snapshots.
Hallucination %
Factual Consistency %
Answer %
About hallu-board
hallu-board mirrors the Vectara HHEM (Hughes Hallucination Evaluation Model) leaderboard every 6 hours. The leaderboard uses HHEM to measure how often an LLM introduces facts not present in a source document when asked to summarize it, which makes it the closest thing the field has to a standard hallucination metric.
What you get here that the README doesn't
- Live sorting, filtering, search.
- Provider grouping (we infer provider from model name).
- "Biggest movers" diff vs the previous snapshot.
- Per-model time series across every refresh we've captured.
Data sources & refresh
| Source | URL | Refresh |
|---|---|---|
| Vectara HHEM CSV | raw.githubusercontent.com/vectara/hallucination-leaderboard/main/leaderboard_summaries.csv | every 6h |
| Vectara HHEM README (fallback) | raw.githubusercontent.com/vectara/hallucination-leaderboard/main/README.md | every 6h |
| Repo commit SHA | api.github.com/repos/vectara/hallucination-leaderboard/commits/main | every 6h |
All data is live; nothing here is seeded, mocked, or randomized. If both upstream sources fail in a cycle, the prior snapshot stays visible and this page shows a "stale" badge.
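The fallback behavior described above could look roughly like the following sketch. `fetch_csv` and `fetch_readme` are hypothetical callables standing in for the two upstream sources; each is assumed to return parsed rows or raise on failure.

```python
def refresh(fetch_csv, fetch_readme, prior_snapshot):
    """Try the CSV first, then the README fallback.

    Returns (snapshot, is_stale). If both sources fail or return nothing,
    the prior snapshot is kept and is_stale is True, which is the condition
    under which a "stale" badge would be shown.
    """
    for fetch in (fetch_csv, fetch_readme):
        try:
            rows = fetch()
        except Exception:
            continue  # this source failed; try the next one
        if rows:
            return rows, False
    return prior_snapshot, True
```

Treating an empty result the same as a failure is a design choice in this sketch: it avoids replacing a good snapshot with an empty table.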