hallu-board · which LLM hallucinates least?

#	Model	Provider	Hallucination rate (%)	Factual consistency rate (%)	Answer rate (%)	Avg. summary length (words)

About hallu-board

hallu-board mirrors the Vectara HHEM (Hughes Hallucination Evaluation Model) leaderboard every 6 hours. HHEM measures how often an LLM introduces facts that are not present in a source document when asked to summarize it. It is arguably the closest thing the field has to a standard hallucination metric.

What you get here that the README doesn't

Live sorting, filtering, search.
Provider grouping (we infer provider from model name).
"Biggest movers" diff vs the previous snapshot.
Per-model time series across every refresh we've captured.
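
As an illustration of the provider grouping mentioned above, here is a minimal sketch of how a provider might be inferred from a model name. The keyword table and the names `PROVIDER_KEYWORDS`, `infer_provider`, and `group_by_provider` are assumptions made for this example, not hallu-board's actual mapping.

```python
from collections import defaultdict

# Hypothetical keyword table: an assumption for illustration only,
# not the mapping hallu-board really uses.
PROVIDER_KEYWORDS = [
    ("gpt", "OpenAI"),
    ("claude", "Anthropic"),
    ("gemini", "Google"),
    ("llama", "Meta"),
    ("mistral", "Mistral AI"),
]


def infer_provider(model_name: str) -> str:
    """Guess a provider by case-insensitive keyword match on the model name."""
    lowered = model_name.lower()
    for keyword, provider in PROVIDER_KEYWORDS:
        if keyword in lowered:
            return provider
    # Names that match no keyword are grouped under a catch-all label.
    return "Unknown"


def group_by_provider(model_names: list) -> dict:
    """Bucket model names under their inferred provider."""
    groups = defaultdict(list)
    for name in model_names:
        groups[infer_provider(name)].append(name)
    return dict(groups)
```

Keyword matching is simple but coarse: a name that happens to contain a keyword from another vendor would be misfiled, so a real mapping would likely need explicit overrides.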

Data sources & refresh

Source	URL	Refresh
Vectara HHEM CSV	`raw.githubusercontent.com/vectara/hallucination-leaderboard/main/leaderboard_summaries.csv`	every 6h
Vectara HHEM README (fallback)	`raw.githubusercontent.com/vectara/hallucination-leaderboard/main/README.md`	every 6h
Repo commit SHA	`api.github.com/repos/vectara/hallucination-leaderboard/commits/main`	every 6h

All data is live; nothing here is seeded, mocked, or randomized. If both upstream sources fail in a cycle, the prior snapshot stays visible and this page shows a "stale" badge.
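
The fallback behaviour described above can be sketched as follows. This is a minimal illustration under assumptions: the function name `refresh`, the snapshot shape (`rows` plus a `stale` flag), and the injected fetcher callables are all hypothetical, and the real refresh job may be structured differently.

```python
from typing import Callable, Optional


def refresh(
    fetch_csv: Callable[[], list],
    fetch_readme: Callable[[], list],
    previous: Optional[dict],
) -> dict:
    """Try the CSV source, then the README fallback; otherwise keep the old snapshot.

    Each fetcher is assumed to return a list of parsed leaderboard rows
    and to raise an exception on network or parse failure.
    """
    for fetch in (fetch_csv, fetch_readme):
        try:
            rows = fetch()
        except Exception:
            # This source failed in the current cycle; try the next one.
            continue
        if rows:
            return {"rows": rows, "stale": False}

    # Both sources failed: keep showing the prior data, flagged as stale.
    if previous is None:
        return {"rows": [], "stale": True}
    return {**previous, "stale": True}
```

Passing the fetchers in as callables keeps the fallback logic testable without network access; the stale result is a copy, so the stored previous snapshot is left unmodified.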

Biggest movers since the previous snapshot

Improved (lower hallu %)

Regressed (higher hallu %)

New & removed
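
The improved, regressed, and new & removed buckets above can be derived from two consecutive snapshots. The sketch below assumes each snapshot is a mapping from model name to hallucination rate in percent; the function name `diff_snapshots` and the `top_n` parameter are hypothetical.

```python
def diff_snapshots(previous: dict, current: dict, top_n: int = 5) -> dict:
    """Compare two {model: hallu_pct} snapshots and bucket the changes."""
    shared = previous.keys() & current.keys()
    # Negative delta = lower hallucination rate = improvement.
    deltas = {m: round(current[m] - previous[m], 2) for m in shared}

    # Largest drops first; ties broken by model name for stable output.
    improved = sorted(
        (m for m in shared if deltas[m] < 0),
        key=lambda m: (deltas[m], m),
    )[:top_n]
    # Largest increases first.
    regressed = sorted(
        (m for m in shared if deltas[m] > 0),
        key=lambda m: (-deltas[m], m),
    )[:top_n]

    return {
        "improved": [(m, deltas[m]) for m in improved],
        "regressed": [(m, deltas[m]) for m in regressed],
        "new": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
    }
```

Models whose rate is unchanged appear in none of the buckets, and models present in only one snapshot are reported as new or removed rather than as movers.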

By provider

Model trend
