What this is
agent-horizon is a small re-presentation layer on top of METR's Time Horizon 1.1 benchmark — the most-cited multi-step agent reliability measurement of 2026. We add three things METR's own page does not surface directly:
- a sortable per-vendor leaderboard with confidence intervals;
- a reliability-decay calculator that compounds a model's per-step accuracy across multi-step workflows; and
- a movers feed showing which models had their horizon estimate revised since last week.
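The movers feed amounts to a diff between two weekly snapshots of horizon estimates. The sketch assumes each snapshot is a mapping from model name to 50%-time horizon in minutes, and that a change of at least 1% counts as a revision. `Move`, `movers`, and the 1% threshold are hypothetical illustrations, not agent-horizon's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Move:
    model: str
    old_minutes: float  # horizon estimate in last week's snapshot (assumed unit: minutes)
    new_minutes: float  # horizon estimate in this week's snapshot

    @property
    def pct_change(self) -> float:
        return 100.0 * (self.new_minutes - self.old_minutes) / self.old_minutes


def movers(previous: Dict[str, float], current: Dict[str, float],
           min_pct: float = 1.0) -> List[Move]:
    """Models whose horizon moved by at least min_pct percent, largest move first."""
    moves = []
    for model, new in current.items():
        old = previous.get(model)
        if old is None or old <= 0:
            continue  # newly listed model or unusable baseline: not a revision
        move = Move(model, old, new)
        if abs(move.pct_change) >= min_pct:
            moves.append(move)
    return sorted(moves, key=lambda m: abs(m.pct_change), reverse=True)


# Toy snapshots (hypothetical numbers, not METR data).
last_week = {"model-a": 60.0, "model-b": 120.0, "model-c": 30.0}
this_week = {"model-a": 66.0, "model-b": 120.0, "model-c": 24.0, "model-d": 10.0}
feed = movers(last_week, this_week)
```

Here `feed` lists `model-c` (about -20%) before `model-a` (about +10%); the unchanged `model-b` and the newly added `model-d` are left out.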
How METR measures time horizon
METR runs each frontier AI agent against a fixed suite of software-engineering, ML, and cybersecurity tasks, each annotated with an estimated completion time for a human expert. METR then fits a logistic curve of the agent's success probability against the logarithm of task duration. The 50%-time horizon is the task duration at which the fitted curve falls to 50% success; the 80%-time horizon is the duration at which it falls to 80%. For details, see the original paper (arXiv:2503.14499) or the Time Horizon 1.1 release notes.
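To make the fit concrete, here is a minimal sketch that fits success against log2 of human task duration with Newton's method on toy data, then solves the fitted curve for a given success level. The function names and toy data are hypothetical, and METR's actual fit may differ in details such as task weighting and how confidence intervals are computed.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_log_logistic(durations_min, successes, iters=50):
    """Fit P(success) = sigmoid(a + b * log2(duration)) by maximum likelihood.

    Newton's method on a one-feature logistic regression. Assumes the data are
    not perfectly separable; otherwise the maximum-likelihood slope diverges.
    """
    xs = [math.log2(d) for d in durations_min]
    a = b = 0.0
    for _ in range(iters):
        ga = gb = 0.0              # gradient of the log-likelihood
        haa = hab = hbb = 0.0      # Fisher information [[haa, hab], [hab, hbb]]
        for x, y in zip(xs, successes):
            p = sigmoid(a + b * x)
            ga += y - p
            gb += (y - p) * x
            w = p * (1.0 - p)
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        # Newton step: solve (Fisher information) * delta = gradient for the 2x2 case.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


def horizon_minutes(a: float, b: float, p: float) -> float:
    """Task duration (minutes) at which the fitted success probability equals p."""
    logit_p = math.log(p / (1.0 - p))
    return 2.0 ** ((logit_p - a) / b)


durations = [1, 2, 4, 8, 16, 32, 64, 128]  # human-expert minutes (toy data)
successes = [1, 1, 1, 0, 1, 0, 0, 0]       # 1 = agent succeeded (toy data)
a, b = fit_log_logistic(durations, successes)
h50 = horizon_minutes(a, b, 0.5)
h80 = horizon_minutes(a, b, 0.8)
```

Because the toy data are symmetric around log2(duration) = 3.5, `h50` comes out at 2^3.5, about 11.3 minutes, and `h80` is shorter: the slope `b` is negative, so demanding higher reliability shrinks the horizon.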
agent-horizon does not re-run any benchmark. We only re-read METR's published YAML feeds and render them.
Data sources
| Source | File | Refresh interval |
|---|---|---|
| METR benchmark results | benchmark_results_1_1.yaml | every 6 hours |
| METR task results | task_results_1_1.yaml | every 24 hours |
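One way to read the refresh column is as a maximum age per feed before it is re-read. The sketch below encodes that reading; `REFRESH_INTERVALS` and `is_stale` are hypothetical names, and the way agent-horizon actually schedules its fetches is not described here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Refresh intervals from the data-sources table: feed file -> maximum age.
REFRESH_INTERVALS = {
    "benchmark_results_1_1.yaml": timedelta(hours=6),
    "task_results_1_1.yaml": timedelta(hours=24),
}


def is_stale(feed: str, last_fetched: datetime,
             now: Optional[datetime] = None) -> bool:
    """True once `feed` is at least one refresh interval old and should be re-read."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_fetched >= REFRESH_INTERVALS[feed]
```

For example, seven hours after a fetch, the benchmark feed is due for a re-read while the task feed is not.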
How the decay calculator works
A multi-step agent task is treated as $N$ independent steps, each succeeding with probability $p$, so end-to-end success is $p^N$. This is a deliberate simplification (real agents can recover from some failures through retries or self-reflection), but it is the same back-of-the-envelope math practitioners use to set workflow length budgets.
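The compounding rule can also be inverted to budget workflow length: requiring $p^N \ge t$ for a target end-to-end success $t$ gives $N \le \ln t / \ln p$ (the inequality flips because $\ln p < 0$). A minimal sketch of both directions, with hypothetical function names:

```python
import math


def end_to_end_success(p: float, n: int) -> float:
    """Probability that all n independent steps succeed, each with probability p."""
    return p ** n


def max_steps(p: float, target: float) -> float:
    """Largest N with p**N >= target, i.e. floor(ln(target) / ln(p))."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    if p >= 1.0:
        return math.inf  # a perfectly reliable step never decays
    if p <= 0.0:
        return 0
    return math.floor(math.log(target) / math.log(p))
```

With these, `end_to_end_success(0.9, 10)` is about 0.349, and `max_steps(0.99, 0.5)` is 68: even at 99% per-step accuracy, end-to-end success drops below one half after 68 steps.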
Unless you override it, we use a model's METR average_score as the per-step accuracy. METR's average_score is the mean success rate across the entire task suite, weighted by task duration. It is the cleanest publicly available proxy for per-step reliability.
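The sketch below shows what a duration-weighted mean looks like and how a user override would take precedence over it. Linear weighting by human task minutes is an assumption about how `average_score` is computed, and `duration_weighted_score` and `per_step_accuracy` are hypothetical helpers, not METR's or agent-horizon's code.

```python
from typing import Iterable, Optional, Tuple


def duration_weighted_score(tasks: Iterable[Tuple[float, float]]) -> float:
    """Mean success rate over (human_minutes, success_rate) pairs, weighted by minutes."""
    total_weight = 0.0
    weighted_sum = 0.0
    for minutes, success_rate in tasks:
        total_weight += minutes
        weighted_sum += minutes * success_rate
    if total_weight == 0:
        raise ValueError("no task duration to weight by")
    return weighted_sum / total_weight


def per_step_accuracy(average_score: float, override: Optional[float] = None) -> float:
    """Per-step p for the decay calculator: the override if given, else average_score."""
    p = average_score if override is None else override
    if not 0.0 <= p <= 1.0:
        raise ValueError("per-step accuracy must be in [0, 1]")
    return p
```

With one 10-minute task solved every time and one 30-minute task solved half the time, the weighted score is 0.625, compared with 0.75 for an unweighted mean: longer tasks pull the score toward their success rate.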
Service health
Live status is available at /agent-horizon/health.
Disclaimer
Numbers shown are METR's measurements, re-presented for convenience. agent-horizon is not affiliated with METR. If anything looks off, the source of truth is always metr.org/time-horizons.