About · agent-horizon

What this is

agent-horizon is a small re-presentation layer on top of METR's Time Horizon 1.1 benchmark — the most-cited multi-step agent reliability measurement of 2026. We add three things METR's own page does not surface directly:

a sortable per-vendor leaderboard with confidence intervals;
a reliability-decay calculator that compounds a model's per-step accuracy across multi-step workflows; and
a movers feed showing which models had their horizon estimate revised since last week.
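The movers feed boils down to diffing two snapshots of horizon estimates. Below is a minimal sketch of that idea. It assumes each snapshot is a hypothetical mapping from model name to 50%-time horizon in minutes; the real feed's schema, the `movers` helper, and the model names are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """One model whose horizon estimate changed between two snapshots."""
    model: str
    old_minutes: float
    new_minutes: float

    @property
    def ratio(self) -> float:
        return self.new_minutes / self.old_minutes


def movers(last_week: dict[str, float], this_week: dict[str, float],
           min_rel_change: float = 0.0) -> list[Move]:
    """Return models present in both snapshots whose horizon moved, largest move first."""
    moves = []
    for model, new in this_week.items():
        old = last_week.get(model)
        # Skip models that are new this week or have non-positive estimates.
        if old is None or old <= 0 or new <= 0:
            continue
        if abs(new / old - 1.0) > min_rel_change:
            moves.append(Move(model, old, new))
    # Rank by |log ratio| so a doubling and a halving count as equally large moves.
    moves.sort(key=lambda m: abs(math.log(m.ratio)), reverse=True)
    return moves


# Hypothetical snapshots (minutes), for illustration only.
last_week = {"model-a": 60.0, "model-b": 30.0, "model-c": 10.0}
this_week = {"model-a": 60.0, "model-b": 45.0, "model-c": 5.0, "model-d": 20.0}
for m in movers(last_week, this_week):
    print(f"{m.model}: {m.old_minutes:g} -> {m.new_minutes:g} min (x{m.ratio:.2f})")
```

Ranking by the absolute log ratio treats a model whose horizon halved as moving just as much as one whose horizon doubled. Models that appear only in the newer snapshot are left out, because they have nothing to be compared against.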

How METR measures time horizon

METR runs each frontier AI agent against a fixed suite of software-engineering, ML, and cybersecurity tasks. Each task has an estimated completion time for a human expert. METR fits a logistic curve to the agent's success probability as a function of the logarithm of that human task duration. The 50%-time horizon is the task duration at which the fitted curve crosses 50% success, and the 80%-time horizon is the duration at which it crosses 80%. Success falls as tasks get longer, so the 80% horizon is always the shorter of the two. Read the original paper at arXiv:2503.14499 or the Time Horizon 1.1 release notes.
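Once the logistic curve has been fitted, you read off a horizon by inverting it. The sketch below assumes a fit of the form P(success) = sigmoid(alpha + beta * log2(minutes)) with beta < 0. The base-2 logarithm, the names `alpha`, `beta`, and `time_horizon`, and the coefficient values are all illustrative assumptions. They are not METR's actual parameters or code.

```python
import math


def success_probability(minutes: float, alpha: float, beta: float) -> float:
    """Logistic model: P(success) = sigmoid(alpha + beta * log2(minutes))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * math.log2(minutes))))


def time_horizon(q: float, alpha: float, beta: float) -> float:
    """Task length in minutes at which the fitted success probability equals q.

    Solves alpha + beta * log2(t) = logit(q) for t. beta must be negative,
    i.e. success must fall as tasks get longer.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    if beta >= 0:
        raise ValueError("beta must be negative")
    logit_q = math.log(q / (1.0 - q))
    return 2.0 ** ((logit_q - alpha) / beta)


# Hypothetical coefficients, for illustration only (not a real METR fit).
alpha, beta = 6.0, -1.0
h50 = time_horizon(0.5, alpha, beta)  # logit(0.5) = 0, so this is 2**6 = 64 minutes
h80 = time_horizon(0.8, alpha, beta)
print(f"50% horizon: {h50:.1f} min, 80% horizon: {h80:.1f} min")
```

Changing the log base only rescales beta, so the horizons themselves do not depend on that choice as long as the fit and the inversion use the same base.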

agent-horizon does not re-run any benchmark. It only reads METR's published YAML feeds and renders the results.

Data sources

Source	URL	Refresh interval
METR benchmark results	benchmark_results_1_1.yaml	every 6 hours
METR task results	task_results_1_1.yaml	every 24 hours

How the decay calculator works

A multi-step agent task is treated as N independent steps, each succeeding with probability p, so end-to-end success is p^N. This is a deliberate simplification, because real agents can recover from some failures through retries or self-reflection. Even so, it is the same back-of-the-envelope math practitioners use to set workflow length budgets.
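The compounding rule and its inverse, the longest workflow that still meets a target success rate, can be sketched in a few lines. The function names are hypothetical and are not the calculator's actual code.

```python
import math


def end_to_end_success(p: float, n_steps: int) -> float:
    """Probability that all n_steps independent steps succeed: p ** n_steps."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p ** n_steps


def max_steps_for_target(p: float, target: float) -> int:
    """Largest N such that p ** N >= target, i.e. a workflow length budget."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    # p ** N >= target  <=>  N <= ln(target) / ln(p), because ln(p) < 0.
    n = math.floor(math.log(target) / math.log(p))
    # Nudge n to absorb floating-point error in the logarithms.
    while p ** (n + 1) >= target:
        n += 1
    while n > 0 and p ** n < target:
        n -= 1
    return n


print(f"{end_to_end_success(0.95, 10):.3f}")  # 0.599
print(max_steps_for_target(0.95, 0.5))        # 13
```

At 95% per-step accuracy, a 10-step workflow succeeds end to end only about 60% of the time. Keeping end-to-end success at or above 50% caps the workflow at 13 steps.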

Unless you override it, we use a model's METR average_score as its per-step accuracy p. METR's average_score is the mean success rate across the entire task suite, weighted by task duration. Strictly speaking, it measures whole-task success rather than success on individual steps. Even so, it is the most convenient publicly available proxy for per-step reliability.

Service health

Live status from /agent-horizon/health.


Disclaimer

Numbers shown are METR's measurements, re-presented for convenience. agent-horizon is not affiliated with METR. If anything looks off, the source of truth is always metr.org/time-horizons.