Time horizon vs. release date
Loading METR feed…
Reliability decay calculator
P(N steps succeed) = (per-step accuracy)^N, assuming each step succeeds independently. Choose a model; its METR average score is used as the per-step accuracy unless you override it.
End-to-end P(success)
—
Steps to 90%
—
Steps to 50%
—
Steps to 10%
—
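The calculator's arithmetic can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function names `p_success` and `steps_to` are hypothetical, the 0.99 accuracy is a made-up input rather than a METR score, and "steps to X%" is assumed to mean the largest whole number of steps for which the end-to-end probability is still at or above X%.

```python
import math


def p_success(per_step_accuracy: float, n_steps: int) -> float:
    """End-to-end probability that all n_steps succeed (independent steps)."""
    return per_step_accuracy ** n_steps


def steps_to(threshold: float, per_step_accuracy: float) -> int:
    """Largest N with per_step_accuracy**N >= threshold (assumed definition)."""
    if not 0.0 < per_step_accuracy < 1.0:
        raise ValueError("per_step_accuracy must be strictly between 0 and 1")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    # Solve a**N = t for N, i.e. N = ln(t) / ln(a), then round down.
    n = math.floor(math.log(threshold) / math.log(per_step_accuracy))
    # Guard against floating-point rounding just below an exact integer.
    while p_success(per_step_accuracy, n + 1) >= threshold:
        n += 1
    return n


if __name__ == "__main__":
    accuracy = 0.99  # illustrative value only
    for target in (0.9, 0.5, 0.1):
        print(f"Steps to {target:.0%}: {steps_to(target, accuracy)}")
```

Because the decay is exponential, small differences in per-step accuracy produce large differences in how many steps a model can chain before the end-to-end probability falls below a given threshold.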
Movers this week
Models whose p50 horizon has changed by ≥5% over the past 7 days, plus new entries and new SOTA badges.
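One way such a filter could work is sketched below. Everything here is an assumption for illustration: the `movers` function, the dictionary shape (model name mapped to p50 horizon), and the model names are hypothetical, the change is assumed to be relative to the value 7 days earlier, and SOTA badges are left out.

```python
def movers(current, week_ago, min_rel_change=0.05):
    """Split models into (changed, new) lists.

    current, week_ago: dicts mapping model name -> p50 horizon (same unit).
    A model counts as changed if its p50 moved by at least min_rel_change
    relative to its value 7 days ago; it is new if it had no earlier value.
    """
    changed, new = [], []
    for model, p50 in current.items():
        old = week_ago.get(model)
        if old is None:
            new.append(model)
        elif old > 0 and abs(p50 - old) / old >= min_rel_change:
            changed.append(model)
    return changed, new


if __name__ == "__main__":
    # Hypothetical snapshots, not real METR data.
    today = {"model-a": 110.0, "model-b": 100.0, "model-c": 50.0}
    last_week = {"model-a": 100.0, "model-b": 99.0}
    print(movers(today, last_week))
```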
Leaderboard
…All 16+ models in METR's Time Horizon 1.1 results, sorted by 50%-time horizon. Click a row for full history.
| Model | Vendor | Released | Avg score | p50 | p80 |
|---|---|---|---|---|---|
| Loading METR snapshot… | |||||