agent-horizon

Your agent's per-step accuracy looks great until you multiply it. We do the multiplication for you.

Time horizon vs. release date


Reliability decay calculator

P(N steps succeed) = (per-step accuracy)^N, assuming each step succeeds independently of the others. Choose a model; we use its METR average score as the per-step accuracy unless you override it.

End-to-end P(success)
Steps to 90%
Steps to 50%
Steps to 10%
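
The arithmetic behind these four readouts can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual code: the function names p_success and steps_to are hypothetical, steps are assumed independent, and "steps to X%" is read here as the largest whole number of steps for which the end-to-end success probability is still at or above X%.

```python
import math


def p_success(per_step_accuracy: float, n_steps: int) -> float:
    """End-to-end success probability for n_steps independent steps."""
    return per_step_accuracy ** n_steps


def steps_to(threshold: float, per_step_accuracy: float) -> int:
    """Largest whole N such that per_step_accuracy ** N >= threshold.

    Assumption: this is one reading of "steps to X%"; the page may round
    differently.
    """
    if not 0.0 < per_step_accuracy < 1.0:
        raise ValueError("per_step_accuracy must be strictly between 0 and 1")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    # Solve p^N = threshold for N, then round down to a whole step count.
    return math.floor(math.log(threshold) / math.log(per_step_accuracy))


if __name__ == "__main__":
    accuracy = 0.99  # illustrative value, not a METR score
    print(f"P(100 steps) = {p_success(accuracy, 100):.3f}")
    for threshold in (0.9, 0.5, 0.1):
        print(f"Steps to {threshold:.0%}: {steps_to(threshold, accuracy)}")
```

With an illustrative per-step accuracy of 0.99, a 100-step task succeeds end to end only about 37% of the time, and the success probability falls below 90% after 10 steps.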

Movers this week

Models whose p50 horizon changed by ≥5% over the past 7 days, plus new entries and new SOTA badges.
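
One way to compute such a list is to compare two snapshots of p50 horizons taken 7 days apart. The sketch below is an assumption about how this could work, not the page's implementation: find_movers and the dictionary snapshot format (model name mapped to p50 horizon) are hypothetical, the 5% threshold is applied to the relative change in either direction, and SOTA badges are left out.

```python
from typing import Dict, List, Optional, Tuple


def find_movers(
    previous: Dict[str, float],
    current: Dict[str, float],
    min_change: float = 0.05,
) -> List[Tuple[str, str, Optional[float]]]:
    """Return (model, kind, relative_change) for new or changed models.

    previous and current map model name -> p50 horizon (hypothetical format).
    kind is "new" for models absent from the previous snapshot, otherwise
    "changed" when the relative change is at least min_change in magnitude.
    """
    movers: List[Tuple[str, str, Optional[float]]] = []
    for model, p50_now in current.items():
        p50_before = previous.get(model)
        if p50_before is None:
            movers.append((model, "new", None))
        elif p50_before > 0:
            change = (p50_now - p50_before) / p50_before
            if abs(change) >= min_change:
                movers.append((model, "changed", change))
    return movers


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    last_week = {"model-a": 100.0, "model-b": 200.0}
    today = {"model-a": 110.0, "model-b": 204.0, "model-c": 50.0}
    for row in find_movers(last_week, today):
        print(row)
```

In the made-up example, model-a moved by 10% and is listed, model-b moved by 2% and is not, and model-c appears as a new entry.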

Leaderboard

All 16+ models in METR's Time Horizon 1.1 results, sorted by 50%-time horizon. Click a row for full history.

Model | Vendor | Released | Avg score | p50 | p80