

Poster Wed, Dec 3, 2025 • 4:30 PM – 7:30 PM PST

Measuring AI Ability to Complete Long Software Tasks

Thomas Kwa ⋅ Ben West ⋅ Joel Becker ⋅ Amy Deng ⋅ Katharyn Garcia ⋅ Max Hasin ⋅ Sami Jawhar ⋅ Megan Kinniment ⋅ Nate Rush ⋅ Sydney Von Arx ⋅ Ryan Bloom ⋅ Thomas Broadley ⋅ Haoxing Du ⋅ Brian Goodrich ⋅ Nikola Jurkovic ⋅ Luke Miles ⋅ Seraphina Nix ⋅ Tao Lin ⋅ Neev Parikh ⋅ David Rein ⋅ Lucas Jun Koba Sato ⋅ Hjalmar Wijk ⋅ Daniel Ziegler ⋅ Elizabeth Barnes ⋅ Lawrence Chan

Abstract
