AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
Abstract
Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (SWE-Bench) and mathematics (FrontierMath). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 120 tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner achieves an average 1.58x speedup against reference solvers, including methods from packages such as SciPy, scikit-learn, and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.