NeurIPS Poster The Road Less Scheduled

Oral Poster

The Road Less Scheduled

Aaron Defazio · Xingyu Yang · Ahmed Khaled · Konstantin Mishchenko · Harsh Mehta · Ashok Cutkosky

West Ballroom A-D #5908

[ Abstract ] [ Project Page ]

[ Paper] [ Poster] [ OpenReview]

Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Oral presentation: Oral Session 1C: Optimization and Learning Theory
Wed 11 Dec 10 a.m. PST — 11 a.m. PST

Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step

T

$T$ are greatly out-performed by learning rate schedules that depend on

T

$T$ . We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available at https://github.com/facebookresearch/schedule_free. Schedule-Free AdamW is the core algorithm behind our winning entry to the MLCommons 2024 AlgoPerf Algorithmic Efficiency Challenge Self-Tuning track.

Chat is not available.