Training Stronger Baselines for Learning to Optimize
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang Wang

Wed Dec 09 08:00 AM -- 08:10 AM (PST) @ Orals & Spotlights: Kernel Methods/Optimization

Learning to optimize (L2O) is gaining increased attention because classical optimizers require laborious, problem-specific design and hyperparameter tuning. However, there are significant performance and practicality gaps between manually designed optimizers and existing L2O models. Specifically, learned optimizers are applicable to only a limited class of problems, often exhibit instability, and generalize poorly. As research efforts focus on increasingly sophisticated L2O models, we argue for an orthogonal, under-explored theme: improved training techniques for L2O models. We first present a progressive, curriculum-based training scheme, which gradually increases the optimizer unroll length to mitigate the well-known L2O dilemma of truncation bias (shorter unrolling) versus gradient explosion (longer unrolling). Second, we present an off-policy imitation-learning-based approach that guides L2O training by learning from the behavior of analytical optimizers. We evaluate our improved training techniques on a variety of state-of-the-art L2O models and immediately boost their performance, without making any change to their model structures. We demonstrate that, using our improved training techniques, one of the earliest and simplest L2O models can be trained to outperform even the latest and most complex L2O models on a number of tasks. Our results demonstrate a greater potential of L2O yet to be unleashed, and prompt a reconsideration of recent L2O model progress. Our code is publicly available at: https://github.com/VITA-Group/L2O-Training-Techniques.
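The progressive curriculum idea in the abstract — start meta-training with short unrolls to keep meta-gradients stable, then lengthen the unroll to reduce truncation bias — can be sketched in a toy setting. The code below is a hypothetical illustration, not the authors' implementation: the "learned optimizer" is reduced to a single scalar step-size meta-parameter, the optimizees are random 1-D quadratics, and the meta-gradient is taken by finite differences with clipping (all simplifications chosen for brevity).

```python
import random

def unroll_loss(step_size, a, x0, unroll_len):
    """Objective value after `unroll_len` steps of the 'learned' update
    rule x <- x - step_size * grad on f(x) = 0.5 * a * x^2."""
    x = x0
    for _ in range(unroll_len):
        grad = a * x
        x = x - step_size * grad
    return 0.5 * a * x * x

def train_with_curriculum(stages=(2, 5, 10), iters_per_stage=200, lr=0.05, seed=0):
    """Curriculum meta-training: short unrolls first, longer unrolls later.

    Short unrolls give biased but stable meta-gradients; later, longer
    unrolls refine the optimizer on the true long-horizon objective.
    """
    rng = random.Random(seed)
    step_size = 0.01  # the meta-parameter being learned
    for unroll_len in stages:
        for _ in range(iters_per_stage):
            a = rng.uniform(0.5, 2.0)    # sample a random optimizee
            x0 = rng.uniform(-1.0, 1.0)
            # finite-difference meta-gradient w.r.t. step_size
            eps = 1e-4
            g = (unroll_loss(step_size + eps, a, x0, unroll_len)
                 - unroll_loss(step_size - eps, a, x0, unroll_len)) / (2 * eps)
            g = max(-1.0, min(1.0, g))   # clip: long unrolls can explode
            step_size -= lr * g
    return step_size
```

In this toy problem the curriculum trains a step size that reaches a much lower final loss than the initial guess; the clipping line hints at why naive long-unroll training is fragile, which is exactly the gradient-explosion side of the dilemma the paper targets.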

Author Information

Tianlong Chen (University of Texas at Austin)
Weiyi Zhang (Shanghai Jiao Tong University)
Zhou Jingyang (University of Science and Technology of China)
Shiyu Chang (MIT-IBM Watson AI Lab)
Sijia Liu (Michigan State University)
Lisa Amini (IBM Research)
Zhangyang Wang (University of Texas at Austin)
