Towards Robust Unroll Generalization in Learned Optimizers
Abstract
Recent works have demonstrated that learned optimizers (LOs) can be competitive with and at times outperform hand-designed counterparts, paving a path towards improved optimizers by scaling up LOs. However, learned optimizers still require substantial meta-learning compute, which limits their scalability and calls for new methods that allow them to generalize to a wider array of problems from a smaller set of meta-learning problems. One aspect of this is the training horizon mismatch between meta-learning and real-world training. We consider the problem of efficiently meta-learning LOs that can generalize to long training time horizons. We propose LoLO, which employs a replay buffer to efficiently extend the unroll length during meta-training without increasing meta-learning cost. Furthermore, it incorporates on-policy imitation learning to ensure faithful trajectories and stabilize meta-training. We evaluate LoLO on a variety of vision and language tasks, demonstrating its success in achieving long unroll generalization in practical scenarios.
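To make the replay-buffer idea described above concrete, the sketch below shows one way such a mechanism could work: snapshots of partially trained inner problems are stored and later resumed, so short truncated unrolls accumulate into long effective horizons across meta-iterations. This is a minimal illustrative sketch, not the paper's implementation; names such as `ReplayBuffer`, `init_inner_state`, and `unroll_inner` are hypothetical placeholders.

```python
# Hypothetical sketch of a snapshot replay buffer for meta-training a learned
# optimizer with short truncations that compose into long effective unrolls.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=512):
        self.buffer = deque(maxlen=capacity)

    def push(self, inner_state, step):
        # Store the inner-problem state (optimizee params, optimizer state)
        # together with how many inner steps it has already been trained for.
        self.buffer.append((inner_state, step))

    def sample(self):
        return random.choice(self.buffer)

def meta_train_step(learned_opt, buffer, init_inner_state, unroll_inner,
                    truncation_len=20, restart_prob=0.1):
    # Occasionally restart from a fresh initialization so early-training
    # states stay represented; otherwise resume a stored snapshot.
    if not buffer.buffer or random.random() < restart_prob:
        inner_state, step = init_inner_state(), 0
    else:
        inner_state, step = buffer.sample()

    # Unroll the learned optimizer for a short truncation, then push the
    # resulting state back; because snapshots are resumed later, the horizon
    # seen over meta-training grows well beyond truncation_len at no extra
    # per-iteration cost.
    inner_state, meta_loss = unroll_inner(learned_opt, inner_state, truncation_len)
    buffer.push(inner_state, step + truncation_len)
    return meta_loss
```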