Poster in Workshop: Mathematics of Modern Machine Learning (M3L)
MoXCo: How I learned to stop exploring and love my local minima?
Esha Singh · Shoham Sabach · Yu-Xiang Wang
Abstract:
Deep Neural Networks (DNNs) are well known for generalizing despite overparameterization, a property commonly attributed to the optimizer's ability to find "good" solutions within high-dimensional loss landscapes. However, widely used adaptive optimizers, such as Adam, can suffer from subpar generalization. This paper presents MoXCo, a methodology for designing adaptive optimizers that not only expedite exploration with faster convergence but also avoid over-exploitation in specific parameter regimes, ultimately converging to good solutions.
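The abstract does not spell out the MoXCo algorithm itself, so the following is only a minimal, hypothetical sketch of the general idea it alludes to: explore quickly with an adaptive optimizer, then stop "over-exploiting" by handing off to plain SGD after an exploration budget. The switch criterion, the hyperparameters (explore_steps, lr_adam, lr_sgd), and the epoch budget are illustrative assumptions, not the authors' method.

# Hypothetical illustration (NOT the MoXCo algorithm): run Adam for a fixed
# exploration budget, then switch to SGD with momentum to curb the adaptive
# optimizer's tendency toward poorly generalizing solutions.
import torch

def train(model, loader, loss_fn, explore_steps=1000, lr_adam=1e-3, lr_sgd=1e-2):
    # Phase 1: adaptive exploration with Adam.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr_adam)
    step = 0
    for epoch in range(10):  # illustrative epoch budget
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            step += 1
            # Phase 2: once the exploration budget is spent, hand off to SGD.
            if step == explore_steps:
                optimizer = torch.optim.SGD(model.parameters(), lr=lr_sgd, momentum=0.9)
    return model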