Atlas – Rethinking Optimizer Design for Stability and Speed
Abstract
Training modern neural networks still relies overwhelmingly on first-order optimization, despite decades of evidence that second-order information can accelerate convergence and improve final generalization. The practical barrier is cost: exact curvature is infeasible for large models, and most quasi-second-order methods consume so much memory or wall-clock time that they fall behind Adam, let alone SGD. We introduce Atlas, a curvature-aware optimizer whose overhead stays small: (i) a Hutch++ low-rank sketch extracts promising curvature directions in O(kd) memory, (ii) a trust-radius clamp prevents runaway steps without tuning, and (iii) a lightweight Safe-Step Control rolls back the rare catastrophic update. On five image-classification benchmarks (MNIST, Fashion-MNIST, SVHN, CIFAR-10, CIFAR-100) with identical micro-CNNs, Atlas achieves the highest test accuracy on all five tasks, beating the strongest baseline by up to 2.54 percentage points on individual tasks and by 2.4 points in macro-averaged accuracy, while reducing rollback events by an order of magnitude. Atlas therefore delivers second-order quality at first-order cost.
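To make the three ingredients concrete, the sketch below walks through one Atlas-like update on a toy quadratic problem: a Hutch++-style range finder probes the Hessian with random vectors to build a k-dimensional curvature basis, the resulting step is clamped to a trust radius, and a safe-step check rejects updates that sharply increase the loss. The quadratic objective, the hvp helper, the 1.5x rollback threshold, and the step sizes are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

# Illustrative sketch only: a toy quadratic objective stands in for a real
# network, and hvp() stands in for a Hessian-vector product (which a deep
# learning framework would compute via double backpropagation).

rng = np.random.default_rng(0)
d, k = 50, 5                      # parameter dimension, sketch rank
A = rng.standard_normal((d, d))
H_true = A @ A.T / d + np.eye(d)  # SPD Hessian of the toy quadratic
theta_star = rng.standard_normal(d)

def loss(theta):
    r = theta - theta_star
    return 0.5 * r @ H_true @ r

def grad(theta):
    return H_true @ (theta - theta_star)

def hvp(theta, v):
    return H_true @ v             # placeholder for a true Hessian-vector product

def curvature_sketch(theta, k):
    """Range-finding step in the spirit of Hutch++: probe the Hessian with
    random vectors and orthonormalise them into k curvature directions."""
    S = rng.standard_normal((d, k))
    Y = np.stack([hvp(theta, S[:, j]) for j in range(k)], axis=1)
    Q, _ = np.linalg.qr(Y)        # d-by-k basis, i.e. O(kd) memory
    return Q

def atlas_like_step(theta, trust_radius=0.5, lr=0.1):
    g = grad(theta)
    Q = curvature_sketch(theta, k)
    # Precondition inside the sketched subspace; fall back to a plain
    # gradient step in the orthogonal complement.
    Hk = Q.T @ np.stack([hvp(theta, Q[:, j]) for j in range(k)], axis=1)
    step = Q @ np.linalg.solve(Hk, Q.T @ g) + (g - Q @ (Q.T @ g)) * lr
    # Trust-radius clamp: rescale any step whose norm exceeds the radius.
    norm = np.linalg.norm(step)
    if norm > trust_radius:
        step *= trust_radius / norm
    # Safe-step control: roll back if the update increases the loss sharply
    # (the 1.5x threshold is an arbitrary choice for this sketch).
    new_theta = theta - step
    if loss(new_theta) > 1.5 * loss(theta):
        return theta              # reject the catastrophic update
    return new_theta

theta = rng.standard_normal(d)
for t in range(100):
    theta = atlas_like_step(theta)
print("final loss:", loss(theta))
```

The key design point the sketch highlights is that only the d-by-k basis and a k-by-k projected Hessian are ever stored, so the per-step memory stays linear in the parameter count, while the clamp and rollback act purely as safeguards on top of the preconditioned step.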