We consider empirical risk minimization for large-scale datasets. We introduce Ada Newton as an adaptive algorithm that uses Newton's method with adaptive sample sizes. The main idea of Ada Newton is to increase the size of the training set by a factor larger than one in such a way that the minimization variable for the current training set lies in the local neighborhood of the optimal argument of the next training set. This allows us to exploit the quadratic convergence property of Newton's method and reach the statistical accuracy of each training set with only one iteration of Newton's method. We show theoretically that we can iteratively increase the sample size while applying single Newton iterations without line search and staying within the statistical accuracy of the regularized empirical risk. In particular, we can double the size of the training set in each iteration when the number of samples is sufficiently large. Numerical experiments on various datasets confirm the possibility of increasing the sample size by a factor of 2 at each iteration, which implies that Ada Newton achieves the statistical accuracy of the full training set with about two passes over the dataset.
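The abstract describes a simple outer loop: grow the training subset geometrically and take a single Newton step on each subset. As a rough, non-authoritative illustration of that idea, here is a minimal sketch for a regularized logistic-regression ERM problem; the loss, the helper names (erm_grad_hess, ada_newton_sketch), the initial sample size m0, and the growth factor are assumptions made for this sketch, not the paper's actual implementation.

```python
# Minimal sketch of the adaptive sample-size Newton idea described above.
# Assumptions (not from the source): regularized logistic loss, labels in {-1, +1},
# a doubling factor of 2, and a plain Newton step without line search.
import numpy as np

def erm_grad_hess(w, X, y, reg):
    """Gradient and Hessian of the regularized logistic loss on (X, y)."""
    p = 1.0 / (1.0 + np.exp(-y * (X @ w)))        # sigma(y * x^T w)
    grad = -(X.T @ (y * (1.0 - p))) / len(y) + reg * w
    s = p * (1.0 - p)                              # per-sample curvature weights
    hess = (X.T * s) @ X / len(y) + reg * np.eye(X.shape[1])
    return grad, hess

def ada_newton_sketch(X, y, reg, m0=128, factor=2):
    """Grow the sample size geometrically; one Newton step per stage."""
    n, d = X.shape
    w = np.zeros(d)
    m = min(m0, n)
    while True:
        Xm, ym = X[:m], y[:m]                      # current training subset
        grad, hess = erm_grad_hess(w, Xm, ym, reg)
        w = w - np.linalg.solve(hess, grad)        # single Newton iteration, no line search
        if m == n:
            break
        m = min(factor * m, n)                     # enlarge the training set
    return w
```

Under these assumptions, running ada_newton_sketch on n samples performs roughly log2(n/m0) Newton steps; whether each warm-started iterate actually stays within the statistical accuracy of the next subset depends on the conditions analyzed in the paper, not on this sketch.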
Author Information
Aryan Mokhtari (University of Pennsylvania)
Hadi Daneshmand (ETH Zurich)
Aurelien Lucchi (ETH Zurich)
Thomas Hofmann (ETH Zurich)
Alejandro Ribeiro (University of Pennsylvania)
More from the Same Authors
- 2020 Poster: Sinkhorn Natural Gradient for Generative Models
  Zebang Shen · Zhenfu Wang · Alejandro Ribeiro · Hamed Hassani
- 2020 Poster: Sinkhorn Barycenter via Functional Gradient Descent
  Zebang Shen · Zhenfu Wang · Alejandro Ribeiro · Hamed Hassani
- 2020 Spotlight: Sinkhorn Natural Gradient for Generative Models
  Zebang Shen · Zhenfu Wang · Alejandro Ribeiro · Hamed Hassani
- 2020 Poster: Graphon Neural Networks and the Transferability of Graph Neural Networks
  Luana Ruiz · Luiz Chamon · Alejandro Ribeiro
- 2020 Poster: Batch normalization provably avoids ranks collapse for randomly initialised deep networks
  Hadi Daneshmand · Jonas Kohler · Francis Bach · Thomas Hofmann · Aurelien Lucchi
- 2020 Poster: Adversarial Training is a Form of Data-dependent Operator Norm Regularization
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2020 Spotlight: Adversarial Training is a Form of Data-dependent Operator Norm Regularization
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2020 Poster: Convolutional Generation of Textured 3D Meshes
  Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi
- 2020 Poster: Probably Approximately Correct Constrained Learning
  Luiz Chamon · Alejandro Ribeiro
- 2020 Oral: Convolutional Generation of Textured 3D Meshes
  Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi
- 2019 Poster: Shadowing Properties of Optimization Algorithms
  Antonio Orvieto · Aurelien Lucchi
- 2019 Poster: Continuous-time Models for Stochastic Optimization Algorithms
  Antonio Orvieto · Aurelien Lucchi
- 2019 Poster: A Domain Agnostic Measure for Monitoring and Evaluating GANs
  Paulina Grnarova · Kfir Y. Levy · Aurelien Lucchi · Nathanael Perraudin · Ian Goodfellow · Thomas Hofmann · Andreas Krause
- 2019 Poster: Constrained Reinforcement Learning Has Zero Duality Gap
  Santiago Paternain · Luiz Chamon · Miguel Calvo-Fullana · Alejandro Ribeiro
- 2019 Poster: Stability of Graph Scattering Transforms
  Fernando Gama · Alejandro Ribeiro · Joan Bruna
- 2018 Poster: Hyperbolic Neural Networks
  Octavian Ganea · Gary Becigneul · Thomas Hofmann
- 2018 Spotlight: Hyperbolic Neural Networks
  Octavian Ganea · Gary Becigneul · Thomas Hofmann
- 2018 Poster: Deep State Space Models for Unconditional Word Generation
  Florian Schmidt · Thomas Hofmann
- 2017 Poster: Approximate Supermodularity Bounds for Experimental Design
  Luiz Chamon · Alejandro Ribeiro
- 2017 Poster: Stabilizing Training of Generative Adversarial Networks through Regularization
  Kevin Roth · Aurelien Lucchi · Sebastian Nowozin · Thomas Hofmann
- 2017 Poster: First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization
  Aryan Mokhtari · Alejandro Ribeiro
- 2015 Poster: Variance Reduced Stochastic Gradient Descent with Neighbors
  Thomas Hofmann · Aurelien Lucchi · Simon Lacoste-Julien · Brian McWilliams
- 2014 Poster: Communication-Efficient Distributed Dual Coordinate Ascent
  Martin Jaggi · Virginia Smith · Martin Takac · Jonathan Terhorst · Sanjay Krishnan · Thomas Hofmann · Michael Jordan