Poster
Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization
Michal Derezinski · Burak Bartan · Mert Pilanci · Michael W Mahoney
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
However, the local estimates on each machine are typically biased, relative to the full solution on all of the data, and this can limit the effectiveness of averaging.
Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods. Our technique has two novel components: (1) modifying standard sketching techniques to obtain what we call a surrogate sketch; and (2) carefully scaling the global regularization parameter for local computations. Our surrogate sketches are based on determinantal point processes, a family of distributions for which the bias of an estimate of the inverse Hessian can be computed exactly. Based on this computation, we show that when the objective being minimized is $\ell_2$-regularized with parameter $\lambda$ and individual machines are each given a sketch of size $m$, then to eliminate the bias, local estimates should be computed using a shrunk regularization parameter $\lambda^\prime=\lambda\left(1-\frac{d_\lambda}{m}\right)$, where $d_\lambda$ is the $\lambda$-effective dimension of the Hessian (or, for quadratic problems, the data matrix).
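The shrunk regularizer has a simple closed form once the effective dimension is known. Below is a minimal NumPy sketch (the matrix, sizes, and values are illustrative, not from the paper) that computes $d_\lambda = \sum_i \sigma_i^2/(\sigma_i^2+\lambda)$ for a data matrix $A$ with singular values $\sigma_i$, and then the scaled local parameter $\lambda' = \lambda(1 - d_\lambda/m)$:

```python
import numpy as np

def effective_dimension(A, lam):
    """Lambda-effective dimension d_lam = sum_i sigma_i^2 / (sigma_i^2 + lam),
    where sigma_i are the singular values of the data matrix A."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return np.sum(sigma**2 / (sigma**2 + lam))

def shrunk_regularizer(A, lam, m):
    """Scaled local regularization lam' = lam * (1 - d_lam / m) from the
    abstract; m is the sketch size given to each machine (assumes m > d_lam)."""
    d_lam = effective_dimension(A, lam)
    return lam * (1.0 - d_lam / m)

# Toy example: 100 x 20 Gaussian data matrix (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
lam = 1.0
d = effective_dimension(A, lam)            # 0 < d_lam < min(n, d) = 20
lam_prime = shrunk_regularizer(A, lam, m=50)
print(d, lam_prime)                        # lam' < lam, since d_lam / m > 0
```

Since $0 < d_\lambda < m$ whenever the sketch is larger than the effective dimension, the local machines always solve a slightly *less* regularized problem than the global one, which is what compensates for the bias of the inverse-Hessian estimate.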
Author Information
Michal Derezinski (UC Berkeley)
Burak Bartan (Stanford University)
Mert Pilanci (Stanford)
Michael W Mahoney (UC Berkeley)
More from the Same Authors
- 2020 Poster: Boundary thickness and robustness in learning models » Yaoqing Yang · Rajiv Khanna · Yaodong Yu · Amir Gholami · Kurt Keutzer · Joseph Gonzalez · Kannan Ramchandran · Michael W Mahoney
- 2020 Poster: Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization » Jonathan Lacotte · Mert Pilanci
- 2020 Oral: Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization » Jonathan Lacotte · Mert Pilanci
- 2020 Poster: Sampling from a k-DPP without looking at all items » Daniele Calandriello · Michal Derezinski · Michal Valko
- 2020 Spotlight: Sampling from a k-DPP without looking at all items » Daniele Calandriello · Michal Derezinski · Michal Valko
- 2020 Poster: Exact expressions for double descent and implicit regularization via surrogate random design » Michal Derezinski · Feynman Liang · Michael W Mahoney
- 2020 Poster: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method » Michal Derezinski · Rajiv Khanna · Michael W Mahoney
- 2020 Poster: Precise expressions for random projections: Low-rank approximation and randomized Newton » Michal Derezinski · Feynman Liang · Zhenyu Liao · Michael W Mahoney
- 2020 Poster: Optimal Iterative Sketching Methods with the Subsampled Randomized Hadamard Transform » Jonathan Lacotte · Sifan Liu · Edgar Dobriban · Mert Pilanci
- 2020 Oral: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method » Michal Derezinski · Rajiv Khanna · Michael W Mahoney
- 2020 Poster: A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent » Zhenyu Liao · Romain Couillet · Michael W Mahoney
- 2020 Poster: A Statistical Framework for Low-bitwidth Training of Deep Neural Networks » Jianfei Chen · Yu Gai · Zhewei Yao · Michael W Mahoney · Joseph Gonzalez
- 2019 Workshop: Beyond first order methods in machine learning systems » Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael W Mahoney
- 2019 Poster: ANODEV2: A Coupled Neural ODE Framework » Tianjun Zhang · Zhewei Yao · Amir Gholami · Joseph Gonzalez · Kurt Keutzer · Michael W Mahoney · George Biros
- 2019 Poster: Distributed estimation of the inverse Hessian by determinantal averaging » Michal Derezinski · Michael W Mahoney
- 2019 Poster: Exact sampling of determinantal point processes with sublinear time preprocessing » Michal Derezinski · Daniele Calandriello · Michal Valko
- 2019 Poster: High-Dimensional Optimization in Adaptive Random Subspaces » Jonathan Lacotte · Mert Pilanci · Marco Pavone
- 2018 Poster: GIANT: Globally Improved Approximate Newton Method for Distributed Optimization » Shusen Wang · Fred Roosta · Peng Xu · Michael W Mahoney
- 2018 Poster: Leveraged volume sampling for linear regression » Michal Derezinski · Manfred K. Warmuth · Daniel Hsu
- 2018 Spotlight: Leveraged volume sampling for linear regression » Michal Derezinski · Manfred K. Warmuth · Daniel Hsu
- 2018 Poster: Hessian-based Analysis of Large Batch Training and Robustness to Adversaries » Zhewei Yao · Amir Gholami · Qi Lei · Kurt Keutzer · Michael W Mahoney
- 2017 Poster: Unbiased estimates for linear regression via volume sampling » Michal Derezinski · Manfred K. Warmuth
- 2017 Spotlight: Unbiased estimates for linear regression via volume sampling » Michal Derezinski · Manfred K. Warmuth
- 2016 Poster: Feature-distributed sparse regression: a screen-and-clean approach » Jiyan Yang · Michael W Mahoney · Michael Saunders · Yuekai Sun
- 2016 Poster: Sub-sampled Newton Methods with Non-uniform Sampling » Peng Xu · Jiyan Yang · Farbod Roosta-Khorasani · Christopher Ré · Michael W Mahoney
- 2015 Poster: Fast Randomized Kernel Ridge Regression with Statistical Guarantees » Ahmed Alaoui · Michael W Mahoney
- 2014 Poster: The limits of squared Euclidean distance regularization » Michal Derezinski · Manfred K. Warmuth
- 2014 Spotlight: The limits of squared Euclidean distance regularization » Michal Derezinski · Manfred K. Warmuth
- 2013 Workshop: Large Scale Matrix Analysis and Inference » Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht