Timezone: »
Poster
Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
Tomoya Murata · Taiji Suzuki
@
In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches to reduce communication time. However, existing work has mainly focused on studying first-order optimality guarantees. On the other side, second-order optimality guaranteed algorithms, i.e., algorithms escaping saddle points, have been extensively studied in the non-distributed optimization literature. In this paper, we study a new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD), that combines the existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization. BVR-L-PSGD enjoys second-order optimality with nearly the same communication complexity as the best known one of BVR-L-SGD to find first-order optimality. Particularly, the communication complexity is better than non-local methods when the local datasets heterogeneity is smaller than the smoothness of the local loss. In an extreme case, the communication complexity approaches to $\widetilde \Theta(1)$ when the local datasets heterogeneity goes to zero. Numerical results validate our theoretical findings.
Author Information
Tomoya Murata (NTT DATA Mathematical Systems Inc.)
Taiji Suzuki (The University of Tokyo/RIKEN-AIP)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning »
Thu. Dec 8th 01:00 -- 03:00 AM Room
More from the Same Authors
-
2021 Spotlight: Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space »
Taiji Suzuki · Atsushi Nitanda -
2022 : Reducing Communication in Nonconvex Federated Learning with a Novel Single-Loop Variance Reduction Method »
Kazusato Oko · Shunta Akiyama · Tomoya Murata · Taiji Suzuki -
2022 Spotlight: Lightning Talks 4A-2 »
Barakeel Fanseu Kamhoua · Hualin Zhang · Taiki Miyagawa · Tomoya Murata · Xin Lyu · Yan Dai · Elena Grigorescu · Zhipeng Tu · Lijun Zhang · Taiji Suzuki · Wei Jiang · Haipeng Luo · Lin Zhang · Xi Wang · Young-San Lin · Huan Xiong · Liyu Chen · Bin Gu · Jinfeng Yi · Yongqiang Chen · Sandeep Silwal · Yiguang Hong · Maoyuan Song · Lei Wang · Tianbao Yang · Han Yang · MA Kaili · Samson Zhou · Deming Yuan · Bo Han · Guodong Shi · Bo Li · James Cheng -
2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang -
2022 Poster: Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime »
Naoki Nishikawa · Taiji Suzuki · Atsushi Nitanda · Denny Wu -
2022 Poster: Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization »
Yuri Kinoshita · Taiji Suzuki -
2021 Poster: Differentiable Multiple Shooting Layers »
Stefano Massaroli · Michael Poli · Sho Sonoda · Taiji Suzuki · Jinkyoo Park · Atsushi Yamashita · Hajime Asama -
2021 Poster: Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis »
Atsushi Nitanda · Denny Wu · Taiji Suzuki -
2021 Poster: Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space »
Taiji Suzuki · Atsushi Nitanda -
2017 Poster: Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization »
Tomoya Murata · Taiji Suzuki -
2017 Poster: Trimmed Density Ratio Estimation »
Song Liu · Akiko Takeda · Taiji Suzuki · Kenji Fukumizu