Poster
Exponential Graph is Provably Efficient for Decentralized Deep Training
Bicheng Ying · Kun Yuan · Yiming Chen · Hanbin Hu · Pan Pan · Wotao Yin
Decentralized SGD is an emerging training method for deep learning known for its lighter (and thus faster) per-iteration communication, which relaxes the exact averaging step in parallel SGD to inexact averaging. The less exact the averaging is, however, the more iterations training needs. The key to making decentralized SGD efficient is therefore to realize nearly exact averaging with little communication, which requires a careful choice of communication topology, an under-studied topic in decentralized optimization. In this paper, we study so-called exponential graphs, in which every node is connected to $O(\log(n))$ neighbors, where $n$ is the total number of nodes. We prove that such graphs enable both fast communication and effective averaging simultaneously. We also show that a sequence of $\log(n)$ one-peer exponential graphs, in which each node communicates with a single neighbor per iteration, can together achieve exact averaging. This favorable property enables the one-peer exponential graph to average as effectively as its static counterpart while communicating more efficiently. We apply these exponential graphs to decentralized (momentum) SGD and obtain the state-of-the-art balance between per-iteration communication and iteration complexity among all commonly used topologies. Experimental results on a variety of tasks and models demonstrate that decentralized (momentum) SGD over exponential graphs delivers both fast and high-quality training. Our code is implemented with BlueFog and is available at https://github.com/Bluefog-Lib/NeurIPS2021-Exponential-Graph.
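To make the two topologies concrete, below is a minimal NumPy sketch (not taken from the paper's code) that builds the mixing matrices of the static and one-peer exponential graphs when $n$ is a power of 2, assuming a standard uniform weighting: in the static graph each node averages itself and its $\log_2(n)$ neighbors equally, and in the one-peer graph each node splits weight 1/2–1/2 with its single peer. Multiplying the $\log_2(n)$ consecutive one-peer matrices reproduces the exact-averaging property claimed in the abstract.

```python
# Minimal sketch; assumes n is a power of 2 and the uniform weights described above.
import numpy as np

def static_exponential_graph(n):
    """Mixing matrix: node i averages itself and neighbors (i + 2^k) mod n, k = 0..log2(n)-1."""
    tau = int(np.log2(n))
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / (tau + 1)
        for k in range(tau):
            W[i, (i + 2**k) % n] = 1.0 / (tau + 1)
    return W

def one_peer_exponential_graph(n, t):
    """Mixing matrix at iteration t: node i averages only with peer (i + 2^(t mod log2(n))) mod n."""
    tau = int(np.log2(n))
    W = np.zeros((n, n))
    k = t % tau
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i + 2**k) % n] = 0.5
    return W

if __name__ == "__main__":
    n = 8
    # The product of log2(n) consecutive one-peer matrices equals exact averaging
    # (every entry of the product is 1/n).
    P = np.eye(n)
    for t in range(int(np.log2(n))):
        P = one_peer_exponential_graph(n, t) @ P
    print(np.allclose(P, np.full((n, n), 1.0 / n)))  # True
```

This sketch only illustrates the mixing matrices and the exact-averaging property; in practice the same topologies would be realized through BlueFog's decentralized communication primitives rather than explicit matrix products.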
Author Information
Bicheng Ying (Google Inc.)
Kun Yuan (Alibaba Group)
Yiming Chen (Alibaba Group)
Hanbin Hu (University of California, Santa Barbara)
Pan Pan (Alibaba Group)
Wotao Yin (Alibaba US, DAMO Academy)
More from the Same Authors
- 2021 Spotlight: Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems »
  Tianyi Chen · Yuejiao Sun · Wotao Yin
- 2021: Practice-Consistent Analysis of Adam-Style Methods »
  Zhishuai Guo · Yi Xu · Wotao Yin · Rong Jin · Tianbao Yang
- 2022 Poster: Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization »
  Kun Yuan · Xinmeng Huang · Yiming Chen · Xiaohan Zhang · Yingya Zhang · Pan Pan
- 2022 Poster: Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression »
  Xinmeng Huang · Yiming Chen · Wotao Yin · Kun Yuan
- 2021 Poster: Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems »
  Tianyi Chen · Yuejiao Sun · Wotao Yin
- 2021 Poster: Hyperparameter Tuning is All You Need for LISTA »
  Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin
- 2021 Poster: Learned Robust PCA: A Scalable Deep Unfolding Approach for High-Dimensional Outlier Detection »
  HanQin Cai · Jialin Liu · Wotao Yin
- 2021 Poster: An Improved Analysis and Rates for Variance Reduction under Without-replacement Sampling Orders »
  Xinmeng Huang · Kun Yuan · Xianghui Mao · Wotao Yin
- 2020 Poster: An Improved Analysis of Stochastic Gradient Descent with Momentum »
  Yanli Liu · Yuan Gao · Wotao Yin
- 2020 Poster: An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods »
  Yanli Liu · Kaiqing Zhang · Tamer Basar · Wotao Yin
- 2020 Poster: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
  Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang
- 2020 Spotlight: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
  Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang