Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing compression methods do not scale well to large distributed systems (due to gradient build-up) and/or lack evaluation on large datasets. To mitigate these issues, we propose a new compression technique, Scalable Sparsified Gradient Compression (ScaleComp), that (i) leverages similarity in the gradient distribution among learners to provide a commutative compressor and keep the communication cost constant with respect to the number of workers, and (ii) includes a low-pass filter in the local gradient accumulations to mitigate the impact of large-batch training and significantly improve scalability. Through theoretical analysis, we show that ScaleComp provides favorable convergence guarantees and is compatible with gradient all-reduce techniques. Furthermore, we experimentally demonstrate that ScaleComp has small overheads, directly reduces gradient traffic, and provides high compression rates (70-150X) and excellent scalability (up to 64-80 learners and 10X larger batch sizes than normal training) across a wide range of applications (image, language, and speech) without significant accuracy loss.
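To make the two ingredients mentioned above concrete, the sketch below shows one plausible way to organize a sparsified gradient compressor that keeps an error-feedback residual and applies a low-pass filter to the local accumulation. This is a minimal illustration, not the paper's exact ScaleComp algorithm: the class name, the filter coefficient `beta`, the `compression_ratio` parameter, and the top-k selection rule are all assumptions made for the example.

```python
import torch

class SparsifiedCompressor:
    """Toy sparsified-gradient compressor with error feedback and a low-pass
    filter on the local accumulation (illustrative only, not ScaleComp itself)."""

    def __init__(self, compression_ratio=0.01, beta=0.9):
        self.ratio = compression_ratio  # fraction of gradient entries to transmit
        self.beta = beta                # low-pass filter coefficient (assumed value)
        self.residual = None            # locally accumulated, filtered error

    def compress(self, grad):
        flat = grad.flatten()
        if self.residual is None:
            self.residual = torch.zeros_like(flat)
        # Low-pass filter the local accumulation instead of plain summation.
        self.residual = self.beta * self.residual + (1.0 - self.beta) * flat
        # Keep only the k largest-magnitude entries (top-k sparsification).
        k = max(1, int(self.ratio * flat.numel()))
        _, idx = torch.topk(self.residual.abs(), k)
        values = self.residual[idx].clone()
        # Error feedback: entries that were sent are cleared from the residual.
        self.residual[idx] = 0.0
        return idx, values, grad.shape

    @staticmethod
    def decompress(idx, values, shape):
        # Scatter the sparse (index, value) pairs back into a dense gradient.
        flat = torch.zeros(torch.Size(shape).numel(), dtype=values.dtype)
        flat[idx] = values
        return flat.view(shape)
```

Because each worker transmits sparse (index, value) pairs that are simply summed at aggregation, the reduction is order-independent; this commutativity is what makes such compressors compatible with all-reduce-style gradient aggregation, as the abstract claims for ScaleComp.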
Author Information
Chia-Yu Chen (IBM research)
My research areas focus on: accelerator architecture, compiler design and library development, machine learning and neural networks, VLSI and nano devices.
Jiamin Ni (IBM)
Songtao Lu (IBM Research)
Xiaodong Cui (IBM T. J. Watson Research Center)
Pin-Yu Chen (IBM Research AI)
Xiao Sun (IBM Thomas J. Watson Research Center)
Naigang Wang (IBM T. J. Watson Research Center)
Swagath Venkataramani (IBM Research)
Vijayalakshmi (Viji) Srinivasan (IBM TJ Watson)
Wei Zhang (IBM T.J.Watson Research Center)
BE, Beijing Univ of Technology, 2005; MSc, Technical University of Denmark, 2008; PhD, University of Wisconsin, Madison, 2013; all in computer science. Published papers in ASPLOS, OOPSLA, OSDI, PLDI, IJCAI, ICDM, NIPS.
Kailash Gopalakrishnan (IBM Research)
More from the Same Authors
- 2020 Poster: A Decentralized Parallel Algorithm for Training Generative Adversarial Nets »
  Mingrui Liu · Wei Zhang · Youssef Mroueh · Xiaodong Cui · Jarret Ross · Tianbao Yang · Payel Das
- 2020 Poster: Ultra-Low Precision 4-bit Training of Deep Neural Networks »
  Xiao Sun · Naigang Wang · Chia-Yu Chen · Jiamin Ni · Ankur Agrawal · Xiaodong Cui · Swagath Venkataramani · Kaoutar El Maghraoui · Vijayalakshmi (Viji) Srinivasan · Kailash Gopalakrishnan
- 2020 Poster: Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems »
  Songtao Lu · Meisam Razaviyayn · Bo Yang · Kejun Huang · Mingyi Hong
- 2020 Spotlight: Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems »
  Songtao Lu · Meisam Razaviyayn · Bo Yang · Kejun Huang · Mingyi Hong
- 2020 Oral: Ultra-Low Precision 4-bit Training of Deep Neural Networks »
  Xiao Sun · Naigang Wang · Chia-Yu Chen · Jiamin Ni · Ankur Agrawal · Xiaodong Cui · Swagath Venkataramani · Kaoutar El Maghraoui · Vijayalakshmi (Viji) Srinivasan · Kailash Gopalakrishnan
- 2020 Poster: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training »
  Yonggan Fu · Haoran You · Yang Zhao · Yue Wang · Chaojian Li · Kailash Gopalakrishnan · Zhangyang Wang · Yingyan Lin
- 2020 Poster: Higher-Order Certification For Randomized Smoothing »
  Jeet Mohapatra · Ching-Yun Ko · Tsui-Wei Weng · Pin-Yu Chen · Sijia Liu · Luca Daniel
- 2020 Poster: Optimizing Mode Connectivity via Neuron Alignment »
  Norman J Tatro · Pin-Yu Chen · Payel Das · Igor Melnyk · Prasanna Sattigeri · Rongjie Lai
- 2020 Spotlight: Higher-Order Certification For Randomized Smoothing »
  Jeet Mohapatra · Ching-Yun Ko · Tsui-Wei Weng · Pin-Yu Chen · Sijia Liu · Luca Daniel
- 2020 Poster: Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis »
  Gang Wang · Songtao Lu · Georgios Giannakis · Gerald Tesauro · Jian Sun
- 2019 Poster: Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks »
  Xiao Sun · Jungwook Choi · Chia-Yu Chen · Naigang Wang · Swagath Venkataramani · Vijayalakshmi (Viji) Srinivasan · Xiaodong Cui · Wei Zhang · Kailash Gopalakrishnan
- 2018 Poster: Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization »
  Sijia Liu · Bhavya Kailkhura · Pin-Yu Chen · Paishun Ting · Shiyu Chang · Lisa Amini
- 2018 Poster: Efficient Neural Network Robustness Certification with General Activation Functions »
  Huan Zhang · Tsui-Wei Weng · Pin-Yu Chen · Cho-Jui Hsieh · Luca Daniel
- 2018 Poster: Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives »
  Amit Dhurandhar · Pin-Yu Chen · Ronny Luss · Chun-Chen Tu · Paishun Ting · Karthikeyan Shanmugam · Payel Das
- 2018 Poster: Training Deep Neural Networks with 8-bit Floating Point Numbers »
  Naigang Wang · Jungwook Choi · Daniel Brand · Chia-Yu Chen · Kailash Gopalakrishnan
- 2018 Poster: Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks »
  Xiaodong Cui · Wei Zhang · Zoltán Tüske · Michael Picheny
- 2017 Poster: Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent »
  Xiangru Lian · Ce Zhang · Huan Zhang · Cho-Jui Hsieh · Wei Zhang · Ji Liu
- 2017 Oral: Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent »
  Xiangru Lian · Ce Zhang · Huan Zhang · Cho-Jui Hsieh · Wei Zhang · Ji Liu
- 2017 Poster: Dilated Recurrent Neural Networks »
  Shiyu Chang · Yang Zhang · Wei Han · Mo Yu · Xiaoxiao Guo · Wei Tan · Xiaodong Cui · Michael Witbrock · Mark Hasegawa-Johnson · Thomas Huang