Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to the algorithm's excellent scalability properties. A fundamental barrier when parallelizing SGD is the high bandwidth cost of communicating gradient updates between nodes; consequently, several lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always guarantee convergence, and it is not clear whether they can be improved. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes for gradient updates which provides convergence guarantees. QSGD allows the user to smoothly trade off communication bandwidth and convergence time: nodes can adjust the number of bits sent per iteration, at the cost of possibly higher variance. We show that this trade-off is inherent, in the sense that improving it past some threshold would violate information-theoretic lower bounds. QSGD guarantees convergence for convex and non-convex objectives, under asynchrony, and can be extended to stochastic variance-reduced techniques. When applied to training deep neural networks for image classification and automated speech recognition, QSGD leads to significant reductions in end-to-end training time. For example, on 16 GPUs, we can train the ResNet-152 network to full accuracy on ImageNet 1.8x faster than the full-precision variant.
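The bandwidth/variance trade-off described in the abstract rests on stochastic quantization: each gradient coordinate is randomly rounded to one of s magnitude levels scaled by the vector's Euclidean norm, so the quantized vector is an unbiased estimate of the original while its entries admit a compact, few-bits-per-coordinate encoding. The sketch below illustrates this idea only; the function name and interface are hypothetical and do not reproduce the paper's reference implementation:

```python
import math
import random

def qsgd_quantize(v, s, seed=None):
    """Sketch of stochastic s-level gradient quantization (QSGD-style).

    Each coordinate is randomly rounded to a level l/s (l = 0..s) of its
    magnitude relative to ||v||_2, with probabilities chosen so that the
    quantized vector equals v in expectation. Larger s means more bits
    per coordinate but lower quantization variance.
    """
    rng = random.Random(seed)
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        ratio = abs(x) / norm * s          # position in [0, s]
        lower = math.floor(ratio)          # nearest level below
        # round up with probability equal to the fractional part,
        # which makes the rounding unbiased
        level = lower + (1 if rng.random() < ratio - lower else 0)
        sign = (x > 0) - (x < 0)
        out.append(norm * sign * level / s)
    return out
```

Averaging many independent quantizations of the same vector recovers it, which is the unbiasedness property the convergence guarantees build on; only the scalar norm and the small integer levels (plus signs) need to be transmitted.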
Author Information
Dan Alistarh (IST Austria)
Demjan Grubic (ETH Zurich / Google)
Jerry Li (Berkeley)
Ryota Tomioka (Microsoft Research Cambridge)
Milan Vojnovic (London School of Economics (LSE))
Milan Vojnovic is Professor and Chair in Data Science in the Department of Statistics at the London School of Economics and Political Science (LSE), where he is also director of the MSc in Data Science programme. Prior to this, he worked from 2004 until 2016 as a researcher with Microsoft Research, Cambridge, United Kingdom. He received his Ph.D. degree in Technical Sciences from Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland, in 2003, and his B.Sc. and M.Sc. degrees in Electrical Engineering from the University of Split, Croatia, in 1995 and 1998, respectively. He undertook an internship with the Mathematical Research Center, Bell Laboratories, Murray Hill, New Jersey, in 2001. From 2005 to 2014, he was a visiting professor with the University of Split, Croatia, and from 2014 to 2016, he was an affiliated lecturer with the Statistical Laboratory at the University of Cambridge. His research interests are in data science, machine learning, artificial intelligence, game theory, multi-agent systems, and information networks, with applications in the broad area of information systems and networks. He has made contributions to the theory and design of computation platforms for processing large-scale data, and to the performance evaluation of computer systems and networks, in particular in the areas of incentives and online services, distributed computing, network resource allocation, transport control protocols, and peer-to-peer networks. He has received several prizes for his work: in 2010, the ACM Sigmetrics Rising Star Researcher Award, and, in 2005, the ERCIM Cor Baayen Award. He also received the IEEE IWQoS 2007 Best Student Paper Award (with Shao Liu and Dinan Gunawardena), the IEEE Infocom 2005 Best Paper Award (with Jean-Yves Le Boudec), the ACM Sigmetrics 2005 Best Paper Award (with Laurent Massoulie), and the ITC 2001 Best Student Paper Award (with Jean-Yves Le Boudec).
He has delivered numerous lectures and seminars in both academia and industry. He taught several editions of a computer networking course in the undergraduate computer science program at the University of Split, and two editions of a course on contest theory within Part III of the Mathematical Tripos (the master's program in mathematics) at the University of Cambridge. He is the author of the book “Contest Theory: Incentive Mechanisms and Ranking Methods” (Cambridge University Press, 2016).
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Poster: QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
  Wed. Dec 6th, 02:30 -- 06:30 AM, Pacific Ballroom #21
More from the Same Authors
- 2021: SSSE: Efficiently Erasing Samples from Trained Machine Learning Models
  Alexandra Peste · Dan Alistarh · Christoph Lampert
- 2023 Poster: Timewarp: Transferable Acceleration of Molecular Dynamics by Learning Time-Coarsened Dynamics
  Leon Klein · Andrew Foong · Tor Fjelde · Bruno Mlodozeniec · Marc Brockschmidt · Sebastian Nowozin · Frank Noe · Ryota Tomioka
- 2022 Poster: Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
  Elias Frantar · Dan Alistarh
- 2021 Poster: M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
  Elias Frantar · Eldar Kurtic · Dan Alistarh
- 2021 Poster: An Information-theoretic Approach to Distribution Shifts
  Marco Federici · Ryota Tomioka · Patrick Forré
- 2021 Poster: Distributed Principal Component Analysis with Limited Communication
  Foivos Alimisis · Peter Davies · Bart Vandereycken · Dan Alistarh
- 2021 Poster: Towards Tight Communication Lower Bounds for Distributed Optimisation
  Janne H. Korhonen · Dan Alistarh
- 2021 Poster: Asynchronous Decentralized SGD with Quantized and Local Updates
  Giorgi Nadiradze · Amirmojtaba Sabour · Peter Davies · Shigang Li · Dan Alistarh
- 2021 Poster: Scheduling jobs with stochastic holding costs
  Dabeen Lee · Milan Vojnovic
- 2021 Poster: AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks
  Alexandra Peste · Eugenia Iofinova · Adrian Vladu · Dan Alistarh
- 2020 Poster: On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them
  Chen Liu · Mathieu Salzmann · Tao Lin · Ryota Tomioka · Sabine Süsstrunk
- 2019 Poster: Powerset Convolutional Neural Networks
  Chris Wendler · Markus Püschel · Dan Alistarh
- 2019 Poster: Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders
  Emile Mathieu · Charline Le Lan · Chris Maddison · Ryota Tomioka · Yee Whye Teh
- 2018 Poster: KONG: Kernels for ordered-neighborhood graphs
  Moez Draief · Konstantin Kutzkov · Kevin Scaman · Milan Vojnovic
- 2018 Poster: Byzantine Stochastic Gradient Descent
  Dan Alistarh · Zeyuan Allen-Zhu · Jerry Li
- 2018 Spotlight: KONG: Kernels for ordered-neighborhood graphs
  Moez Draief · Konstantin Kutzkov · Kevin Scaman · Milan Vojnovic
- 2018 Poster: Spectral Signatures in Backdoor Attacks
  Brandon Tran · Jerry Li · Aleksander Madry
- 2017 Poster: Communication-Efficient Distributed Learning of Discrete Distributions
  Ilias Diakonikolas · Elena Grigorescu · Jerry Li · Abhiram Natarajan · Krzysztof Onak · Ludwig Schmidt
- 2017 Oral: Communication-Efficient Distributed Learning of Discrete Distributions
  Ilias Diakonikolas · Elena Grigorescu · Jerry Li · Abhiram Natarajan · Krzysztof Onak · Ludwig Schmidt
- 2016 Poster: f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization
  Sebastian Nowozin · Botond Cseke · Ryota Tomioka
- 2015 Poster: Interpolating Convex and Non-Convex Tensor Decompositions via the Subspace Norm
  Qinqing Zheng · Ryota Tomioka
- 2014 Poster: Multitask learning meets tensor factorization: task imputation via convex optimization
  Kishan Wimalawarne · Masashi Sugiyama · Ryota Tomioka
- 2013 Poster: Convex Tensor Decomposition via Structured Schatten Norm Regularization
  Ryota Tomioka · Taiji Suzuki
- 2012 Poster: Perfect Dimensionality Recovery by Variational Bayesian PCA
  Shinichi Nakajima · Ryota Tomioka · Masashi Sugiyama · S. Derin Babacan
- 2011 Poster: Statistical Performance of Convex Tensor Decomposition
  Ryota Tomioka · Taiji Suzuki · Kohei Hayashi · Hisashi Kashima
- 2010 Spotlight: Global Analytic Solution for Variational Bayesian Matrix Factorization
  Shinichi Nakajima · Masashi Sugiyama · Ryota Tomioka
- 2010 Poster: Global Analytic Solution for Variational Bayesian Matrix Factorization
  Shinichi Nakajima · Masashi Sugiyama · Ryota Tomioka
- 2007 Spotlight: Invariant Common Spatial Patterns: Alleviating Nonstationarities in Brain-Computer Interfacing
  Benjamin Blankertz · Motoaki Kawanabe · Ryota Tomioka · Friederike Hohlefeld · Vadim Nikulin · Klaus-Robert Müller
- 2007 Poster: Invariant Common Spatial Patterns: Alleviating Nonstationarities in Brain-Computer Interfacing
  Benjamin Blankertz · Motoaki Kawanabe · Ryota Tomioka · Friederike Hohlefeld · Vadim Nikulin · Klaus-Robert Müller
- 2006 Poster: Logistic Regression for Single Trial EEG Classification
  Ryota Tomioka · Kazuyuki Aihara · Klaus-Robert Müller
- 2006 Spotlight: Logistic Regression for Single Trial EEG Classification
  Ryota Tomioka · Kazuyuki Aihara · Klaus-Robert Müller