Large Scale Distributed Deep Networks
Jeff Dean · Greg Corrado · Rajat Monga · Kai Chen · Matthieu Devin · Quoc V Le · Mark Mao · Marc'Aurelio Ranzato · Andrew Senior · Paul Tucker · Ke Yang · Andrew Y Ng

Wed Dec 5th 07:00 PM -- 12:00 AM @ Harrah’s Special Events Center 2nd Floor

Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 100x larger than previously reported in the literature, achieving state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
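The abstract describes Downpour SGD only at a high level: many model replicas asynchronously fetch parameters from, and push gradients to, a shared parameter store. The following is a minimal single-process sketch of that idea under stated assumptions, not the DistBelief implementation. The names `ParameterServer`, `Replica`, `downpour_sgd`, `n_fetch` and `n_push` are hypothetical. The model is a toy linear regression rather than a deep network, the server applies plain SGD, and asynchrony is only imitated by shuffling the order in which replicas step on each tick, so replicas compute gradients against stale parameter copies.

```python
import random


class ParameterServer:
    """Hypothetical global parameter store; applies pushes in arrival order.

    Plain SGD on the server is an assumption made for brevity.
    """

    def __init__(self, dim, lr):
        self.w = [0.0] * dim
        self.lr = lr

    def fetch(self):
        # Replicas receive a snapshot, which goes stale as others push.
        return list(self.w)

    def push(self, accrued):
        # No barrier: each push is applied to whatever the current parameters are.
        for i, g in enumerate(accrued):
            self.w[i] -= self.lr * g


class Replica:
    """Hypothetical model replica training on its own data shard."""

    def __init__(self, shard, server, batch_size, n_fetch, n_push, seed):
        self.shard, self.server = shard, server
        self.batch_size, self.n_fetch, self.n_push = batch_size, n_fetch, n_push
        self.rng = random.Random(seed)
        self.w = server.fetch()
        self.accrued = [0.0] * len(self.w)
        self.step_count = 0

    def gradient(self, batch):
        # Mean-squared-error gradient for the toy model y_hat = w[0] + w[1] * x.
        g0 = g1 = 0.0
        for x, y in batch:
            err = self.w[0] + self.w[1] * x - y
            g0 += err
            g1 += err * x
        n = len(batch)
        return [g0 / n, g1 / n]

    def step(self):
        if self.step_count % self.n_fetch == 0:
            self.w = self.server.fetch()  # refresh the possibly stale local copy
        g = self.gradient(self.rng.sample(self.shard, self.batch_size))
        for i, gi in enumerate(g):
            self.accrued[i] += gi
            self.w[i] -= self.server.lr * gi  # keep training locally between fetches
        self.step_count += 1
        if self.step_count % self.n_push == 0:
            self.server.push(self.accrued)  # send gradients accrued since the last push
            self.accrued = [0.0] * len(self.w)


def downpour_sgd(n_replicas=4, ticks=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    # Synthetic data y = 1 + 3x + noise, split into one shard per replica.
    data = []
    for _ in range(100 * n_replicas):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 1.0 + 3.0 * x + rng.gauss(0.0, 0.05)))
    server = ParameterServer(dim=2, lr=lr)
    replicas = [
        Replica(data[r::n_replicas], server, batch_size=10,
                n_fetch=5, n_push=3, seed=seed + r + 1)
        for r in range(n_replicas)
    ]
    for _ in range(ticks):
        # Vary the step order each tick: a crude stand-in for asynchronous arrival.
        order = list(range(n_replicas))
        rng.shuffle(order)
        for r in order:
            replicas[r].step()
    return server.w


if __name__ == "__main__":
    print(downpour_sgd())  # should end up near [1.0, 3.0]
```

Fetching only every `n_fetch` steps and pushing only every `n_push` steps trades parameter freshness for less communication; with a small learning rate the staleness this introduces does not prevent convergence on this toy problem.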

Author Information

Jeff Dean (Google Research)

Jeff joined Google in 1999 and is currently a Google Senior Fellow. He leads Google's Research and Health divisions, where he co-founded the Google Brain team. He has co-designed/implemented multiple generations of Google's distributed machine learning systems for neural network training and inference, as well as multiple generations of Google's crawling, indexing, and query serving systems, and major pieces of Google's initial advertising and AdSense for Content systems. He is also a co-designer and co-implementor of Google's distributed computing infrastructure, including the MapReduce, BigTable and Spanner systems, protocol buffers, LevelDB, systems infrastructure for statistical machine translation, and a variety of internal and external libraries and developer tools. He received a Ph.D. in Computer Science from the University of Washington in 1996, working with Craig Chambers on compiler techniques for object-oriented languages. He is a Fellow of the ACM, a Fellow of the AAAS, a member of the U.S. National Academy of Engineering, and a recipient of the Mark Weiser Award and the ACM Prize in Computing.

Greg Corrado (Google Health)
Rajat Monga (Google)
Kai Chen (Google Research)
Matthieu Devin
Quoc V Le (Stanford)
Mark Mao
Marc'Aurelio Ranzato (Facebook AI Research)
Andrew Senior (DeepMind)
Paul Tucker
Ke Yang (Google Inc.)
Andrew Y Ng (Baidu Research)
