Geoffrey E Hinton · Yoshua Bengio · Yann LeCun

[ Level 2 room 210 AB ]

Deep learning allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state of the art in speech recognition, visual object recognition, object detection, and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large datasets by using the back-propagation algorithm to indicate how a machine should change the internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about dramatic improvements in processing images, video, speech and audio, while recurrent nets have shone on sequential data such as text and speech. Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification. Deep learning methods are representation learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. This tutorial will introduce the fundamentals of deep learning, discuss applications, and close with challenges ahead.
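As a rough illustration of how back-propagation indicates the change to each layer's parameters, the sketch below trains a two-layer scalar network with gradient descent. It is not taken from the tutorial: the network shape, the tanh non-linearity, the squared-error loss, the starting weights, and the names `forward`, `loss_and_grads`, and `train` are all assumptions chosen to keep the example small.

```python
import math

def forward(w1, w2, x):
    """Two-layer scalar network (illustrative): h = tanh(w1 * x), y = w2 * h."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss_and_grads(w1, w2, x, target):
    """Squared-error loss and its gradients, computed layer by layer."""
    h, y = forward(w1, w2, x)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                   # dL/dy at the output
    dw2 = dy * h                      # gradient for the top-layer weight
    dh = dy * w2                      # error signal passed back to the hidden layer
    dw1 = dh * (1.0 - h * h) * x      # tanh'(a) = 1 - tanh(a)^2
    return loss, dw1, dw2

def train(x, target, steps=200, lr=0.1):
    """Plain gradient descent from fixed (arbitrary) starting weights."""
    w1, w2 = 0.5, -0.3
    losses = []
    for _ in range(steps):
        loss, dw1, dw2 = loss_and_grads(w1, w2, x, target)
        losses.append(loss)
        w1 -= lr * dw1
        w2 -= lr * dw2
    return w1, w2, losses

if __name__ == "__main__":
    w1, w2, losses = train(x=1.0, target=0.8)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

The same pattern, an error signal passed from each layer to the one below it, extends to networks with many layers and vector-valued representations.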

Jeff Dean · Oriol Vinyals

[ Level 2 room 210 E,F ]

Over the past few years, we have built large-scale computer systems for training neural networks, and then applied these systems to a wide variety of problems that have traditionally been very difficult for computers. We have made significant improvements in the state of the art in many of these areas, and our software systems and algorithms have been used by dozens of different groups at Google to train state-of-the-art models for speech recognition, image recognition, various visual detection tasks, language modeling, language translation, and many other tasks. In this talk, we'll highlight some of the distributed systems and algorithms that we use in order to train large models quickly, and demonstrate TensorFlow (tensorflow.org), an open-source software system we have put together that makes it easy to conduct research in large-scale machine learning.

Iain Murray

[ Level 2 room 210 AB ]

"Monte Carlo" methods use random sampling to understand a system, estimate averages, or compute integrals. Monte Carlo methods were amongst the earliest applications run on electronic computers in the 1940s, and continue to see widespread use and research as our models and computational power grow. In the NIPS community, random sampling is widely used within optimization methods, and as a way to perform inference in probabilistic models. Here "inference" simply means obtaining multiple plausible settings of model parameters that could have led to the observed data. Obtaining a range of explanations tells us both what we can and cannot know from our data, and prevents us from making overconfident (wrong) predictions.

This introductory-level tutorial will describe some of the fundamental Monte Carlo algorithms and give examples of how they can be combined with models in different ways. We'll see that Monte Carlo methods are sometimes a quick and easy way to perform inference in a new model, look at what can go wrong, and discuss how to debug these randomized algorithms.
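To make "use random sampling to compute integrals" concrete, here is a minimal sketch of simple Monte Carlo integration over the unit interval. The function name `mc_integrate`, the choice of integrand, and the sample count are illustrative assumptions, not material from the tutorial. The returned standard error is one basic debugging aid: the estimate should usually land within a few standard errors of a known answer.

```python
import math
import random

def mc_integrate(f, n_samples, rng):
    """Estimate the integral of f over [0, 1] by averaging f at uniform samples.

    Returns (estimate, standard_error).
    """
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        value = f(rng.random())
        total += value
        total_sq += value * value
    mean = total / n_samples
    # Sample variance of f(U); clamp tiny negative values caused by rounding.
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    std_error = math.sqrt(variance / n_samples)
    return mean, std_error

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    estimate, std_error = mc_integrate(lambda x: x * x, 100_000, rng)
    # The exact integral of x^2 over [0, 1] is 1/3.
    print(estimate, "+/-", std_error)
```

The error shrinks in proportion to one over the square root of the number of samples, so each extra digit of accuracy costs roughly a hundred times more samples.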

Frank Wood

[ Level 2 room 210 E,F ]

Probabilistic programming is a general-purpose means of expressing and automatically performing model-based inference. A key characteristic of many probabilistic programming systems is that models can be compactly expressed in terms of executable generative procedures, rather than in declarative mathematical notation. For this reason, along with automated or programmable inference, probabilistic programming has the potential to increase the number of people who can build and understand their own models. It also could make the development and testing of new general-purpose inference algorithms more efficient, and could accelerate the exploration and development of new models for application-specific use.

The primary goals of this tutorial will be to introduce probabilistic programming both as a general concept and in terms of how current systems work, to examine the historical academic context in which probabilistic programming arose, and to expose some challenges unique to probabilistic programming.
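The idea of writing a model as an executable generative procedure and leaving inference to a generic routine can be sketched in a few lines. Everything below is hypothetical and is not the API of any existing probabilistic programming system: the `Trace` class, its `sample_uniform` and `observe_bernoulli` methods, the `coin_model` procedure, and the use of likelihood weighting as the inference algorithm are assumptions made for illustration.

```python
import math
import random

class Trace:
    """Records the log-weight accumulated by one run of a model (hypothetical API)."""

    def __init__(self, rng):
        self.rng = rng
        self.log_weight = 0.0

    def sample_uniform(self):
        """Draw a latent value from a Uniform(0, 1) prior."""
        return self.rng.random()

    def observe_bernoulli(self, p, outcome):
        """Condition on an observed coin flip that lands heads with probability p."""
        prob = p if outcome else 1.0 - p
        self.log_weight += math.log(prob) if prob > 0.0 else float("-inf")

def coin_model(trace, flips):
    """Generative procedure: unknown coin bias, then the observed flips."""
    bias = trace.sample_uniform()
    for flip in flips:
        trace.observe_bernoulli(bias, flip)
    return bias

def likelihood_weighting(model, data, n_particles, rng):
    """Generic inference: run the model repeatedly, weight each run by its likelihood."""
    values, weights = [], []
    for _ in range(n_particles):
        trace = Trace(rng)
        values.append(model(trace, data))
        weights.append(math.exp(trace.log_weight))
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

if __name__ == "__main__":
    flips = [True] * 8 + [False] * 2
    posterior_mean = likelihood_weighting(coin_model, flips, 20_000, random.Random(0))
    # With a uniform prior the exact posterior is Beta(9, 3), whose mean is 0.75.
    print(posterior_mean)
```

The inference routine never inspects the model's internals, so a different generative procedure could be passed in unchanged. Likelihood weighting is only one of many possible inference strategies and degrades quickly as the amount of data grows.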

Richard Sutton

[ Level 2 room 210 AB ]

Reinforcement learning is a body of theory and techniques for optimal sequential decision making, developed over the last thirty years primarily within the machine learning and operations research communities, that has separately become important in psychology and neuroscience. This tutorial will develop an intuitive understanding of the underlying formal problem (Markov decision processes) and its core solution methods, including dynamic programming, Monte Carlo methods, and temporal-difference learning. It will focus on how these methods have been combined with parametric function approximation, including deep learning, to find good approximate solutions to problems that are otherwise too large to be addressed at all. Finally, it will briefly survey some recent developments in function approximation, eligibility traces, and off-policy learning.
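As one small instance of temporal-difference learning, the sketch below runs tabular TD(0) on a symmetric random walk with a terminal state at each end. The environment, the step size, the number of episodes, and the function name `td0_random_walk` are assumptions chosen for illustration, and the example uses a lookup table rather than the function approximation the tutorial emphasizes.

```python
import random

def td0_random_walk(n_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) value estimation on a random walk.

    States 1..n_states are non-terminal; 0 and n_states + 1 are terminal.
    Each step moves left or right with equal probability. Only entering the
    right terminal state gives reward 1; every other transition gives 0.
    """
    rng = random.Random(seed)
    # Terminal values stay at 0; non-terminal values start at 0.5.
    values = [0.0] + [0.5] * n_states + [0.0]
    for _ in range(episodes):
        state = (n_states + 1) // 2  # start in the middle
        while 0 < state < n_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n_states + 1 else 0.0
            # TD error: bootstrapped one-step target minus the current estimate.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[1:-1]

if __name__ == "__main__":
    estimates = td0_random_walk()
    # With gamma = 1 the true value of state i is i / (n_states + 1).
    for i, v in enumerate(estimates, start=1):
        print(i, round(v, 3), "true:", round(i / 6, 3))
```

Each update uses the current estimate of the next state rather than waiting for the final outcome of the episode, which is what distinguishes temporal-difference learning from plain Monte Carlo evaluation.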

Bill Dally

[ Level 2 room 210 E,F ]

This tutorial will survey the state of the art in high-performance hardware for machine learning with an emphasis on hardware for training and deployment of deep neural networks (DNNs). We establish a baseline by characterizing the performance and efficiency (perf/W) of DNNs implemented on conventional CPUs. GPU implementations of DNNs make substantial improvements over this baseline. GPU implementations perform best with moderate batch sizes. We examine the sensitivity of performance to batch size. Training of DNNs can be accelerated further using both model and data parallelism, at the cost of inter-processor communication. We examine common parallel formulations and the communication traffic they induce. Training and deployment can also be accelerated by using reduced precision for weights and activations. We will examine the tradeoff between accuracy and precision in these networks. We close with a discussion of dedicated hardware for machine learning. We survey recent publications on this topic and make some general observations about the relative importance of arithmetic and memory bandwidth in such dedicated hardware.
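The tradeoff between accuracy and precision can be illustrated with a small sketch of symmetric linear quantization of a weight vector. This is an assumed scheme picked for simplicity, not necessarily one covered in the tutorial, and the names `quantize`, `dequantize`, and `max_abs_error` as well as the sample weights are hypothetical.

```python
def quantize(weights, bits=8):
    """Map floats to signed integer codes in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]

def max_abs_error(weights, bits):
    """Largest absolute round-trip error at the given bit width."""
    codes, scale = quantize(weights, bits)
    restored = dequantize(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))

if __name__ == "__main__":
    weights = [0.9, -0.45, 0.1, 0.003, -0.72]  # made-up example values
    for bits in (8, 4):
        print(bits, "bits -> max error", max_abs_error(weights, bits))
```

Fewer bits mean a coarser grid and therefore a larger worst-case rounding error, bounded by half of the scale. How much of that error a network tolerates is an empirical question that this sketch does not address.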