Workshop

Deep Learning at Supercomputer Scale

Erich Elsen ⋅ Danijar Hafner ⋅ Zak Stone ⋅ Brennan Saeta


Abstract

Five years ago, it took more than a month to train a state-of-the-art image recognition model on the ImageNet dataset. Earlier this year, Facebook demonstrated that such a model could be trained in an hour. However, if we could parallelize this training problem across the world’s fastest supercomputers (~100 PFlops), it would be possible to train the same model in under a minute. This workshop is about closing that gap: how can we turn months into minutes and increase the productivity of machine learning researchers everywhere?

This one-day workshop will facilitate active debate and interaction across many different disciplines. The conversation will range from algorithms to infrastructure to silicon, with invited speakers from Cerebras, DeepMind, Facebook, Google, OpenAI, and other organizations. When should synchronous training be preferred over asynchronous training? Are large batch sizes the key to reaching supercomputer scale, or is it possible to fully utilize a supercomputer at batch size one? How important is sparsity in enabling us to scale? Should sparsity patterns be structured or unstructured? To what extent do we expect to customize model architectures for particular problem domains, and to what extent can a “single model architecture” deliver state-of-the-art results across many different domains? How can new hardware architectures unlock even higher real-world training performance?

Our goal is to bring together people who are trying to answer any of these questions, in the hope that cross-pollination will accelerate progress towards deep learning at true supercomputer scale.


Schedule

Timezone: America/Los_Angeles

8:10 AM

Generalization Gap

Nitish Shirish Keskar

8:30 AM

Closing the Generalization Gap

Itay Hubara

8:50 AM

Don't Decay the Learning Rate, Increase the Batch Size

Sam Smith

9:10 AM

ImageNet In 1 Hour

Priya Goyal

9:30 AM

Training with TPUs

Chris Ying

9:50 AM

Coffee Break

10:10 AM

KFAC and Natural Gradients

Matthew Johnson ⋅ Daniel Duckworth

10:30 AM

Neumann Optimizer

Shankar Krishnan

10:50 AM

Evolutionary Strategies

Tim Salimans

11:15 AM

Future Hardware Directions

Gregory Diamos ⋅ Jeff Dean ⋅ Simon Knowles ⋅ Michael James ⋅ Scott Gray

1:30 PM

Learning Device Placement

Azalia Mirhoseini

1:50 PM

Scaling and Sparsity

Gregory Diamos

2:10 PM

Small World Network Architectures

Scott Gray

2:30 PM

Scalable RL and AlphaGo

Timothy Lillicrap

3:20 PM

Scaling Deep Learning to 15 PetaFlops

Thorsten Kurth

3:40 PM

Scalable Silicon Compute

Simon Knowles

4:00 PM

Practical Scaling Techniques

4:20 PM

Designing for Supercompute-Scale Deep Learning

Michael James

5:00 PM

Adaptive Memory Networks

Daniel Li

5:00 PM

Supercomputers for Deep Learning

Sreenivas Sukumar