
Memory-Efficient Backpropagation Through Time
Audrunas Gruslys · Remi Munos · Ivo Danihelka · Marc Lanctot · Alex Graves

Tue Dec 06 09:00 AM -- 12:30 PM (PST) @ Area 5+6+7+8 #64

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching of intermediate results and recomputation. The algorithm is capable of tightly fitting within almost any user-set memory budget while finding an optimal execution policy that minimizes the computational cost. Computational devices have limited memory capacity, and maximizing computational performance given a fixed memory budget is a practical use-case. We provide asymptotic computational upper bounds for various regimes. The algorithm is particularly effective for long sequences. For sequences of length 1000, our algorithm saves 95% of memory usage while using only one third more time per iteration than the standard BPTT.
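The caching-vs-recomputation trade-off above can be illustrated with a minimal sketch. This is not the paper's dynamic-programming policy: it uses the simpler fixed-interval checkpointing strategy on a toy tanh RNN (the names `rnn_step` and `bptt_checkpointed` are illustrative, not from the paper). Hidden states are cached only every `k` steps, so memory drops from O(T) to O(T/k + k), while each segment's states are recomputed once during the backward pass.

```python
import numpy as np

def rnn_step(h, x, W, U):
    """One tanh RNN step: h' = tanh(W h + U x)."""
    return np.tanh(W @ h + U @ x)

def bptt_checkpointed(xs, h0, W, U, k):
    """BPTT over a sequence, caching hidden states only every k steps
    and recomputing the intermediate states during the backward pass.
    For simplicity the loss is sum(h_T) at the final step.
    Returns dL/dW."""
    T = len(xs)
    # Forward pass: keep only checkpoints h_0, h_k, h_2k, ...
    ckpts = {0: h0}
    h = h0
    for t in range(T):
        h = rnn_step(h, xs[t], W, U)
        if (t + 1) % k == 0:
            ckpts[t + 1] = h
    dh = np.ones_like(h)          # dL/dh_T for L = sum(h_T)
    dW = np.zeros_like(W)
    # Backward pass, one segment at a time, most recent segment first
    for seg_start in reversed(range(0, T, k)):
        seg_end = min(seg_start + k, T)
        # Recompute the hidden states inside this segment from its checkpoint
        hs = [ckpts[seg_start]]
        for t in range(seg_start, seg_end):
            hs.append(rnn_step(hs[-1], xs[t], W, U))
        # Backprop through the segment
        for t in range(seg_end - 1, seg_start - 1, -1):
            h_prev, h_cur = hs[t - seg_start], hs[t - seg_start + 1]
            dpre = dh * (1 - h_cur ** 2)   # derivative of tanh
            dW += np.outer(dpre, h_prev)
            dh = W.T @ dpre                # dL/dh_t, carried to earlier steps
    return dW
```

Setting `k = T` recomputes everything from a single stored state (minimum memory, quadratic compute); `k = 1` caches every state (standard BPTT). The contribution of the paper is to replace this fixed interval with a dynamic-programming search that, for a given memory budget, picks the checkpoint positions minimizing total recomputation.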

Author Information

Audrunas Gruslys (Google DeepMind)
Remi Munos (Google DeepMind)
Ivo Danihelka (DeepMind)
Marc Lanctot (Google DeepMind)
Alex Graves (Google DeepMind)

Alex Graves's main contributions to neural networks include the Connectionist Temporal Classification training algorithm (widely used for speech, handwriting and gesture recognition, e.g. by Google voice search), a type of differentiable attention for RNNs (originally for handwriting generation, now a standard tool in computer vision, machine translation and elsewhere), stochastic gradient variational inference, and Neural Turing Machines. He works at Google DeepMind.
