The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well as enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient-based training on model sparsity. In this work, we introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models. Exploiting the behaviour of gradient descent, our method gives rise to weight updates exhibiting a "rich get richer" dynamic, leaving low-magnitude parameters largely unaffected by learning. Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Powerpropagation is general, intuitive, cheap and straightforward to implement and can readily be combined with various other techniques. To highlight its versatility, we explore it in two very different settings: Firstly, following a recent line of work, we investigate its effect on sparse training for resource-constrained settings. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark. Secondly, we advocate the use of sparsity in overcoming catastrophic forgetting, where compressed representations allow accommodating a large number of tasks at fixed model capacity. In all cases our reparameterisation considerably increases the efficacy of the off-the-shelf methods.
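The "rich get richer" dynamic described in the abstract can be sketched numerically. The snippet below is an illustrative sketch only, not the authors' implementation: it assumes a reparameterisation of the form w = θ·|θ|^(α−1) (with α an assumed hyperparameter), under which the gradient with respect to θ picks up a factor proportional to |θ|^(α−1), so small-magnitude parameters receive vanishingly small updates.

```python
import numpy as np

def powerprop_weight(theta, alpha=2.0):
    """Effective weight under the assumed reparameterisation w = theta * |theta|^(alpha - 1)."""
    return theta * np.abs(theta) ** (alpha - 1)

def powerprop_grad_theta(theta, grad_w, alpha=2.0):
    """Chain rule: dL/dtheta = dL/dw * dw/dtheta = grad_w * alpha * |theta|^(alpha - 1)."""
    return grad_w * alpha * np.abs(theta) ** (alpha - 1)

# Same upstream gradient on a small and a large parameter:
theta = np.array([0.01, 1.0])
grad_theta = powerprop_grad_theta(theta, grad_w=np.ones(2))
# With alpha = 2, the update on the 0.01-magnitude parameter is 100x
# smaller than on the 1.0-magnitude one, so parameters near zero stay
# near zero and can later be pruned with little loss in performance.
```

With α = 1 the parameterisation reduces to standard training, so α acts as a knob interpolating between ordinary gradient descent and increasingly sparsity-inducing dynamics.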
Author Information
Jonathan Richard Schwarz (DeepMind & Gatsby Unit, UCL)
Siddhant Jayakumar (Google DeepMind)
Razvan Pascanu (Google DeepMind)
Peter E Latham (Gatsby Unit, UCL)
Yee Teh (DeepMind)
More from the Same Authors
-
2021 : LiRo: Benchmark and leaderboard for Romanian language tasks »
Stefan Dumitrescu · Petru Rebeja · Beata Lorincz · Mihaela Gaman · Andrei Avram · Mihai Ilie · Andrei Pruteanu · Adriana Stan · Lorena Rosia · Cristina Iacobescu · Luciana Morogan · George Dima · Gabriel Marchidan · Traian Rebedea · Madalina Chitez · Dani Yogatama · Sebastian Ruder · Radu Tudor Ionescu · Razvan Pascanu · Viorica Patraucean -
2021 : Uncertainty Quantification in End-to-End Implicit Neural Representations for Medical Imaging »
Francisca Vasconcelos · Bobby He · Yee Teh -
2022 : Pre-training via Denoising for Molecular Property Prediction »
Sheheryar Zaidi · Michael Schaarschmidt · James Martens · Hyunjik Kim · Yee Whye Teh · Alvaro Sanchez Gonzalez · Peter Battaglia · Razvan Pascanu · Jonathan Godwin -
2022 : When Does Re-initialization Work? »
Sheheryar Zaidi · Tudor Berariu · Hyunjik Kim · Jorg Bornschein · Claudia Clopath · Yee Whye Teh · Razvan Pascanu -
2023 Poster: The Tunnel Effect: Building Data Representations in Deep Neural Networks »
Wojciech Masarczyk · Mateusz Ostaszewski · Ehsan Imani · Razvan Pascanu · Piotr Miłoś · Tomasz Trzcinski -
2023 Poster: Secure Out-of-Distribution Task Generalization with Energy-Based Models »
Shengzhuang Chen · Long-Kai Huang · Jonathan Richard Schwarz · Yilun Du · Ying Wei -
2023 Poster: Deep Reinforcement Learning with Plasticity Injection »
Evgenii Nikishin · Junhyuk Oh · Georg Ostrovski · Clare Lyle · Razvan Pascanu · Will Dabney · Andre Barreto -
2023 Poster: Learning to Modulate pre-trained Models in RL »
Thomas Schmied · Markus Hofmarcher · Fabian Paischer · Razvan Pascanu · Sepp Hochreiter -
2023 Poster: Learning Large-scale Neural Fields via Context Pruned Meta-Learning »
Jihoon Tack · Subin Kim · Sihyun Yu · Jaeho Lee · Jinwoo Shin · Jonathan Richard Schwarz -
2022 Poster: Disentangling Transfer in Continual Reinforcement Learning »
Maciej Wolczyk · Michał Zając · Razvan Pascanu · Łukasz Kuciński · Piotr Miłoś -
2022 Poster: On the Stability and Scalability of Node Perturbation Learning »
Naoki Hiratani · Yash Mehta · Timothy Lillicrap · Peter E Latham -
2021 Workshop: 5th Workshop on Meta-Learning »
Erin Grant · Fábio Ferreira · Frank Hutter · Jonathan Richard Schwarz · Joaquin Vanschoren · Huaxiu Yao -
2021 Poster: On Contrastive Representations of Stochastic Processes »
Emile Mathieu · Adam Foster · Yee Teh -
2021 Poster: Group Equivariant Subsampling »
Jin Xu · Hyunjik Kim · Thomas Rainforth · Yee Teh -
2021 Poster: Continual World: A Robotic Benchmark For Continual Reinforcement Learning »
Maciej Wołczyk · Michał Zając · Razvan Pascanu · Łukasz Kuciński · Piotr Miłoś -
2021 Poster: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations »
Tim G. J. Rudner · Cong Lu · Michael A Osborne · Yarin Gal · Yee Teh -
2021 Poster: Towards Biologically Plausible Convolutional Networks »
Roman Pogodin · Yash Mehta · Timothy Lillicrap · Peter E Latham -
2021 Poster: On the Role of Optimization in Double Descent: A Least Squares Study »
Ilja Kuzborskij · Csaba Szepesvari · Omar Rivasplata · Amal Rannen-Triki · Razvan Pascanu -
2021 Poster: Vector-valued Gaussian Processes on Riemannian Manifolds via Gauge Independent Projected Kernels »
Michael Hutchinson · Alexander Terenin · Slava Borovitskiy · So Takao · Yee Teh · Marc Deisenroth -
2021 Poster: BayesIMP: Uncertainty Quantification for Causal Data Fusion »
Siu Lun Chau · Jean-Francois Ton · Javier González · Yee Teh · Dino Sejdinovic -
2021 Poster: Neural Ensemble Search for Uncertainty Estimation and Dataset Shift »
Sheheryar Zaidi · Arber Zela · Thomas Elsken · Chris C Holmes · Frank Hutter · Yee Teh -
2020 : Introduction for invited speaker, Tim Hospedales »
Jonathan Richard Schwarz -
2020 Workshop: Meta-Learning »
Jane Wang · Joaquin Vanschoren · Erin Grant · Jonathan Richard Schwarz · Francesco Visin · Jeff Clune · Roberto Calandra -
2020 Poster: Top-KAST: Top-K Always Sparse Training »
Siddhant Jayakumar · Razvan Pascanu · Jack Rae · Simon Osindero · Erich Elsen -
2020 Poster: Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks »
Roman Pogodin · Peter E Latham -
2020 Poster: Pointer Graph Networks »
Petar Veličković · Lars Buesing · Matthew Overlan · Razvan Pascanu · Oriol Vinyals · Charles Blundell -
2020 Spotlight: Pointer Graph Networks »
Petar Veličković · Lars Buesing · Matthew Overlan · Razvan Pascanu · Oriol Vinyals · Charles Blundell -
2020 Poster: Understanding the Role of Training Regimes in Continual Learning »
Seyed Iman Mirzadeh · Mehrdad Farajtabar · Razvan Pascanu · Hassan Ghasemzadeh -
2019 Poster: Continual Unsupervised Representation Learning »
Dushyant Rao · Francesco Visin · Andrei A Rusu · Razvan Pascanu · Yee Whye Teh · Raia Hadsell -
2019 Poster: Experience Replay for Continual Learning »
David Rolnick · Arun Ahuja · Jonathan Richard Schwarz · Timothy Lillicrap · Gregory Wayne -
2018 : Lunch & Posters »
Haytham Fayek · German Parisi · Brian Xu · Pramod Kaushik Mudrakarta · Sophie Cerf · Sarah Wassermann · Davit Soselia · Rahaf Aljundi · Mohamed Elhoseiny · Frantzeska Lavda · Kevin J Liang · Arslan Chaudhry · Sanmit Narvekar · Vincenzo Lomonaco · Wesley Chung · Michael Chang · Ying Zhao · Zsolt Kira · Pouya Bashivan · Banafsheh Rafiee · Oleksiy Ostapenko · Andrew Jones · Christos Kaplanis · Sinan Kalkan · Dan Teng · Xu He · Vincent Liu · Somjit Nath · Sungsoo Ahn · Ting Chen · Shenyang Huang · Yash Chandak · Nathan Sprague · Martin Schrimpf · Tony Kendall · Jonathan Richard Schwarz · Michael Li · Yunshu Du · Yen-Chang Hsu · Samira Abnar · Bo Wang -
2018 : Introduction of the workshop »
Razvan Pascanu · Yee Teh · Mark Ring · Marc Pickett -
2018 Workshop: Continual Learning »
Razvan Pascanu · Yee Teh · Marc Pickett · Mark Ring -
2018 Poster: Relational recurrent neural networks »
Adam Santoro · Ryan Faulkner · David Raposo · Jack Rae · Mike Chrzanowski · Theophane Weber · Daan Wierstra · Oriol Vinyals · Razvan Pascanu · Timothy Lillicrap -
2017 Poster: Distral: Robust multitask reinforcement learning »
Yee Teh · Victor Bapst · Wojciech Czarnecki · John Quan · James Kirkpatrick · Raia Hadsell · Nicolas Heess · Razvan Pascanu -
2017 Poster: A simple neural network module for relational reasoning »
Adam Santoro · David Raposo · David Barrett · Mateusz Malinowski · Razvan Pascanu · Peter Battaglia · Timothy Lillicrap -
2017 Poster: Imagination-Augmented Agents for Deep Reinforcement Learning »
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra -
2017 Spotlight: A simple neural network module for relational reasoning »
Adam Santoro · David Raposo · David Barrett · Mateusz Malinowski · Razvan Pascanu · Peter Battaglia · Timothy Lillicrap -
2017 Oral: Imagination-Augmented Agents for Deep Reinforcement Learning »
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra -
2017 Poster: Visual Interaction Networks: Learning a Physics Simulator from Video »
Nicholas Watters · Daniel Zoran · Theophane Weber · Peter Battaglia · Razvan Pascanu · Andrea Tacchetti -
2017 Poster: Filtering Variational Objectives »
Chris Maddison · John Lawson · George Tucker · Nicolas Heess · Mohammad Norouzi · Andriy Mnih · Arnaud Doucet · Yee Teh -
2017 Poster: Sobolev Training for Neural Networks »
Wojciech Czarnecki · Simon Osindero · Max Jaderberg · Grzegorz Swirszcz · Razvan Pascanu -
2016 Workshop: Continual Learning and Deep Networks »
Razvan Pascanu · Mark Ring · Tom Schaul -
2016 Poster: Interaction Networks for Learning about Objects, Relations and Physics »
Peter Battaglia · Razvan Pascanu · Matthew Lai · Danilo Jimenez Rezende · koray kavukcuoglu -
2015 Poster: Natural Neural Networks »
Guillaume Desjardins · Karen Simonyan · Razvan Pascanu · koray kavukcuoglu -
2014 Poster: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization »
Yann N Dauphin · Razvan Pascanu · Caglar Gulcehre · Kyunghyun Cho · Surya Ganguli · Yoshua Bengio -
2014 Poster: On the Number of Linear Regions of Deep Neural Networks »
Guido F Montufar · Razvan Pascanu · Kyunghyun Cho · Yoshua Bengio -
2013 Poster: Demixing odors - fast inference in olfaction »
Agnieszka Grabska-Barwinska · Jeff Beck · Alexandre Pouget · Peter E Latham -
2013 Spotlight: Demixing odors - fast inference in olfaction »
Agnieszka Grabska-Barwinska · Jeff Beck · Alexandre Pouget · Peter E Latham -
2011 Poster: How biased are maximum entropy models? »
Jakob H Macke · Iain Murray · Peter E Latham -
2007 Oral: Neural characterization in partially observed populations of spiking neurons »
Jonathan W Pillow · Peter E Latham -
2007 Poster: Neural characterization in partially observed populations of spiking neurons »
Jonathan W Pillow · Peter E Latham