Timezone: »
Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. The nature of structured models is that communication among the components has a bottleneck, typically achieved by restricted connectivity and attention. In this work, we further tighten the bottleneck via discreteness of the representations transmitted between components. We hypothesize that this constraint serves as a useful form of inductive bias. Our hypothesis is motivated by past empirical work showing the benefits of discretization in non-structured architectures as well as our own theoretical results showing that discretization increases noise robustness and reduces the underlying dimensionality of the model. Building on an existing technique for discretization from the VQ-VAE, we consider multi-headed discretization with shared codebooks as the output of each architectural component. One motivating intuition is human language in which communication occurs through multiple discrete symbols. This form of communication is hypothesized to facilitate transmission of information between functional components of the brain by providing a common interlingua, just as it does for human-to-human communication. Our experiments show that discrete-valued neural communication (DVNC) substantially improves systematic generalization in a variety of architectures—transformers, modular architectures, and graph neural networks. We also show that the DVNC is robust to the choice of hyperparameters, making the method useful in practice.
Author Information
Dianbo Liu (University of Harvard/Boston Children's Hospital/MIT)
Alex Lamb (Universite de Montreal)
Kenji Kawaguchi (MIT)
Anirudh Goyal (Université de Montréal)
Chen Sun (Montreal Institute for Learning Algorithms, University of Montreal, University of Montreal)
Michael Mozer (Google Research / University of Colorado)
Yoshua Bengio (Mila / U. Montreal)
Yoshua Bengio (PhD'1991 in Computer Science, McGill University). After two post-doctoral years, one at MIT with Michael Jordan and one at AT&T Bell Laboratories with Yann LeCun, he became professor at the department of computer science and operations research at Université de Montréal. Author of two books (a third is in preparation) and more than 200 publications, he is among the most cited Canadian computer scientists and is or has been associate editor of the top journals in machine learning and neural networks. Since '2000 he holds a Canada Research Chair in Statistical Learning Algorithms, since '2006 an NSERC Chair, since '2005 his is a Senior Fellow of the Canadian Institute for Advanced Research and since 2014 he co-directs its program focused on deep learning. He is on the board of the NIPS foundation and has been program chair and general chair for NIPS. He has co-organized the Learning Workshop for 14 years and co-created the International Conference on Learning Representations. His interests are centered around a quest for AI through machine learning, and include fundamental questions on deep learning, representation learning, the geometry of generalization in high-dimensional spaces, manifold learning and biologically inspired learning algorithms.
More from the Same Authors
-
2021 Spotlight: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish -
2021 : Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning »
Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer -
2021 : Catastrophic Failures of Neural Active Learning on Heteroskedastic Distributions »
Savya Khosla · Alex Lamb · Jordan Ash · Cyril Zhang · Kenji Kawaguchi -
2021 : Long-Term Credit Assignment via Model-based Temporal Shortcuts »
Michel Ma · Pierluca D'Oro · Yoshua Bengio · Pierre-Luc Bacon -
2021 : A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio -
2021 : Effect of diversity in Meta-Learning »
Ramnath Kumar · Tristan Deleu · Yoshua Bengio -
2021 : Noether Networks: Meta-Learning Useful Conserved Quantities »
Ferran Alet · Dylan Doblar · Allan Zhou · Josh Tenenbaum · Kenji Kawaguchi · Chelsea Finn -
2021 : Learning Neural Causal Models with Active Interventions »
Nino Scherrer · Olexa Bilaniuk · Yashas Annadani · Anirudh Goyal · Patrick Schwab · Bernhard Schölkopf · Michael Mozer · Yoshua Bengio · Stefan Bauer · Nan Rosemary Ke -
2022 Poster: Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning »
Riashat Islam · Hongyu Zang · Anirudh Goyal · Alex Lamb · Kenji Kawaguchi · Xin Li · Romain Laroche · Yoshua Bengio · Remi Tachet des Combes -
2022 : Neural Network Online Training with Sensitivity to Multiscale Temporal Structure »
Matt Jones · Tyler Scott · Gamaleldin Elsayed · Mengye Ren · Katherine Hermann · David Mayo · Michael Mozer -
2022 : Test-time adaptation with slot-centric models »
Mihir Prabhudesai · Sujoy Paul · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Anirudh Goyal · Deepak Pathak · Katerina Fragkiadaki · Gaurav Aggarwal · Thomas Kipf -
2022 : Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information »
Riashat Islam · Manan Tomar · Alex Lamb · Hongyu Zang · Yonathan Efroni · Dipendra Misra · Aniket Didolkar · Xin Li · Harm Van Seijen · Remi Tachet des Combes · John Langford -
2022 : Towards Data-Driven Offline Simulations for Online Reinforcement Learning »
Shengpu Tang · Felipe Vieira Frujeri · Dipendra Misra · Alex Lamb · John Langford · Paul Mineiro · Sebastian Kochman -
2022 : Test-time adaptation with slot-centric models »
Mihir Prabhudesai · Sujoy Paul · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Anirudh Goyal · Deepak Pathak · Katerina Fragkiadaki · Gaurav Aggarwal · Thomas Kipf -
2022 : An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better? »
Tyler Scott · Ting Liu · Michael Mozer · Andrew Gallagher -
2023 Poster: ConSpec: honing in on critical steps for rapid learning and generalization in RL »
Chen Sun · Wannan Yang · Thomas Jiralerspong · Dane Malenfant · Benjamin Alsbury-Nealy · Yoshua Bengio · Blake Richards -
2022 Poster: SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos »
Gamaleldin Elsayed · Aravindh Mahendran · Sjoerd van Steenkiste · Klaus Greff · Michael Mozer · Thomas Kipf -
2022 Poster: Trajectory balance: Improved credit assignment in GFlowNets »
Nikolay Malkin · Moksh Jain · Emmanuel Bengio · Chen Sun · Yoshua Bengio -
2022 Poster: Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning »
Aniket Didolkar · Kshitij Gupta · Anirudh Goyal · Nitesh Bharadwaj Gundavarapu · Alex Lamb · Nan Rosemary Ke · Yoshua Bengio -
2021 Poster: Adversarial Training Helps Transfer Learning via Better Representations »
Zhun Deng · Linjun Zhang · Kailas Vodrahalli · Kenji Kawaguchi · James Zou -
2021 Poster: Dynamic Inference with Neural Interpreters »
Nasim Rahaman · Muhammad Waleed Gondal · Shruti Joshi · Peter Gehler · Yoshua Bengio · Francesco Locatello · Bernhard Schölkopf -
2021 Poster: Gradient Starvation: A Learning Proclivity in Neural Networks »
Mohammad Pezeshki · Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie -
2021 Poster: Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss »
Michael Iuzzolino · Michael Mozer · Samy Bengio -
2021 Poster: Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization »
Clement Gehring · Kenji Kawaguchi · Jiaoyang Huang · Leslie Kaelbling -
2021 Poster: Soft Calibration Objectives for Neural Networks »
Archit Karandikar · Nicholas Cain · Dustin Tran · Balaji Lakshminarayanan · Jonathon Shlens · Michael Mozer · Becca Roelofs -
2021 Poster: A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio -
2021 : Real Robot Challenge II + Q&A »
Stefan Bauer · Joel Akpo · Manuel Wuethrich · Nan Rosemary Ke · Anirudh Goyal · Thomas Steinbrenner · Felix Widmaier · Annika Buchholz · Bernhard Schölkopf · Dieter Büchler · Ludovic Righetti · Franziska Meier -
2021 Poster: Neural Production Systems »
Anirudh Goyal · Aniket Didolkar · Nan Rosemary Ke · Charles Blundell · Philippe Beaudoin · Nicolas Heess · Michael Mozer · Yoshua Bengio -
2021 Poster: Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation »
Emmanuel Bengio · Moksh Jain · Maksym Korablyov · Doina Precup · Yoshua Bengio -
2021 Poster: EIGNN: Efficient Infinite-Depth Graph Neural Networks »
Juncheng Liu · Kenji Kawaguchi · Bryan Hooi · Yiwei Wang · Xiaokui Xiao -
2021 Poster: The Causal-Neural Connection: Expressiveness, Learnability, and Inference »
Kevin Xia · Kai-Zhan Lee · Yoshua Bengio · Elias Bareinboim -
2021 Poster: Noether Networks: meta-learning useful conserved quantities »
Ferran Alet · Dylan Doblar · Allan Zhou · Josh Tenenbaum · Kenji Kawaguchi · Chelsea Finn -
2021 Poster: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish -
2021 Poster: Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time »
Ferran Alet · Maria Bauza · Kenji Kawaguchi · Nurullah Giray Kuru · Tomás Lozano-Pérez · Leslie Kaelbling -
2020 Poster: Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples »
Samarth Sinha · Zhengli Zhao · Anirudh Goyal · Colin A Raffel · Augustus Odena -
2020 Poster: Untangling tradeoffs between recurrence and self-attention in artificial neural networks »
Giancarlo Kerg · Bhargav Kanuparthi · Anirudh Goyal · Kyle Goyette · Yoshua Bengio · Guillaume Lajoie -
2018 : Lunch »
Hong Yu · Bhanu Pratap Singh Rawat · Arijit Ukil · Waheeda Saib · Jekaterina Novikova · John Hughes · Yuhui Zhang · Rahul V · Mi Jung Kim · Babak Taati · Hariharan Ravishankar · Harry Clifford · Hirofumi Kobayashi · Babak Taati · Keyang Xu · Yen-Chi Cheng · Timothy Cannings · Jayashree Kalpathy-Cramer · Jayashree Kalpathy-Cramer · Parinaz Sobhani · Kimis Perros · Wei-Hung Weng · Yordan Raykov · Lars Lorch · Mengqi Jin · Xue Teng · Michael Ferlaino · Marek Rei · Cédric Beaulac · Aman Verma · Sebastian Keller · Edmond Cunningham · Luc Evers · Victor Rodriguez · Vipul Satone · Dianbo Liu · Angeline Yasodhara · Geoff Tison · Ligin Solamen · Bryan He · Rahul Ladhania · Yipeng Shi · Md Nafiz Hamid · Pouria Mashouri · Woochan Hwang · Sejin Park · Xu Chen · Rachneet Kaur · Davis Blalock · Holly Wiberg · Parminder Bhatia · Kezi Yu · RUMENG LI · Jun Sakuma · Charles Ding · Aaron Babier · Yong Cai · A Pratap · Luke O'Connor · Allen Nie · Martin Kang · Ian Covert · Xun Wang · Zelun Luo · Serena Yeung · William Boag · Kazuki Tachikawa · Mary Saltz · Owen Lahav · Edward Lee · Eric Teasley · Michael Kamp · Nirmesh Patel · Vishwali Mhasawade · Maxim Samarin · Ryo Uchimido · Farzad Khalvati · Francisco Cruz · Laura Symul · Zaid Nabulsi · Mads Mihailescu · Rosalind Picard -
2018 Poster: Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding »
Nan Rosemary Ke · Anirudh Goyal · Olexa Bilaniuk · Jonathan Binas · Michael Mozer · Chris Pal · Yoshua Bengio -
2018 Spotlight: Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding »
Nan Rosemary Ke · Anirudh Goyal · Olexa Bilaniuk · Jonathan Binas · Michael Mozer · Chris Pal · Yoshua Bengio -
2017 : From deep learning of disentangled representations to higher-level cognition »
Yoshua Bengio -
2017 Poster: Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net »
Anirudh Goyal · Nan Rosemary Ke · Surya Ganguli · Yoshua Bengio -
2017 Poster: Z-Forcing: Training Stochastic Recurrent Networks »
Anirudh Goyal · Alessandro Sordoni · Marc-Alexandre Côté · Nan Rosemary Ke · Yoshua Bengio -
2016 Poster: Professor Forcing: A New Algorithm for Training Recurrent Networks »
Alex M Lamb · Anirudh Goyal · Ying Zhang · Saizheng Zhang · Aaron Courville · Yoshua Bengio -
2016 Poster: Deep Learning without Poor Local Minima »
Kenji Kawaguchi -
2016 Oral: Deep Learning without Poor Local Minima »
Kenji Kawaguchi -
2015 Poster: Bayesian Optimization with Exponential Convergence »
Kenji Kawaguchi · Leslie Kaelbling · Tomás Lozano-Pérez -
2015 Tutorial: Deep Learning »
Geoffrey E Hinton · Yoshua Bengio · Yann LeCun -
2014 Workshop: Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice »
Urun Dogan · Tatiana Tommasi · Yoshua Bengio · Francesco Orabona · Marius Kloft · Andres Munoz · Gunnar Rätsch · Hal Daumé III · Mehryar Mohri · Xuezhi Wang · Daniel Hernández-lobato · Song Liu · Thomas Unterthiner · Pascal Germain · Vinay P Namboodiri · Michael Goetz · Christopher Berlind · Sigurd Spieckermann · Marta Soare · Yujia Li · Vitaly Kuznetsov · Wenzhao Lian · Daniele Calandriello · Emilie Morvant -
2014 Workshop: Deep Learning and Representation Learning »
Andrew Y Ng · Yoshua Bengio · Adam Coates · Roland Memisevic · Sharanyan Chetlur · Geoffrey E Hinton · Shamim Nemati · Bryan Catanzaro · Surya Ganguli · Herbert Jaeger · Phil Blunsom · Leon Bottou · Volodymyr Mnih · Chen-Yu Lee · Rich M Schwartz -
2014 Workshop: OPT2014: Optimization for Machine Learning »
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck -
2014 Poster: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson -
2014 Poster: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization »
Yann N Dauphin · Razvan Pascanu · Caglar Gulcehre · Kyunghyun Cho · Surya Ganguli · Yoshua Bengio -
2014 Poster: Generative Adversarial Nets »
Ian Goodfellow · Jean Pouget-Abadie · Mehdi Mirza · Bing Xu · David Warde-Farley · Sherjil Ozair · Aaron Courville · Yoshua Bengio -
2014 Poster: On the Number of Linear Regions of Deep Neural Networks »
Guido F Montufar · Razvan Pascanu · Kyunghyun Cho · Yoshua Bengio -
2014 Demonstration: Neural Machine Translation »
Bart van Merriënboer · Kyunghyun Cho · Dzmitry Bahdanau · Yoshua Bengio -
2014 Oral: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson -
2014 Poster: Iterative Neural Autoregressive Distribution Estimator NADE-k »
Tapani Raiko · Yao Li · Kyunghyun Cho · Yoshua Bengio -
2013 Workshop: Deep Learning »
Yoshua Bengio · Hugo Larochelle · Russ Salakhutdinov · Tomas Mikolov · Matthew D Zeiler · David Mcallester · Nando de Freitas · Josh Tenenbaum · Jian Zhou · Volodymyr Mnih -
2013 Workshop: Output Representation Learning »
Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · julien audiffren · Carlo Ciliberto · Dan Goldwasser -
2013 Poster: Multi-Prediction Deep Boltzmann Machines »
Ian Goodfellow · Mehdi Mirza · Aaron Courville · Yoshua Bengio -
2013 Poster: Generalized Denoising Auto-Encoders as Generative Models »
Yoshua Bengio · Li Yao · Guillaume Alain · Pascal Vincent -
2013 Poster: Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs »
Yann Dauphin · Yoshua Bengio -
2012 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · James Bergstra · Quoc V. Le -
2011 Workshop: Big Learning: Algorithms, Systems, and Tools for Learning at Scale »
Joseph E Gonzalez · Sameer Singh · Graham Taylor · James Bergstra · Alice Zheng · Misha Bilenko · Yucheng Low · Yoshua Bengio · Michael Franklin · Carlos Guestrin · Andrew McCallum · Alexander Smola · Michael Jordan · Sugato Basu -
2011 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · Adam Coates · Yann LeCun · Nicolas Le Roux · Andrew Y Ng -
2011 Oral: The Manifold Tangent Classifier »
Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller -
2011 Poster: Shallow vs. Deep Sum-Product Networks »
Olivier Delalleau · Yoshua Bengio -
2011 Poster: The Manifold Tangent Classifier »
Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller -
2011 Poster: Algorithms for Hyper-Parameter Optimization »
James Bergstra · Rémi Bardenet · Yoshua Bengio · Balázs Kégl -
2011 Poster: On Tracking The Partition Function »
Guillaume Desjardins · Aaron Courville · Yoshua Bengio -
2010 Workshop: Deep Learning and Unsupervised Feature Learning »
Honglak Lee · Marc'Aurelio Ranzato · Yoshua Bengio · Geoffrey E Hinton · Yann LeCun · Andrew Y Ng -
2009 Poster: Slow, Decorrelated Features for Pretraining Complex Cell-like Networks »
James Bergstra · Yoshua Bengio -
2009 Poster: An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism »
Aaron Courville · Douglas Eck · Yoshua Bengio -
2009 Session: Debate on Future Publication Models for the NIPS Community »
Yoshua Bengio -
2007 Poster: Augmented Functional Time Series Representation and Forecasting with Gaussian Processes »
Nicolas Chapados · Yoshua Bengio -
2007 Poster: Learning the 2-D Topology of Images »
Nicolas Le Roux · Yoshua Bengio · Pascal Lamblin · Marc Joliveau · Balázs Kégl -
2007 Spotlight: Augmented Functional Time Series Representation and Forecasting with Gaussian Processes »
Nicolas Chapados · Yoshua Bengio -
2007 Poster: Topmoumoute Online Natural Gradient Algorithm »
Nicolas Le Roux · Pierre-Antoine Manzagol · Yoshua Bengio -
2006 Poster: Greedy Layer-Wise Training of Deep Networks »
Yoshua Bengio · Pascal Lamblin · Dan Popovici · Hugo Larochelle -
2006 Talk: Greedy Layer-Wise Training of Deep Networks »
Yoshua Bengio · Pascal Lamblin · Dan Popovici · Hugo Larochelle