Guided by the goal of obtaining an optimization algorithm that is both fast and yields good generalization, we study the descent direction that maximizes the decrease in generalization error, or the probability of not increasing generalization error. The surprising result is that, from both the Bayesian and the frequentist perspectives, this can yield the natural gradient direction. Although that direction can be very expensive to compute, we develop an efficient, general, online approximation to natural gradient descent that is suited to large-scale problems. We report experimental results showing much faster convergence, both in computation time and in number of iterations, with TONGA (Topmoumoute Online natural Gradient Algorithm) than with stochastic gradient descent, even on very large datasets.
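To make the natural gradient direction concrete, here is a minimal dense sketch in Python with NumPy. It assumes the direction is (C + λI)^{-1} ḡ, where ḡ is the mean of the per-example gradients, C is their centered empirical covariance, and λ (`damping`) is a regularizer added here for numerical stability. This is not TONGA. Forming and solving with a full d×d matrix costs O(d³), which is exactly the expense an efficient online approximation is meant to avoid. The function names `natural_gradient_direction` and `natural_gradient_step` are hypothetical, and both the centering choice and the damping value are assumptions.

```python
import numpy as np


def natural_gradient_direction(per_example_grads, damping=1e-3):
    """Dense natural-gradient direction (illustrative sketch, not TONGA).

    per_example_grads: array-like of shape (n, d), one gradient per example.
    Returns (C + damping * I)^{-1} g_bar, where g_bar is the mean gradient and
    C is the centered empirical covariance of the per-example gradients.
    """
    G = np.asarray(per_example_grads, dtype=float)
    n, d = G.shape
    g_bar = G.mean(axis=0)              # average gradient over the examples
    centered = G - g_bar                # deviations from the mean gradient
    C = centered.T @ centered / n       # empirical covariance, shape (d, d)
    # Damping keeps the system solvable when C is singular (e.g. n < d).
    return np.linalg.solve(C + damping * np.eye(d), g_bar)


def natural_gradient_step(theta, per_example_grads, lr=0.1, damping=1e-3):
    """One descent step along the (damped) natural gradient direction."""
    return theta - lr * natural_gradient_direction(per_example_grads, damping)
```

For example, with per-example gradients `[[1, 0], [3, 0], [2, 2], [2, -2]]`, the mean gradient is `[2, 0]` and C = diag(0.5, 2). With `damping=0.5`, the direction is `[2, 0]`. The mean gradient is rescaled by the inverse of the damped gradient variance, so coordinates where the per-example gradients agree receive relatively larger steps.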
Author Information
Nicolas Le Roux (Microsoft Research)
Pierre-Antoine Manzagol (Google)
Yoshua Bengio (Mila / U. Montreal)
Yoshua Bengio (PhD 1991 in Computer Science, McGill University). After two postdoctoral years, one at MIT with Michael Jordan and one at AT&T Bell Laboratories with Yann LeCun, he became a professor in the Department of Computer Science and Operations Research at Université de Montréal. Author of two books (a third is in preparation) and more than 200 publications, he is among the most cited Canadian computer scientists and is or has been an associate editor of the top journals in machine learning and neural networks. He has held a Canada Research Chair in Statistical Learning Algorithms since 2000 and an NSERC Chair since 2006. He has been a Senior Fellow of the Canadian Institute for Advanced Research since 2005, and since 2014 he has co-directed its program focused on deep learning. He is on the board of the NIPS Foundation and has been program chair and general chair for NIPS. He has co-organized the Learning Workshop for 14 years and co-created the International Conference on Learning Representations. His interests center on a quest for AI through machine learning and include fundamental questions on deep learning, representation learning, the geometry of generalization in high-dimensional spaces, manifold learning, and biologically inspired learning algorithms.
More from the Same Authors
2021 Spotlight: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish -
2021 : Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning »
Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer -
2021 : Long-Term Credit Assignment via Model-based Temporal Shortcuts »
Michel Ma · Pierluca D'Oro · Yoshua Bengio · Pierre-Luc Bacon -
2021 : A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio -
2021 : Effect of diversity in Meta-Learning »
Ramnath Kumar · Tristan Deleu · Yoshua Bengio -
2022 : Poly-S: Analyzing and Improving Polytropon for Data-Efficient Multi-Task Learning »
Lucas Page-Caccia · Edoardo Maria Ponti · Liyuan Liu · Matheus Pereira · Nicolas Le Roux · Alessandro Sordoni -
2022 : Target-based Surrogates for Stochastic Optimization »
Jonathan Lavington · Sharan Vaswani · Reza Babanezhad Harikandeh · Mark Schmidt · Nicolas Le Roux -
2021 Poster: Dynamic Inference with Neural Interpreters »
Nasim Rahaman · Muhammad Waleed Gondal · Shruti Joshi · Peter Gehler · Yoshua Bengio · Francesco Locatello · Bernhard Schölkopf -
2021 Poster: Gradient Starvation: A Learning Proclivity in Neural Networks »
Mohammad Pezeshki · Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie -
2021 Poster: A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio -
2021 Poster: Neural Production Systems »
Anirudh Goyal · Aniket Didolkar · Nan Rosemary Ke · Charles Blundell · Philippe Beaudoin · Nicolas Heess · Michael Mozer · Yoshua Bengio -
2021 Poster: Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation »
Emmanuel Bengio · Moksh Jain · Maksym Korablyov · Doina Precup · Yoshua Bengio -
2021 Poster: The Causal-Neural Connection: Expressiveness, Learnability, and Inference »
Kevin Xia · Kai-Zhan Lee · Yoshua Bengio · Elias Bareinboim -
2021 Poster: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish -
2021 Poster: Discrete-Valued Neural Communication »
Dianbo Liu · Alex Lamb · Kenji Kawaguchi · Anirudh Goyal · Chen Sun · Michael Mozer · Yoshua Bengio -
2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning Q&A »
Sham M Kakade · Martha White · Nicolas Le Roux -
2020 Poster: An operator view of policy gradient methods »
Dibya Ghosh · Marlos C. Machado · Nicolas Le Roux -
2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle -
2019 Poster: Reducing the variance in online optimization by transporting past gradients »
Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux -
2019 Spotlight: Reducing the variance in online optimization by transporting past gradients »
Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux -
2018 : Poster Session 1 (note there are numerous missing names here, all papers appear in all poster sessions) »
Akhilesh Gotmare · Kenneth Holstein · Jan Brabec · Michal Uricar · Kaleigh Clary · Cynthia Rudin · Sam Witty · Andrew Ross · Shayne O'Brien · Babak Esmaeili · Jessica Forde · Massimo Caccia · Ali Emami · Scott Jordan · Bronwyn Woods · D. Sculley · Rebekah Overdorf · Nicolas Le Roux · Peter Henderson · Brandon Yang · Tzu-Yu Liu · David Jensen · Niccolo Dalmasso · Weitang Liu · Paul Marc TRICHELAIR · Jun Ki Lee · Akanksha Atrey · Matt Groh · Yotam Hechtlinger · Emma Tosch -
2017 : From deep learning of disentangled representations to higher-level cognition »
Yoshua Bengio -
2015 Tutorial: Deep Learning »
Geoffrey E Hinton · Yoshua Bengio · Yann LeCun -
2014 Workshop: Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice »
Urun Dogan · Tatiana Tommasi · Yoshua Bengio · Francesco Orabona · Marius Kloft · Andres Munoz · Gunnar Rätsch · Hal Daumé III · Mehryar Mohri · Xuezhi Wang · Daniel Hernández-lobato · Song Liu · Thomas Unterthiner · Pascal Germain · Vinay P Namboodiri · Michael Goetz · Christopher Berlind · Sigurd Spieckermann · Marta Soare · Yujia Li · Vitaly Kuznetsov · Wenzhao Lian · Daniele Calandriello · Emilie Morvant -
2014 Workshop: Deep Learning and Representation Learning »
Andrew Y Ng · Yoshua Bengio · Adam Coates · Roland Memisevic · Sharanyan Chetlur · Geoffrey E Hinton · Shamim Nemati · Bryan Catanzaro · Surya Ganguli · Herbert Jaeger · Phil Blunsom · Leon Bottou · Volodymyr Mnih · Chen-Yu Lee · Rich M Schwartz -
2014 Workshop: OPT2014: Optimization for Machine Learning »
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck -
2014 Poster: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson -
2014 Poster: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization »
Yann N Dauphin · Razvan Pascanu · Caglar Gulcehre · Kyunghyun Cho · Surya Ganguli · Yoshua Bengio -
2014 Poster: Generative Adversarial Nets »
Ian Goodfellow · Jean Pouget-Abadie · Mehdi Mirza · Bing Xu · David Warde-Farley · Sherjil Ozair · Aaron Courville · Yoshua Bengio -
2014 Poster: On the Number of Linear Regions of Deep Neural Networks »
Guido F Montufar · Razvan Pascanu · Kyunghyun Cho · Yoshua Bengio -
2014 Demonstration: Neural Machine Translation »
Bart van Merriënboer · Kyunghyun Cho · Dzmitry Bahdanau · Yoshua Bengio -
2014 Oral: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson -
2014 Poster: Iterative Neural Autoregressive Distribution Estimator NADE-k »
Tapani Raiko · Yao Li · Kyunghyun Cho · Yoshua Bengio -
2013 Workshop: Deep Learning »
Yoshua Bengio · Hugo Larochelle · Russ Salakhutdinov · Tomas Mikolov · Matthew D Zeiler · David Mcallester · Nando de Freitas · Josh Tenenbaum · Jian Zhou · Volodymyr Mnih -
2013 Workshop: Output Representation Learning »
Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · julien audiffren · Carlo Ciliberto · Dan Goldwasser -
2013 Poster: Multi-Prediction Deep Boltzmann Machines »
Ian Goodfellow · Mehdi Mirza · Aaron Courville · Yoshua Bengio -
2013 Poster: Generalized Denoising Auto-Encoders as Generative Models »
Yoshua Bengio · Li Yao · Guillaume Alain · Pascal Vincent -
2013 Poster: Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs »
Yann Dauphin · Yoshua Bengio -
2012 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · James Bergstra · Quoc V. Le -
2012 Poster: A latent factor model for highly multi-relational data »
Rodolphe Jenatton · Nicolas Le Roux · Antoine Bordes · Guillaume R Obozinski -
2012 Poster: A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets »
Nicolas Le Roux · Mark Schmidt · Francis Bach -
2012 Oral: A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets »
Nicolas Le Roux · Mark Schmidt · Francis Bach -
2011 Workshop: Big Learning: Algorithms, Systems, and Tools for Learning at Scale »
Joseph E Gonzalez · Sameer Singh · Graham Taylor · James Bergstra · Alice Zheng · Misha Bilenko · Yucheng Low · Yoshua Bengio · Michael Franklin · Carlos Guestrin · Andrew McCallum · Alexander Smola · Michael Jordan · Sugato Basu -
2011 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · Adam Coates · Yann LeCun · Nicolas Le Roux · Andrew Y Ng -
2011 Oral: The Manifold Tangent Classifier »
Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller -
2011 Poster: Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization »
Mark Schmidt · Nicolas Le Roux · Francis Bach -
2011 Poster: Shallow vs. Deep Sum-Product Networks »
Olivier Delalleau · Yoshua Bengio -
2011 Poster: The Manifold Tangent Classifier »
Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller -
2011 Oral: Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization »
Mark Schmidt · Nicolas Le Roux · Francis Bach -
2011 Poster: Algorithms for Hyper-Parameter Optimization »
James Bergstra · Rémi Bardenet · Yoshua Bengio · Balázs Kégl -
2011 Poster: On Tracking The Partition Function »
Guillaume Desjardins · Aaron Courville · Yoshua Bengio -
2010 Workshop: Deep Learning and Unsupervised Feature Learning »
Honglak Lee · Marc'Aurelio Ranzato · Yoshua Bengio · Geoffrey E Hinton · Yann LeCun · Andrew Y Ng -
2009 Poster: Slow, Decorrelated Features for Pretraining Complex Cell-like Networks »
James Bergstra · Yoshua Bengio -
2009 Poster: An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism »
Aaron Courville · Douglas Eck · Yoshua Bengio -
2009 Session: Debate on Future Publication Models for the NIPS Community »
Yoshua Bengio -
2007 Poster: Augmented Functional Time Series Representation and Forecasting with Gaussian Processes »
Nicolas Chapados · Yoshua Bengio -
2007 Poster: Learning the 2-D Topology of Images »
Nicolas Le Roux · Yoshua Bengio · Pascal Lamblin · Marc Joliveau · Balázs Kégl -
2007 Spotlight: Augmented Functional Time Series Representation and Forecasting with Gaussian Processes »
Nicolas Chapados · Yoshua Bengio -
2006 Poster: Greedy Layer-Wise Training of Deep Networks »
Yoshua Bengio · Pascal Lamblin · Dan Popovici · Hugo Larochelle -
2006 Talk: Greedy Layer-Wise Training of Deep Networks »
Yoshua Bengio · Pascal Lamblin · Dan Popovici · Hugo Larochelle