Timezone: »
Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs). We show that when trained with proper regularization, LAEs can directly learn the optimal representation -- ordered, axis-aligned principal components. We analyze two such regularization schemes: non-uniform L2 regularization and a deterministic variant of nested dropout [Rippel et al, ICML' 2014]. Though both regularization schemes converge to the optimal representation, we show that this convergence is slow due to ill-conditioning that worsens with increasing latent dimension. We show that the inefficiency of learning the optimal representation is not inevitable -- we present a simple modification to the gradient descent update that greatly speeds up convergence empirically.
Author Information
Xuchan Bao (University of Toronto)
James Lucas (University of Toronto)
Sushant Sachdeva (University of Toronto)
Roger Grosse (University of Toronto)
More from the Same Authors
-
2022 Poster: Optimizing Data Collection for Machine Learning »
Rafid Mahmood · James Lucas · Jose M. Alvarez · Sanja Fidler · Marc Law -
2022 Poster: Amortized Proximal Optimization »
Juhan Bae · Paul Vicol · Jeff Z. HaoChen · Roger Grosse -
2022 Poster: Proximal Learning With Opponent-Learning Awareness »
Stephen Zhao · Chris Lu · Roger Grosse · Jakob Foerster -
2022 Poster: If Influence Functions are the Answer, Then What is the Question? »
Juhan Bae · Nathan Ng · Alston Lo · Marzyeh Ghassemi · Roger Grosse -
2022 Poster: Path Independent Equilibrium Models Can Better Exploit Test-Time Computation »
Cem Anil · Ashwini Pokle · Kaiqu Liang · Johannes Treutlein · Yuhuai Wu · Shaojie Bai · J. Zico Kolter · Roger Grosse -
2021 Poster: Unifying Width-Reduced Methods for Quasi-Self-Concordant Optimization »
Deeksha Adil · Brian Bullins · Sushant Sachdeva -
2021 Poster: Learning to Elect »
Cem Anil · Xuchan Bao -
2021 Poster: Differentiable Annealed Importance Sampling and the Perils of Gradient Noise »
Guodong Zhang · Kyle Hsu · Jianing Li · Chelsea Finn · Roger Grosse -
2020 : Invited Talk: Roger Grosse - Why Isn’t Everyone Using Second-Order Optimization? »
Roger Grosse -
2020 : Poster Session 3 (gather.town) »
Denny Wu · Chengrun Yang · Tolga Ergen · sanae lotfi · Charles Guille-Escuret · Boris Ginsburg · Hanbake Lyu · Cong Xie · David Newton · Debraj Basu · Yewen Wang · James Lucas · MAOJIA LI · Lijun Ding · Jose Javier Gonzalez Ortiz · Reyhane Askari Hemmat · Zhiqi Bu · Neal Lawton · Kiran Thekumparampil · Jiaming Liang · Lindon Roberts · Jingyi Zhu · Dongruo Zhou -
2020 Poster: Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians »
Juhan Bae · Roger Grosse -
2019 : James Lucas, "Information-theoretic limitations on novel task generalization" »
James Lucas -
2019 : Break / Poster Session 1 »
Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · Gaël Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang -
2019 Poster: Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks »
Guodong Zhang · James Martens · Roger Grosse -
2019 Poster: Lookahead Optimizer: k steps forward, 1 step back »
Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E Hinton -
2019 Poster: Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model »
Guodong Zhang · Lala Li · Zachary Nado · James Martens · Sushant Sachdeva · George Dahl · Chris Shallue · Roger Grosse -
2019 Poster: Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks »
Qiyang Li · Saminul Haque · Cem Anil · James Lucas · Roger Grosse · Joern-Henrik Jacobsen -
2019 Poster: Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse »
James Lucas · George Tucker · Roger Grosse · Mohammad Norouzi -
2018 : TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer »
Sicong (Sheldon) Huang · Cem Anil · Xuchan Bao -
2018 Poster: Isolating Sources of Disentanglement in Variational Autoencoders »
Tian Qi Chen · Xuechen (Chen) Li · Roger Grosse · David Duvenaud -
2018 Oral: Isolating Sources of Disentanglement in Variational Autoencoders »
Tian Qi Chen · Xuechen (Chen) Li · Roger Grosse · David Duvenaud -
2018 Poster: Reversible Recurrent Neural Networks »
Matthew MacKay · Paul Vicol · Jimmy Ba · Roger Grosse -
2017 Poster: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation »
Yuhuai Wu · Elman Mansimov · Roger Grosse · Shun Liao · Jimmy Ba -
2017 Spotlight: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation »
Yuhuai Wu · Elman Mansimov · Roger Grosse · Shun Liao · Jimmy Ba -
2017 Poster: The Reversible Residual Network: Backpropagation Without Storing Activations »
Aidan Gomez · Mengye Ren · Raquel Urtasun · Roger Grosse -
2016 Symposium: Deep Learning Symposium »
Yoshua Bengio · Yann LeCun · Navdeep Jaitly · Roger Grosse -
2016 Poster: Measuring the reliability of MCMC inference with bidirectional Monte Carlo »
Roger Grosse · Siddharth Ancha · Daniel Roy -
2015 Poster: Learning Wake-Sleep Recurrent Attention Models »
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey -
2015 Spotlight: Learning Wake-Sleep Recurrent Attention Models »
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey -
2013 Poster: Annealing between distributions by averaging moments »
Roger Grosse · Chris Maddison · Russ Salakhutdinov -
2013 Oral: Annealing between distributions by averaging moments »
Roger Grosse · Chris Maddison · Russ Salakhutdinov