Posterior collapse in Variational Autoencoders (VAEs) with uninformative priors arises when the variational posterior distribution closely matches the prior for a subset of latent variables. This paper presents a simple and intuitive explanation for posterior collapse through the analysis of linear VAEs and their direct correspondence with Probabilistic PCA (pPCA). We explain how posterior collapse may occur in pPCA due to local maxima in the log marginal likelihood. Unexpectedly, we prove that the ELBO objective for the linear VAE does not introduce additional spurious local maxima relative to log marginal likelihood. We show further that training a linear VAE with exact variational inference recovers a uniquely identifiable global maximum corresponding to the principal component directions. Empirically, we find that our linear analysis is predictive even for high-capacity, non-linear VAEs and helps explain the relationship between the observation noise, local maxima, and posterior collapse in deep Gaussian VAEs.
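The pPCA mechanism described above can be sketched numerically. The NumPy snippet below is a minimal illustration under stated assumptions, not the authors' code. It holds the observation noise σ² fixed and uses the standard closed-form pPCA weights W = U_q (Λ_q − σ²I)^{1/2}, taking the rotation as the identity and clipping a column to zero when its eigenvalue λ_i ≤ σ². The function names and the synthetic data are hypothetical. A latent whose direction has variance below σ² receives a zero weight column, so its exact posterior is N(0, 1), identical to the prior. The last lines evaluate the ELBO with the encoder set to that exact posterior and compare it with the log marginal likelihood.

```python
import numpy as np

# Minimal sketch, assuming a FIXED observation noise sigma2 (not learned).
# All function names here are hypothetical illustrations.


def ppca_fixed_noise(X, q, sigma2):
    """Closed-form pPCA weights W = U_q (Lambda_q - sigma2 I)^(1/2), rotation R = I.

    A column is clipped to zero when its eigenvalue does not exceed sigma2.
    """
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)       # sample covariance (divides by N)
    eigvals, eigvecs = np.linalg.eigh(S)         # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:q]          # indices of the q largest
    lam, U = eigvals[top], eigvecs[:, top]
    W = U * np.sqrt(np.maximum(lam - sigma2, 0.0))  # scale each principal direction
    return mu, W, lam


def exact_posterior(x, mu, W, sigma2):
    """p(z|x) = N(M^-1 W^T (x - mu), sigma2 M^-1) with M = W^T W + sigma2 I."""
    M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(W.shape[1]))
    return M_inv @ W.T @ (x - mu), sigma2 * M_inv


def log_marginal(x, mu, W, sigma2):
    """log N(x; mu, W W^T + sigma2 I)."""
    d = x.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)
    r = x - mu
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(C, r))


def elbo(x, mu, W, sigma2, m, S):
    """ELBO with encoder N(m, S), prior N(0, I) and decoder N(W z + mu, sigma2 I)."""
    d, q = W.shape
    resid = x - mu - W @ m
    # E_q[log p(x|z)]: squared residual plus the trace term from encoder variance
    recon = (-0.5 * d * np.log(2 * np.pi * sigma2)
             - (resid @ resid + np.trace(W @ S @ W.T)) / (2 * sigma2))
    _, logdet_S = np.linalg.slogdet(S)
    kl = 0.5 * (np.trace(S) + m @ m - q - logdet_S)   # KL(N(m, S) || N(0, I))
    return recon - kl


rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 3)) * np.array([3.0, 1.0, 0.1])  # variances ~9, ~1, ~0.01
sigma2 = 2.0
mu, W, lam = ppca_fixed_noise(X, q=2, sigma2=sigma2)
m, S = exact_posterior(X[0], mu, W, sigma2)

print("top eigenvalues:", lam)
print("column norms of W:", np.linalg.norm(W, axis=0))       # second column is zero
print("posterior mean:", m, "posterior variances:", np.diag(S))
print("ELBO:", elbo(X[0], mu, W, sigma2, m, S),
      "log p(x):", log_marginal(X[0], mu, W, sigma2))
```

With σ² = 2 and per-direction variances of roughly 9, 1, and 0.01, the second latent should collapse while the first stays active. The two printed objectives should agree up to floating-point error, because the KL gap between this encoder and the true posterior is zero.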
Author Information
James Lucas (University of Toronto)
George Tucker (Google Brain)
Roger Grosse (University of Toronto)
Mohammad Norouzi (Google Brain)
More from the Same Authors
2021 : Palette: Image-to-Image Diffusion Models »
Chitwan Saharia · William Chan · Huiwen Chang · Chris Lee · Jonathan Ho · Tim Salimans · David Fleet · Mohammad Norouzi -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 : Offline Policy Selection under Uncertainty »
Sherry Yang · Bo Dai · Ofir Nachum · George Tucker · Dale Schuurmans -
2022 Poster: Optimizing Data Collection for Machine Learning »
Rafid Mahmood · James Lucas · Jose M. Alvarez · Sanja Fidler · Marc Law -
2022 : Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes »
Aviral Kumar · Rishabh Agarwal · XINYANG GENG · George Tucker · Sergey Levine -
2022 : Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios »
Yiren Lu · Justin Fu · George Tucker · Xinlei Pan · Eli Bronstein · Rebecca Roelofs · Benjamin Sapp · Brandyn White · Aleksandra Faust · Shimon Whiteson · Dragomir Anguelov · Sergey Levine -
2023 : Scaling Offline Q-Learning with Vision Transformers »
Yingjie Miao · Jordi Orbay · Rishabh Agarwal · Aviral Kumar · George Tucker · Aleksandra Faust -
2023 Poster: The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation »
Saurabh Saxena · Charles Herrmann · Junhwa Hur · Abhishek Kar · Mohammad Norouzi · Deqing Sun · David Fleet -
2023 Oral: The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation »
Saurabh Saxena · Charles Herrmann · Junhwa Hur · Abhishek Kar · Mohammad Norouzi · Deqing Sun · David Fleet -
2023 Poster: Similarity-based cooperative equilibrium »
Caspar Oesterheld · Johannes Treutlein · Roger Grosse · Vincent Conitzer · Jakob Foerster -
2023 Poster: Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research »
Cole Gulino · Justin Fu · Wenjie Luo · George Tucker · Eli Bronstein · Yiren Lu · Jean Harb · Xinlei Pan · Yan Wang · Xiangyu Chen · John Co-Reyes · Rishabh Agarwal · Rebecca Roelofs · Yao Lu · Nico Montali · Paul Mougin · Zoey Yang · Brandyn White · Aleksandra Faust · Rowan McAllister · Dragomir Anguelov · Benjamin Sapp -
2022 : Imagenary Patterns with Diffusion Models »
Mohammad Norouzi -
2022 : Invited Talk: Mohammad Norouzi »
Mohammad Norouzi -
2022 : Interactive Industrial Panel »
Jiahao Sun · Ahmed Ibrahim · Marjan Ghazvininejad · Yu Cheng · Boxing Chen · Mohammad Norouzi · Rahul Gupta -
2022 Workshop: 3rd Offline Reinforcement Learning Workshop: Offline RL as a "Launchpad" »
Aviral Kumar · Rishabh Agarwal · Aravind Rajeswaran · Wenxuan Zhou · George Tucker · Doina Precup -
2022 Poster: Oracle Inequalities for Model Selection in Offline Reinforcement Learning »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai · Emma Brunskill -
2022 Poster: Amortized Proximal Optimization »
Juhan Bae · Paul Vicol · Jeff Z. HaoChen · Roger Grosse -
2022 Poster: Video Diffusion Models »
Jonathan Ho · Tim Salimans · Alexey Gritsenko · William Chan · Mohammad Norouzi · David Fleet -
2022 Poster: Proximal Learning With Opponent-Learning Awareness »
Stephen Zhao · Chris Lu · Roger Grosse · Jakob Foerster -
2022 Poster: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding »
Chitwan Saharia · William Chan · Saurabh Saxena · Lala Li · Jay Whang · Remi Denton · Kamyar Ghasemipour · Raphael Gontijo Lopes · Burcu Karagol Ayan · Tim Salimans · Jonathan Ho · David Fleet · Mohammad Norouzi -
2022 Poster: If Influence Functions are the Answer, Then What is the Question? »
Juhan Bae · Nathan Ng · Alston Lo · Marzyeh Ghassemi · Roger Grosse -
2022 Poster: Path Independent Equilibrium Models Can Better Exploit Test-Time Computation »
Cem Anil · Ashwini Pokle · Kaiqu Liang · Johannes Treutlein · Yuhuai Wu · Shaojie Bai · J. Zico Kolter · Roger Grosse -
2021 : Speaker Intro »
Aviral Kumar · George Tucker -
2021 : Invited Speaker Panel »
Sham Kakade · Minmin Chen · Philip Thomas · Angela Schoellig · Barbara Engelhardt · Doina Precup · George Tucker -
2021 Workshop: Offline Reinforcement Learning »
Rishabh Agarwal · Aviral Kumar · George Tucker · Justin Fu · Nan Jiang · Doina Precup -
2021 : NLP with Synthetic Text »
Mohammad Norouzi -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization Q&A »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 Poster: Why Do Better Loss Functions Lead to Less Transferable Features? »
Simon Kornblith · Ting Chen · Honglak Lee · Mohammad Norouzi -
2021 Poster: Coupled Gradient Estimators for Discrete Latent Variables »
Zhe Dong · Andriy Mnih · George Tucker -
2021 Poster: Differentiable Annealed Importance Sampling and the Perils of Gradient Noise »
Guodong Zhang · Kyle Hsu · Jianing Li · Chelsea Finn · Roger Grosse -
2020 : Panel »
Emma Brunskill · Nan Jiang · Nando de Freitas · Finale Doshi-Velez · Sergey Levine · John Langford · Lihong Li · George Tucker · Rishabh Agarwal · Aviral Kumar -
2020 : Invited Talk: Roger Grosse - Why Isn’t Everyone Using Second-Order Optimization? »
Roger Grosse -
2020 Workshop: Offline Reinforcement Learning »
Aviral Kumar · Rishabh Agarwal · George Tucker · Lihong Li · Doina Precup -
2020 : Introduction »
Aviral Kumar · George Tucker · Rishabh Agarwal -
2020 : Poster Session 3 (gather.town) »
Denny Wu · Chengrun Yang · Tolga Ergen · sanae lotfi · Charles Guille-Escuret · Boris Ginsburg · Hanbake Lyu · Cong Xie · David Newton · Debraj Basu · Yewen Wang · James Lucas · MAOJIA LI · Lijun Ding · Jose Javier Gonzalez Ortiz · Reyhane Askari Hemmat · Zhiqi Bu · Neal Lawton · Kiran Thekumparampil · Jiaming Liang · Lindon Roberts · Jingyi Zhu · Dongruo Zhou -
2020 Poster: Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians »
Juhan Bae · Roger Grosse -
2020 Poster: Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards »
Yijie Guo · Jongwook Choi · Marcin Moczulski · Shengyu Feng · Samy Bengio · Mohammad Norouzi · Honglak Lee -
2020 Poster: Regularized linear autoencoders recover the principal components, eventually »
Xuchan Bao · James Lucas · Sushant Sachdeva · Roger Grosse -
2020 Poster: Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation »
Sajad Norouzi · David Fleet · Mohammad Norouzi -
2020 Poster: RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning »
Caglar Gulcehre · Ziyu Wang · Alexander Novikov · Thomas Paine · Sergio Gómez · Konrad Zolna · Rishabh Agarwal · Josh Merel · Daniel Mankowitz · Cosmin Paduraru · Gabriel Dulac-Arnold · Jerry Li · Mohammad Norouzi · Matthew Hoffman · Nicolas Heess · Nando de Freitas -
2020 Poster: Big Self-Supervised Models are Strong Semi-Supervised Learners »
Ting Chen · Simon Kornblith · Kevin Swersky · Mohammad Norouzi · Geoffrey E Hinton -
2020 Poster: DisARM: An Antithetic Gradient Estimator for Binary Latent Variables »
Zhe Dong · Andriy Mnih · George Tucker -
2020 Spotlight: DisARM: An Antithetic Gradient Estimator for Binary Latent Variables »
Zhe Dong · Andriy Mnih · George Tucker -
2020 Poster: Conservative Q-Learning for Offline Reinforcement Learning »
Aviral Kumar · Aurick Zhou · George Tucker · Sergey Levine -
2020 : Policy Panel »
Roya Pakzad · Dia Kayyali · Marzyeh Ghassemi · Shakir Mohamed · Mohammad Norouzi · Ted Pedersen · Anver Emon · Abubakar Abid · Darren Byler · Samhaa R. El-Beltagy · Nayel Shafei · Mona Diab -
2020 Affinity Workshop: Muslims in ML »
Marzyeh Ghassemi · Mohammad Norouzi · Shakir Mohamed · Aya Salama · Tasmie Sarker -
2019 : James Lucas, "Information-theoretic limitations on novel task generalization" »
James Lucas -
2019 : Break / Poster Session 1 »
Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · Gaël Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang -
2019 Poster: Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks »
Guodong Zhang · James Martens · Roger Grosse -
2019 Poster: Lookahead Optimizer: k steps forward, 1 step back »
Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E Hinton -
2019 Poster: Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model »
Guodong Zhang · Lala Li · Zachary Nado · James Martens · Sushant Sachdeva · George Dahl · Chris Shallue · Roger Grosse -
2019 Poster: Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks »
Qiyang Li · Saminul Haque · Cem Anil · James Lucas · Roger Grosse · Joern-Henrik Jacobsen -
2019 Poster: Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction »
Aviral Kumar · Justin Fu · George Tucker · Sergey Levine -
2019 Poster: Energy-Inspired Models: Learning with Sampler-Induced Distributions »
Dieterich Lawson · George Tucker · Bo Dai · Rajesh Ranganath -
2018 Poster: Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning »
Supasorn Suwajanakorn · Noah Snavely · Jonathan Tompson · Mohammad Norouzi -
2018 Poster: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion »
Jacob Buckman · Danijar Hafner · George Tucker · Eugene Brevdo · Honglak Lee -
2018 Oral: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion »
Jacob Buckman · Danijar Hafner · George Tucker · Eugene Brevdo · Honglak Lee -
2018 Oral: Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning »
Supasorn Suwajanakorn · Noah Snavely · Jonathan Tompson · Mohammad Norouzi -
2018 Poster: Isolating Sources of Disentanglement in Variational Autoencoders »
Tian Qi Chen · Xuechen (Chen) Li · Roger Grosse · David Duvenaud -
2018 Poster: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing »
Chen Liang · Mohammad Norouzi · Jonathan Berant · Quoc V Le · Ni Lao -
2018 Oral: Isolating Sources of Disentanglement in Variational Autoencoders »
Tian Qi Chen · Xuechen (Chen) Li · Roger Grosse · David Duvenaud -
2018 Spotlight: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing »
Chen Liang · Mohammad Norouzi · Jonathan Berant · Quoc V Le · Ni Lao -
2018 Poster: Reversible Recurrent Neural Networks »
Matthew MacKay · Paul Vicol · Jimmy Ba · Roger Grosse -
2017 Poster: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models »
George Tucker · Andriy Mnih · Chris J Maddison · John Lawson · Jascha Sohl-Dickstein -
2017 Poster: Bridging the Gap Between Value and Policy Based Reinforcement Learning »
Ofir Nachum · Mohammad Norouzi · Kelvin Xu · Dale Schuurmans -
2017 Poster: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation »
Yuhuai Wu · Elman Mansimov · Roger Grosse · Shun Liao · Jimmy Ba -
2017 Spotlight: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation »
Yuhuai Wu · Elman Mansimov · Roger Grosse · Shun Liao · Jimmy Ba -
2017 Oral: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models »
George Tucker · Andriy Mnih · Chris J Maddison · John Lawson · Jascha Sohl-Dickstein -
2017 Poster: Filtering Variational Objectives »
Chris Maddison · John Lawson · George Tucker · Nicolas Heess · Mohammad Norouzi · Andriy Mnih · Arnaud Doucet · Yee Teh -
2017 Poster: The Reversible Residual Network: Backpropagation Without Storing Activations »
Aidan Gomez · Mengye Ren · Raquel Urtasun · Roger Grosse -
2016 Symposium: Deep Learning Symposium »
Yoshua Bengio · Yann LeCun · Navdeep Jaitly · Roger Grosse -
2016 Poster: Measuring the reliability of MCMC inference with bidirectional Monte Carlo »
Roger Grosse · Siddharth Ancha · Daniel Roy -
2016 Poster: Reward Augmented Maximum Likelihood for Neural Structured Prediction »
Mohammad Norouzi · Samy Bengio · zhifeng Chen · Navdeep Jaitly · Mike Schuster · Yonghui Wu · Dale Schuurmans -
2015 Poster: Efficient Non-greedy Optimization of Decision Trees »
Mohammad Norouzi · Maxwell Collins · Matthew A Johnson · David Fleet · Pushmeet Kohli -
2015 Poster: Learning Wake-Sleep Recurrent Attention Models »
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey -
2015 Spotlight: Learning Wake-Sleep Recurrent Attention Models »
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey -
2013 Poster: Annealing between distributions by averaging moments »
Roger Grosse · Chris Maddison · Russ Salakhutdinov -
2013 Oral: Annealing between distributions by averaging moments »
Roger Grosse · Chris Maddison · Russ Salakhutdinov -
2012 Poster: Hamming Distance Metric Learning »
Mohammad Norouzi · Russ Salakhutdinov · David Fleet