Adding momentum to Riemannian optimization is computationally challenging due to the intractable ODEs needed to define the exponential and parallel transport maps. We address these issues for Gaussian Fisher-Rao manifolds by proposing new local coordinates that exploit sparse structure and efficiently approximate the ODEs, resulting in a numerically stable update scheme. Our approach extends the structured natural-gradient descent method of Lin et al. (2021a) by incorporating momentum into it and scaling the method to large-scale applications arising in numerical optimization and deep learning.
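The abstract's core idea — replacing the exact exponential and parallel transport maps with cheap approximations so that momentum can be carried between tangent spaces — can be illustrated generically. The sketch below is not the paper's Gaussian Fisher-Rao method; it is a minimal, assumed example on the unit sphere, where a first-order retraction stands in for the exponential map and tangent-space projection stands in for parallel transport. All function names here are illustrative.

```python
import numpy as np

def project(x, v):
    """Project v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract(x, v):
    """First-order retraction: a cheap surrogate for the exponential map."""
    y = x + v
    return y / np.linalg.norm(y)

def riemannian_momentum(grad_f, x0, lr=0.02, beta=0.9, steps=2000):
    """Riemannian gradient descent with momentum (illustrative sketch).

    Parallel transport of the momentum vector is approximated by
    projecting it onto the new tangent space (a vector transport),
    avoiding the transport ODEs that are intractable in general.
    """
    x, m = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        g = project(x, grad_f(x))      # Riemannian gradient
        m = beta * m + g               # momentum accumulation
        x_new = retract(x, -lr * m)    # approximate exponential map
        m = project(x_new, m)          # approximate parallel transport
        x = x_new
    return x

# Toy problem: minimize x^T A x on the unit sphere (a Rayleigh quotient);
# the minimizer is the eigenvector of the smallest eigenvalue of A.
A = np.diag([3.0, 2.0, 1.0])
x = riemannian_momentum(lambda x: 2 * A @ x, np.ones(3) / np.sqrt(3))
```

On this toy problem the iterate converges to the eigenvector for the smallest eigenvalue of A (here, the third coordinate axis), showing how momentum can be maintained across tangent spaces with only approximate maps.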
Author Information
Wu Lin (University of British Columbia)
Valentin Duruisseaux (University of California, San Diego)
Melvin Leok (University of California, San Diego)
Frank Nielsen (Sony Computer Science Laboratories Inc (Tokyo))
Mohammad Emtiyaz Khan (RIKEN)
Emtiyaz Khan (also known as Emti) is a team leader at the RIKEN Center for Advanced Intelligence Project (AIP) in Tokyo, where he leads the Approximate Bayesian Inference Team. He is also a visiting professor at the Tokyo University of Agriculture and Technology (TUAT). Previously, he was a postdoc and then a scientist at École Polytechnique Fédérale de Lausanne (EPFL), where he also taught two large machine learning courses and received a teaching award. He finished his PhD in machine learning at the University of British Columbia in 2012. The main goal of Emti's research is to understand the principles of learning from data and use them to develop algorithms that can learn like living beings. For the past 10 years, his work has focused on developing Bayesian methods that could lead to such fundamental principles. The Approximate Bayesian Inference Team now continues to use these principles, as well as derive new ones, to solve real-world problems.
Mark Schmidt (University of British Columbia)
More from the Same Authors
-
2021 : Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers »
Jacques Chen · Frederik Kunstner · Mark Schmidt -
2021 : Faster Quasi-Newton Methods for Linear Composition Problems »
Betty Shea · Mark Schmidt -
2021 : A Closer Look at Gradient Estimators with Reinforcement Learning as Inference »
Jonathan Lavington · Michael Teng · Mark Schmidt · Frank Wood -
2021 : Beyond Target Networks: Improving Deep $Q$-learning with Functional Regularization »
Alexandre Piche · Joseph Marino · Gian Maria Marconi · Valentin Thomas · Chris Pal · Mohammad Emtiyaz Khan -
2021 : An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control »
Nicholas Ioannidis · Jonathan Lavington · Mark Schmidt -
2022 : Can Calibration Improve Sample Prioritization? »
Ganesh Tata · Gautham Krishna Gudur · Gopinath Chennupati · Mohammad Emtiyaz Khan -
2022 : Target-based Surrogates for Stochastic Optimization »
Jonathan Lavington · Sharan Vaswani · Reza Babanezhad Harikandeh · Mark Schmidt · Nicolas Le Roux -
2022 : Fast Convergence of Greedy 2-Coordinate Updates for Optimizing with an Equality Constraint »
Amrutha Varshini Ramesh · Aaron Mishkin · Mark Schmidt -
2022 : Fast Convergence of Random Reshuffling under Interpolation and the Polyak-Łojasiewicz Condition »
Chen Fan · Christos Thrampoulidis · Mark Schmidt -
2023 Poster: Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking »
Frederik Kunstner · Victor Sanches Portella · Mark Schmidt · Nicholas Harvey -
2023 Poster: Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models »
Leonardo Galli · Holger Rauhut · Mark Schmidt -
2023 Poster: BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization »
Chen Fan · Gaspard Choné-Ducasse · Mark Schmidt · Christos Thrampoulidis -
2023 Poster: The Memory-Perturbation Equation: Understanding Model's Sensitivity to Data »
Peter Nickl · Lu Xu · Dharmesh Tailor · Thomas Möllenhoff · Mohammad Emtiyaz Khan -
2022 : Invited Keynote 2 »
Mohammad Emtiyaz Khan -
2021 Poster: Dual Parameterization of Sparse Variational Gaussian Processes »
Vincent ADAM · Paul Chang · Mohammad Emtiyaz Khan · Arno Solin -
2021 Poster: Knowledge-Adaptation Priors »
Mohammad Emtiyaz Khan · Siddharth Swaroop -
2020 : Closing remarks »
Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac -
2020 : Live Q&A with Michael Friedlander (Zoom) »
Mark Schmidt -
2020 : Intro to Invited Speaker 8 »
Mark Schmidt -
2020 : Contributed talks in Session 3 (Zoom) »
Mark Schmidt · Zhan Gao · Wenjie Li · Preetum Nakkiran · Denny Wu · Chengrun Yang -
2020 : Live Q&A with Rachel Ward (Zoom) »
Mark Schmidt -
2020 : Live Q&A with Ashia Wilson (Zoom) »
Mark Schmidt -
2020 : Welcome remarks to Session 3 »
Mark Schmidt -
2020 Workshop: OPT2020: Optimization for Machine Learning »
Courtney Paquette · Mark Schmidt · Sebastian Stich · Quanquan Gu · Martin Takac -
2020 : Welcome event (gather.town) »
Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac -
2020 Poster: Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses »
Yihan Zhou · Victor Sanches Portella · Mark Schmidt · Nicholas Harvey -
2019 Poster: Approximate Inference Turns Deep Networks into Gaussian Processes »
Mohammad Emtiyaz Khan · Alexander Immer · Ehsan Abedi · Maciej Korzepa -
2019 Poster: Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates »
Sharan Vaswani · Aaron Mishkin · Issam Laradji · Mark Schmidt · Gauthier Gidel · Simon Lacoste-Julien -
2019 Poster: Practical Deep Learning with Bayesian Principles »
Kazuki Osawa · Siddharth Swaroop · Mohammad Emtiyaz Khan · Anirudh Jain · Runa Eschenhagen · Richard Turner · Rio Yokota -
2019 Tutorial: Deep Learning with Bayesian Principles »
Mohammad Emtiyaz Khan -
2018 Poster: SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient »
Aaron Mishkin · Frederik Kunstner · Didrik Nielsen · Mark Schmidt · Mohammad Emtiyaz Khan -
2016 : Fast Patch-based Style Transfer of Arbitrary Style »
Tian Qi Chen · Mark Schmidt -
2015 Poster: Kullback-Leibler Proximal Variational Inference »
Mohammad Emtiyaz Khan · Pierre Baque · François Fleuret · Pascal Fua -
2015 Poster: StopWasting My Gradients: Practical SVRG »
Reza Babanezhad Harikandeh · Mohamed Osama Ahmed · Alim Virani · Mark Schmidt · Jakub Konečný · Scott Sallinen -
2014 Poster: Decoupled Variational Gaussian Inference »
Mohammad Emtiyaz Khan -
2012 Poster: Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression »
Mohammad Emtiyaz Khan · Shakir Mohamed · Kevin Murphy -
2010 Poster: Variational bounds for mixed-data factor analysis »
Mohammad Emtiyaz Khan · Benjamin Marlin · Guillaume Bouchard · Kevin Murphy -
2009 Oral: Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models »
Baback Moghaddam · Benjamin Marlin · Mohammad Emtiyaz Khan · Kevin Murphy -
2009 Poster: Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models »
Baback Moghaddam · Benjamin Marlin · Mohammad Emtiyaz Khan · Kevin Murphy