In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space. This work focuses on the representation learning question: how can we learn such features? Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem. Structurally, we make precise connections between these low rank MDPs and latent variable models, showing how they significantly generalize prior formulations, such as block MDPs, for representation learning in RL. Algorithmically, we develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models. On a technical level, our analysis eliminates reachability assumptions that appear in prior results on the simpler block MDP model and may be of independent interest.
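The low rank assumption in the abstract can be illustrated with a small numerical sketch. This is not FLAMBE and is not taken from the paper. It only shows, under assumed toy sizes, how a latent variable model yields a transition matrix of low rank: each state-action pair picks a latent variable z, and z picks the next state. The function names, sizes, and the random construction are all hypothetical.

```python
import numpy as np


def random_stochastic(rows, cols, rng):
    """Return a (rows x cols) matrix whose rows are probability distributions."""
    m = rng.random((rows, cols)) + 1e-3  # strictly positive entries
    return m / m.sum(axis=1, keepdims=True)


def latent_variable_transitions(num_states, num_actions, rank, seed=0):
    """Build T(s' | s, a) = sum_z phi(z | s, a) * mu(s' | z).

    Hypothetical toy construction: phi and mu are random, not learned.
    """
    rng = np.random.default_rng(seed)
    # phi[(s, a), z]: distribution over latent variables for each state-action pair.
    phi = random_stochastic(num_states * num_actions, rank, rng)
    # mu[z, s']: distribution over next states for each latent variable.
    mu = random_stochastic(rank, num_states, rng)
    return phi, mu, phi @ mu


phi, mu, T = latent_variable_transitions(num_states=20, num_actions=3, rank=4)

print(T.shape)                          # one row per (state, action) pair
print(np.allclose(T.sum(axis=1), 1.0))  # each row is a valid distribution
print(np.linalg.matrix_rank(T))         # at most the number of latent variables
```

In this sketch the features phi are given. The representation learning question described above is the reverse direction: recovering suitable features when only samples from the dynamics are available.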
Author Information
Alekh Agarwal (Microsoft Research)
Sham Kakade (University of Washington & Microsoft Research)
Akshay Krishnamurthy (Microsoft)
Wen Sun (Cornell University)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Oral: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
  Tue Dec 8th 02:30 -- 02:45 PM Room Orals & Spotlights: Reinforcement Learning
More from the Same Authors
- 2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning Q&A
  Sham M Kakade · Martha White · Nicolas Le Roux
- 2020 Poster: Provably adaptive reinforcement learning in metric spaces
  Tongyi Cao · Akshay Krishnamurthy
- 2020 Poster: Robust Meta-learning for Mixed Linear Regression with Small Batches
  Weihao Kong · Raghav Somani · Sham Kakade · Sewoong Oh
- 2020 Poster: Is Long Horizon RL More Difficult Than Short Horizon RL?
  Ruosong Wang · Simon Du · Lin Yang · Sham Kakade
- 2020 Poster: Policy Improvement via Imitation of Multiple Oracles
  Ching-An Cheng · Andrey Kolobov · Alekh Agarwal
- 2020 Spotlight: Policy Improvement via Imitation of Multiple Oracles
  Ching-An Cheng · Andrey Kolobov · Alekh Agarwal
- 2020 Poster: Efficient Contextual Bandits with Continuous Actions
  Maryam Majzoubi · Chicheng Zhang · Rajan Chari · Akshay Krishnamurthy · John Langford · Aleksandrs Slivkins
- 2020 Poster: Learning the Linear Quadratic Regulator from Nonlinear Observations
  Zakaria Mhammedi · Dylan Foster · Max Simchowitz · Dipendra Misra · Wen Sun · Akshay Krishnamurthy · Alexander Rakhlin · John Langford
- 2020 Poster: PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
  Alekh Agarwal · Mikael Henaff · Sham Kakade · Wen Sun
- 2020 Poster: Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates
  Wenhao Luo · Wen Sun · Ashish Kapoor
- 2020 Poster: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
  Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu
- 2020 Spotlight: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
  Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu
- 2020 Spotlight: Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates
  Wenhao Luo · Wen Sun · Ashish Kapoor
- 2020 Poster: Safe Reinforcement Learning via Curriculum Induction
  Matteo Turchetta · Andrey Kolobov · Shital Shah · Andreas Krause · Alekh Agarwal
- 2020 Poster: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
  Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang
- 2020 Poster: Provably Good Batch Reinforcement Learning Without Great Exploration
  Yao Liu · Adith Swaminathan · Alekh Agarwal · Emma Brunskill
- 2020 Poster: Information Theoretic Regret Bounds for Online Nonlinear Control
  Sham Kakade · Akshay Krishnamurthy · Kendall Lowrey · Motoya Ohnishi · Wen Sun
- 2020 Spotlight: Safe Reinforcement Learning via Curriculum Induction
  Matteo Turchetta · Andrey Kolobov · Shital Shah · Andreas Krause · Alekh Agarwal
- 2020 Spotlight: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
  Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang
- 2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning
  Sham M Kakade · Martha White · Nicolas Le Roux
- 2019 Poster: Sample Complexity of Learning Mixture of Sparse Linear Regressions
  Akshay Krishnamurthy · Arya Mazumdar · Andrew McGregor · Soumyabrata Pal
- 2019 Poster: The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
  Rong Ge · Sham Kakade · Rahul Kidambi · Praneeth Netrapalli
- 2019 Poster: Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting
  Aditya Grover · Jiaming Song · Ashish Kapoor · Kenneth Tran · Alekh Agarwal · Eric Horvitz · Stefano Ermon
- 2019 Poster: Model Selection for Contextual Bandits
  Dylan Foster · Akshay Krishnamurthy · Haipeng Luo
- 2019 Spotlight: Model Selection for Contextual Bandits
  Dylan Foster · Akshay Krishnamurthy · Haipeng Luo
- 2019 Poster: Meta-Learning with Implicit Gradients
  Aravind Rajeswaran · Chelsea Finn · Sham Kakade · Sergey Levine
- 2018 Poster: A Smoother Way to Train Structured Prediction Models
  Krishna Pillutla · Vincent Roulet · Sham Kakade · Zaid Harchaoui
- 2018 Poster: Contextual bandits with surrogate losses: Margin bounds and efficient algorithms
  Dylan Foster · Akshay Krishnamurthy
- 2018 Poster: Dual Policy Iteration
  Wen Sun · Geoffrey Gordon · Byron Boots · J. Bagnell
- 2018 Poster: On Oracle-Efficient PAC RL with Rich Observations
  Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2018 Spotlight: On Oracle-Efficient PAC RL with Rich Observations
  Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2018 Poster: Provably Correct Automatic Sub-Differentiation for Qualified Programs
  Sham Kakade · Jason Lee
- 2017 Workshop: OPT 2017: Optimization for Machine Learning
  Suvrit Sra · Sashank J. Reddi · Alekh Agarwal · Benjamin Recht
- 2017 Poster: Off-policy evaluation for slate recommendation
  Adith Swaminathan · Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik · John Langford · Damien Jose · Imed Zitouni
- 2017 Oral: Off-policy evaluation for slate recommendation
  Adith Swaminathan · Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik · John Langford · Damien Jose · Imed Zitouni
- 2017 Poster: Learning Overcomplete HMMs
  Vatsal Sharan · Sham Kakade · Percy Liang · Gregory Valiant
- 2017 Poster: Predictive-State Decoders: Encoding the Future into Recurrent Networks
  Arun Venkatraman · Nicholas Rhinehart · Wen Sun · Lerrel Pinto · Martial Hebert · Byron Boots · Kris Kitani · J. Bagnell
- 2017 Poster: Towards Generalization and Simplicity in Continuous Control
  Aravind Rajeswaran · Kendall Lowrey · Emanuel Todorov · Sham Kakade
- 2016 Poster: Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent
  Chi Jin · Sham Kakade · Praneeth Netrapalli
- 2016 Demonstration: Project Malmo - Minecraft for AI Research
  Katja Hofmann · Matthew A Johnson · Fernando Diaz · Alekh Agarwal · Tim Hutton · David Bignell · Evelyne Viegas
- 2016 Poster: Efficient Second Order Online Learning by Sketching
  Haipeng Luo · Alekh Agarwal · Nicolò Cesa-Bianchi · John Langford
- 2016 Poster: Contextual semibandits via supervised learning oracles
  Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik
- 2016 Poster: PAC Reinforcement Learning with Rich Observations
  Akshay Krishnamurthy · Alekh Agarwal · John Langford
- 2015 Workshop: Optimization for Machine Learning (OPT2015)
  Suvrit Sra · Alekh Agarwal · Leon Bottou · Sashank J. Reddi
- 2015 Poster: Efficient and Parsimonious Agnostic Active Learning
  Tzu-Kuo Huang · Alekh Agarwal · Daniel Hsu · John Langford · Robert Schapire
- 2015 Spotlight: Efficient and Parsimonious Agnostic Active Learning
  Tzu-Kuo Huang · Alekh Agarwal · Daniel Hsu · John Langford · Robert Schapire
- 2015 Poster: Convergence Rates of Active Learning for Maximum Likelihood Estimation
  Kamalika Chaudhuri · Sham Kakade · Praneeth Netrapalli · Sujay Sanghavi
- 2015 Poster: Super-Resolution Off the Grid
  Qingqing Huang · Sham Kakade
- 2015 Poster: Fast Convergence of Regularized Learning in Games
  Vasilis Syrgkanis · Alekh Agarwal · Haipeng Luo · Robert Schapire
- 2015 Oral: Fast Convergence of Regularized Learning in Games
  Vasilis Syrgkanis · Alekh Agarwal · Haipeng Luo · Robert Schapire
- 2015 Spotlight: Super-Resolution Off the Grid
  Qingqing Huang · Sham Kakade
- 2014 Workshop: OPT2014: Optimization for Machine Learning
  Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck
- 2014 Poster: Scalable Non-linear Learning with Adaptive Polynomial Expansions
  Alekh Agarwal · Alina Beygelzimer · Daniel Hsu · John Langford · Matus J Telgarsky
- 2013 Workshop: Learning Faster From Easy Data
  Peter Grünwald · Wouter M Koolen · Sasha Rakhlin · Nati Srebro · Alekh Agarwal · Karthik Sridharan · Tim van Erven · Sebastien Bubeck
- 2013 Workshop: OPT2013: Optimization for Machine Learning
  Suvrit Sra · Alekh Agarwal
- 2013 Poster: When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity
  Anima Anandkumar · Daniel Hsu · Majid Janzamin · Sham M Kakade
- 2012 Workshop: Optimization for Machine Learning
  Suvrit Sra · Alekh Agarwal
- 2012 Poster: Learning Mixtures of Tree Graphical Models
  Anima Anandkumar · Daniel Hsu · Furong Huang · Sham M Kakade
- 2012 Poster: A Spectral Algorithm for Latent Dirichlet Allocation
  Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu
- 2012 Poster: Identifiability and Unmixing of Latent Parse Trees
  Percy Liang · Sham M Kakade · Daniel Hsu
- 2012 Spotlight: A Spectral Algorithm for Latent Dirichlet Allocation
  Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu
- 2012 Poster: Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions
  Alekh Agarwal · Sahand N Negahban · Martin J Wainwright
- 2011 Workshop: Computational Trade-offs in Statistical Learning
  Alekh Agarwal · Sasha Rakhlin
- 2011 Poster: Distributed Delayed Stochastic Optimization
  Alekh Agarwal · John Duchi
- 2011 Poster: Stochastic convex optimization with bandit feedback
  Alekh Agarwal · Dean P Foster · Daniel Hsu · Sham M Kakade · Sasha Rakhlin
- 2011 Poster: Spectral Methods for Learning Multivariate Latent Tree Structure
  Anima Anandkumar · Kamalika Chaudhuri · Daniel Hsu · Sham M Kakade · Le Song · Tong Zhang
- 2011 Poster: Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
  Sham M Kakade · Adam Kalai · Varun Kanade · Ohad Shamir
- 2010 Workshop: Learning on Cores, Clusters, and Clouds
  Alekh Agarwal · Lawrence Cayton · Ofer Dekel · John Duchi · John Langford
- 2010 Spotlight: Learning from Logged Implicit Exploration Data
  Alex Strehl · Lihong Li · John Langford · Sham M Kakade
- 2010 Spotlight: Distributed Dual Averaging In Networks
  John Duchi · Alekh Agarwal · Martin J Wainwright
- 2010 Poster: Distributed Dual Averaging In Networks
  John Duchi · Alekh Agarwal · Martin J Wainwright
- 2010 Poster: Learning from Logged Implicit Exploration Data
  Alexander L Strehl · John Langford · Lihong Li · Sham M Kakade
- 2010 Oral: Fast global convergence rates of gradient methods for high-dimensional statistical recovery
  Alekh Agarwal · Sahand N Negahban · Martin J Wainwright
- 2010 Poster: Fast global convergence rates of gradient methods for high-dimensional statistical recovery
  Alekh Agarwal · Sahand N Negahban · Martin J Wainwright
- 2009 Poster: Information-theoretic lower bounds on the oracle complexity of convex optimization
  Alekh Agarwal · Peter Bartlett · Pradeep Ravikumar · Martin J Wainwright
- 2009 Spotlight: Information-theoretic lower bounds on the oracle complexity of convex optimization
  Alekh Agarwal · Peter Bartlett · Pradeep Ravikumar · Martin J Wainwright
- 2009 Poster: Multi-Label Prediction via Compressed Sensing
  Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang
- 2009 Oral: Multi-Label Prediction via Compressed Sensing
  Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang
- 2008 Poster: Mind the Duality Gap: Logarithmic regret algorithms for online optimization
  Shai Shalev-Shwartz · Sham M Kakade
- 2008 Poster: On the Generalization Ability of Online Strongly Convex Programming Algorithms
  Sham M Kakade · Ambuj Tewari
- 2008 Spotlight: On the Generalization Ability of Online Strongly Convex Programming Algorithms
  Sham M Kakade · Ambuj Tewari
- 2008 Spotlight: Mind the Duality Gap: Logarithmic regret algorithms for online optimization
  Shai Shalev-Shwartz · Sham M Kakade
- 2008 Poster: On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
  Sham M Kakade · Karthik Sridharan · Ambuj Tewari
- 2007 Poster: An Analysis of Inference with the Universum
  Fabian H Sinz · Olivier Chapelle · Alekh Agarwal · Bernhard Schölkopf
- 2007 Spotlight: An Analysis of Inference with the Universum
  Fabian H Sinz · Olivier Chapelle · Alekh Agarwal · Bernhard Schölkopf
- 2007 Oral: The Price of Bandit Information for Online Optimization
  Varsha Dani · Thomas P Hayes · Sham M Kakade
- 2007 Poster: The Price of Bandit Information for Online Optimization
  Varsha Dani · Thomas P Hayes · Sham M Kakade