Timezone: »
Recently, a novel class of Approximate Policy Iteration (API) algorithms have demonstrated impressive practical performance (e.g., ExIt from [1], AlphaGo-Zero from [2]). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., Tree Search), that can plan multiple steps ahead. The reactive policy is updated under supervision from the non-reactive policy, while the non-reactive policy is improved with guidance from the reactive policy. In this work we study this Dual Policy Iteration (DPI) strategy in an alternating optimization framework and provide a convergence analysis that extends existing API theory. We also develop a special instance of this framework which reduces the update of non-reactive policies to model-based optimal control using learned local models, and provides a theoretically sound way of unifying model-free and model-based RL approaches with unknown dynamics. We demonstrate the efficacy of our approach on various continuous control Markov Decision Processes.
Author Information
Wen Sun (Carnegie Mellon University)
Geoffrey Gordon (MSR Montréal & CMU)
Dr. Gordon is an Associate Research Professor in the Department of Machine Learning at Carnegie Mellon University, and co-director of the Department's Ph. D. program. He works on multi-robot systems, statistical machine learning, game theory, and planning in probabilistic, adversarial, and general-sum domains. His previous appointments include Visiting Professor at the Stanford Computer Science Department and Principal Scientist at Burning Glass Technologies in San Diego. Dr. Gordon received his B.A. in Computer Science from Cornell University in 1991, and his Ph.D. in Computer Science from Carnegie Mellon University in 1999.
Byron Boots (Georgia Tech / Google Brain)
J. Bagnell (Carnegie Mellon University)
More from the Same Authors
-
2022 : Hybrid RL: Using both offline and online data can make RL efficient »
Yuda Song · Yifei Zhou · Ayush Sekhari · J. Bagnell · Akshay Krishnamurthy · Wen Sun -
2022 Poster: Sequence Model Imitation Learning with Unobserved Contexts »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2022 Poster: Minimax Optimal Online Imitation Learning via Replay Estimation »
Gokul Swamy · Nived Rajaraman · Matt Peng · Sanjiban Choudhury · J. Bagnell · Steven Wu · Jiantao Jiao · Kannan Ramchandran -
2021 Poster: Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage »
Jonathan Chang · Masatoshi Uehara · Dhruv Sreenivas · Rahul Kidambi · Wen Sun -
2021 Poster: MobILE: Model-Based Imitation Learning From Observation Alone »
Rahul Kidambi · Jonathan Chang · Wen Sun -
2020 Poster: Trade-offs and Guarantees of Adversarial Representation Learning for Information Obfuscation »
Han Zhao · Jianfeng Chi · Yuan Tian · Geoffrey Gordon -
2020 Poster: Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift »
Remi Tachet des Combes · Han Zhao · Yu-Xiang Wang · Geoffrey Gordon -
2020 Poster: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs »
Alekh Agarwal · Sham Kakade · Akshay Krishnamurthy · Wen Sun -
2020 Poster: PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning »
Alekh Agarwal · Mikael Henaff · Sham Kakade · Wen Sun -
2020 Poster: Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates »
Wenhao Luo · Wen Sun · Ashish Kapoor -
2020 Spotlight: Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates »
Wenhao Luo · Wen Sun · Ashish Kapoor -
2020 Oral: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs »
Alekh Agarwal · Sham Kakade · Akshay Krishnamurthy · Wen Sun -
2020 Poster: Information Theoretic Regret Bounds for Online Nonlinear Control »
Sham Kakade · Akshay Krishnamurthy · Kendall Lowrey · Motoya Ohnishi · Wen Sun -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Learning Neural Networks with Adaptive Regularization »
Han Zhao · Yao-Hung Hubert Tsai · Russ Salakhutdinov · Geoffrey Gordon -
2019 Poster: Towards modular and programmable architecture search »
Renato Negrinho · Matthew Gormley · Geoffrey Gordon · Darshan Patil · Nghia Le · Daniel Ferreira -
2018 : Drew Bagnell / Wen Sun »
James Bagnell · Wen Sun -
2018 : Byron Boots »
Byron Boots -
2018 : Coffee Break and Poster Session I »
Pim de Haan · Bin Wang · Dequan Wang · Aadil Hayat · Ibrahim Sobh · Muhammad Asif Rana · Thibault Buhet · Nicholas Rhinehart · Arjun Sharma · Alex Bewley · Michael Kelly · Lionel Blondé · Ozgur S. Oguz · Vaibhav Viswanathan · Jeroen Vanbaar · Konrad Żołna · Negar Rostamzadeh · Rowan McAllister · Sanjay Thakur · Alexandros Kalousis · Chelsea Sidrane · Sujoy Paul · Daphne Chen · Michal Garmulewicz · Henryk Michalewski · Coline Devin · Hongyu Ren · Jiaming Song · Wen Sun · Hanzhang Hu · Wulong Liu · Emilie Wirbel -
2018 Poster: Differentiable MPC for End-to-end Planning and Control »
Brandon Amos · Ivan Jimenez · Jacob I Sacks · Byron Boots · J. Zico Kolter -
2018 Poster: Learning Beam Search Policies via Imitation Learning »
Renato Negrinho · Matthew Gormley · Geoffrey Gordon -
2018 Poster: Learning and Inference in Hilbert Space with Quantum Graphical Models »
Siddarth Srinivasan · Carlton Downey · Byron Boots -
2018 Poster: Orthogonally Decoupled Variational Gaussian Processes »
Hugh Salimbeni · Ching-An Cheng · Byron Boots · Marc Deisenroth -
2018 Poster: Adversarial Multiple Source Domain Adaptation »
Han Zhao · Shanghang Zhang · Guanhang Wu · José M. F. Moura · Joao P Costeira · Geoffrey Gordon -
2017 Poster: Linear Time Computation of Moments in Sum-Product Networks »
Han Zhao · Geoffrey Gordon -
2017 Poster: Predictive-State Decoders: Encoding the Future into Recurrent Networks »
Arun Venkatraman · Nicholas Rhinehart · Wen Sun · Lerrel Pinto · Martial Hebert · Byron Boots · Kris Kitani · J. Bagnell -
2017 Poster: Variational Inference for Gaussian Process Models with Linear Complexity »
Ching-An Cheng · Byron Boots -
2017 Poster: Predictive State Recurrent Neural Networks »
Carlton Downey · Ahmed Hefny · Byron Boots · Geoffrey Gordon · Boyue Li -
2016 Poster: A Unified Approach for Learning the Parameters of Sum-Product Networks »
Han Zhao · Pascal Poupart · Geoffrey Gordon -
2016 Poster: Incremental Variational Sparse Gaussian Process Regression »
Ching-An Cheng · Byron Boots -
2015 Poster: Supervised Learning for Dynamical System Learning »
Ahmed Hefny · Carlton Downey · Geoffrey Gordon -
2014 Session: Oral Session 7 »
Geoffrey Gordon -
2012 Tutorial: Machine Learning for Student Learning »
Emma Brunskill · Geoffrey Gordon -
2010 Poster: Predictive State Temporal Difference Learning »
Byron Boots · Geoffrey Gordon -
2007 Oral: A Constraint Generation Approach to Learning Stable Linear Dynamical Systems »
Sajid M Siddiqi · Byron Boots · Geoffrey Gordon -
2007 Poster: A Constraint Generation Approach to Learning Stable Linear Dynamical Systems »
Sajid M Siddiqi · Byron Boots · Geoffrey Gordon -
2006 Poster: No-regret algorithms for Online Convex Programs »
Geoffrey Gordon -
2006 Talk: No-regret algorithms for Online Convex Programs »
Geoffrey Gordon -
2006 Poster: Multi-Robot Negotiation: Approximating the Set of Subgame Perfect Equilibria in General Sum Stochastic Games »
Chris D Murray · Geoffrey Gordon