We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language. Programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for such policies remains a challenge. Our approach to this challenge, a meta-algorithm called PROPEL, is based on three insights. First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation, and solve this optimization problem using a form of mirror descent that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space. Second, we view the unconstrained policy space as mixing neural and programmatic representations, which enables employing state-of-the-art deep policy-gradient approaches. Third, we cast the projection step as program synthesis via imitation learning, and exploit contemporary combinatorial methods for this task. We present theoretical convergence results for PROPEL and empirically evaluate the approach in three continuous control domains. The experiments show that PROPEL can significantly outperform state-of-the-art approaches for learning programmatic policies.
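To make the update scheme concrete, below is a minimal sketch of the PROPEL loop under toy assumptions: scalar states, a "program class" of linear policies, and placeholder helpers (make_linear_program, toy_grad_step, toy_project) that stand in for a real deep policy-gradient update and a real program synthesizer. These names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def propel(h0, grad_step, project, n_iters=10):
    """PROPEL meta-loop (sketch): alternate (1) a gradient step in the
    unconstrained, mixed neural+programmatic policy space and
    (2) a projection back onto the program class via imitation."""
    h = h0
    for _ in range(n_iters):
        f = grad_step(h)                         # neural correction (e.g., learned with PPO/TRPO)
        mixed = lambda s, h=h, f=f: h(s) + f(s)  # unconstrained mixed policy h + f
        h = project(mixed)                       # synthesize a program imitating `mixed`
    return h

def make_linear_program(w):
    """Toy 'program': a linear policy s -> w * s."""
    return lambda s: w * s

def toy_grad_step(h, delta=0.1):
    """Stand-in for a deep policy-gradient update: a fixed small correction."""
    return lambda s: delta * s

def toy_project(mixed, states=np.linspace(-1.0, 1.0, 100)):
    """Imitation-learning projection: least-squares fit of a linear
    program to the mixed policy's actions on sampled states."""
    actions = np.array([mixed(s) for s in states])
    w = (states @ actions) / (states @ states)
    return make_linear_program(w)

policy = propel(make_linear_program(0.0), toy_grad_step, toy_project, n_iters=5)
print(policy(0.5))  # the synthesized linear policy applied to a sample state
```

In the full algorithm described in the abstract, the gradient step would be a deep policy-gradient update on a neural component of the mixed policy, and the projection would be combinatorial program synthesis driven by imitation learning rather than a least-squares fit.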
Author Information
Abhinav Verma (Rice University)
Hoang Le (California Institute of Technology)
Yisong Yue (Caltech)
Swarat Chaudhuri (Rice University)
More from the Same Authors
- 2020 Workshop: Learning Meets Combinatorial Algorithms
  Marin Vlastelica · Jialin Song · Aaron Ferber · Brandon Amos · Georg Martius · Bistra Dilkina · Yisong Yue
- 2020 Poster: Online Optimization with Memory and Competitive Control
  Guanya Shi · Yiheng Lin · Soon-Jo Chung · Yisong Yue · Adam Wierman
- 2020 Poster: A General Large Neighborhood Search Framework for Solving Integer Linear Programs
  Jialin Song · Ravi Lanka · Yisong Yue · Bistra Dilkina
- 2020 Poster: Learning compositional functions via multiplicative weight updates
  Jeremy Bernstein · Jiawei Zhao · Markus Meister · Ming-Yu Liu · Anima Anandkumar · Yisong Yue
- 2020 Poster: Learning Differentiable Programs with Admissible Neural Heuristics
  Ameesh Shah · Eric Zhan · Jennifer Sun · Abhinav Verma · Yisong Yue · Swarat Chaudhuri
- 2020 Poster: Neurosymbolic Reinforcement Learning with Formally Verified Exploration
  Greg Anderson · Abhinav Verma · Isil Dillig · Swarat Chaudhuri
- 2020 Poster: On the distance between two neural networks and the stability of learning
  Jeremy Bernstein · Arash Vahdat · Yisong Yue · Ming-Yu Liu
- 2020 Poster: The Power of Predictions in Online Control
  Chenkai Yu · Guanya Shi · Soon-Jo Chung · Yisong Yue · Adam Wierman
- 2019 Workshop: Safety and Robustness in Decision-making
  Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow
- 2019 Poster: NAOMI: Non-Autoregressive Multiresolution Sequence Imputation
  Yukai Liu · Rose Yu · Stephan Zheng · Eric Zhan · Yisong Yue
- 2019 Poster: Teaching Multiple Concepts to a Forgetful Learner
  Anette Hunziker · Yuxin Chen · Oisin Mac Aodha · Manuel Gomez Rodriguez · Andreas Krause · Pietro Perona · Yisong Yue · Adish Singla
- 2019 Poster: Landmark Ordinal Embedding
  Nikhil Ghosh · Yuxin Chen · Yisong Yue
- 2018 Poster: Understanding the Role of Adaptivity in Machine Teaching: The Case of Version Space Learners
  Yuxin Chen · Adish Singla · Oisin Mac Aodha · Pietro Perona · Yisong Yue
- 2018 Poster: A General Method for Amortizing Variational Filtering
  Joseph Marino · Milan Cvitkovic · Yisong Yue
- 2018 Poster: HOUDINI: Lifelong Learning as Program Synthesis
  Lazar Valkov · Dipak Chaudhari · Akash Srivastava · Charles Sutton · Swarat Chaudhuri
- 2016 Poster: Generating Long-term Trajectories Using Deep Hierarchical Networks
  Stephan Zheng · Yisong Yue · Patrick Lucey
- 2015 Poster: Smooth Interactive Submodular Set Cover
  Bryan He · Yisong Yue
- 2015 Demonstration: Data-Driven Speech Animation
  Yisong Yue · Iain Matthews