While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). The challenge is that decision tree policies are difficult to train. We propose VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy (called the oracle) and its Q-function, and show that it substantially outperforms two baselines. We use VIPER to (i) learn a provably robust decision tree policy for a variant of Atari Pong with a symbolic state space, (ii) learn a decision tree policy for a toy game based on Pong that provably never loses, and (iii) learn a provably stable decision tree policy for cart-pole. In each case, the decision tree policy achieves performance equal to that of the original DNN policy.
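As a rough illustration of the training loop the abstract describes, the sketch below follows a DAgger-style procedure: roll out the current decision tree student, label the visited states with the oracle's actions, weight each state by the gap between its best and worst Q-value, and refit a depth-limited tree, keeping the best tree found. This is a minimal sketch under stated assumptions, not the paper's implementation: the `env`, `oracle.predict`, and `oracle.q_values` interfaces are hypothetical names, and it uses scikit-learn's `DecisionTreeClassifier` as the tree learner.

```python
# Minimal VIPER-style sketch (assumptions: a gym-style `env` whose reset()
# returns a state vector, and an `oracle` exposing predict(state) -> action
# and q_values(state) -> np.ndarray; these names are illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rollout(env, policy, max_steps=1000):
    """Collect the states visited while running the given policy."""
    states, state = [], env.reset()
    for _ in range(max_steps):
        states.append(state)
        state, _, done, *_ = env.step(policy(state))
        if done:
            break
    return states

def evaluate_return(env, policy, max_steps=1000):
    """Total (undiscounted) reward of one episode under the policy."""
    total, state = 0.0, env.reset()
    for _ in range(max_steps):
        state, reward, done, *_ = env.step(policy(state))
        total += reward
        if done:
            break
    return total

def viper_sketch(env, oracle, n_iters=10, n_rollouts=10, max_depth=8):
    dataset_states = []
    best_tree, best_return = None, -np.inf
    student = lambda s: oracle.predict(s)  # first iteration follows the oracle
    for _ in range(n_iters):
        # 1. Gather states by running the current student policy (DAgger-style
        #    dataset aggregation: earlier states are kept).
        for _ in range(n_rollouts):
            dataset_states += rollout(env, student)
        X = np.array(dataset_states)
        # 2. Label every visited state with the oracle's action.
        y = np.array([oracle.predict(s) for s in dataset_states])
        # 3. Weight states by how costly a wrong action is there:
        #    max_a Q(s, a) - min_a Q(s, a).
        w = np.array([q.max() - q.min()
                      for q in (oracle.q_values(s) for s in dataset_states)])
        # 4. Fit a depth-limited decision tree on the weighted dataset.
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(X, y, sample_weight=w + 1e-8)
        student = lambda s, t=tree: t.predict(np.asarray(s).reshape(1, -1))[0]
        # 5. Keep the tree with the best empirical return.
        ret = evaluate_return(env, student)
        if ret > best_return:
            best_tree, best_return = tree, ret
    return best_tree
```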
Author Information
Osbert Bastani (University of Pennsylvania)
Yewen Pu (MIT)
Armando Solar-Lezama (MIT)
More from the Same Authors
- 2021 Spotlight: Program Synthesis Guided Reinforcement Learning for Partially Observed Environments »
  Yichen Yang · Jeevana Priya Inala · Osbert Bastani · Yewen Pu · Armando Solar-Lezama · Martin Rinard
- 2021 : AutumnSynth: Synthesis of Reactive Programs with Structured Latent State »
  Ria Das · Zenna Tavares · Josh Tenenbaum · Armando Solar-Lezama
- 2021 : PAC Synthesis of Machine Learning Programs »
  Osbert Bastani
- 2021 : Synthesizing Video Trajectory Queries »
  Stephen Mell · Favyen Bastani · Stephan Zdancewic · Osbert Bastani
- 2021 : Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning »
  Jason Yecheng Ma · Andrew Shen · Osbert Bastani · Dinesh Jayaraman
- 2021 : Synthesis of Reactive Programs with Structured Latent State »
  Ria Das · Zenna Tavares · Armando Solar-Lezama · Josh Tenenbaum
- 2022 : Neurosymbolic Programming for Science »
  Jennifer J Sun · Megan Tjandrasuwita · Atharva Sehgal · Armando Solar-Lezama · Swarat Chaudhuri · Yisong Yue · Omar Costilla Reyes
- 2022 : Lemma: Bootstrapping High-Level Mathematical Reasoning with Learned Symbolic Abstractions »
  Zhening Li · Gabriel Poesia Reis e Silva · Omar Costilla Reyes · Noah Goodman · Armando Solar-Lezama
- 2022 : Q & A »
  Swarat Chaudhuri · Jennifer J Sun · Armando Solar-Lezama
- 2022 Tutorial: Neurosymbolic Programming »
  Swarat Chaudhuri · Jennifer J Sun · Armando Solar-Lezama
- 2022 : Neurosymbolic Programming »
  Swarat Chaudhuri · Jennifer J Sun · Armando Solar-Lezama
- 2022 : Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark »
  Vitali Petsiuk · Alexander E. Siemenn · Saisamrit Surbehera · Qi Qi Chin · Keith Tyser · Gregory Hunter · Arvind Raghavan · Yann Hicke · Bryan Plummer · Ori Kerret · Tonio Buonassisi · Kate Saenko · Armando Solar-Lezama · Iddo Drori
- 2021 Poster: Conservative Offline Distributional Reinforcement Learning »
  Jason Yecheng Ma · Dinesh Jayaraman · Osbert Bastani
- 2021 Poster: Compositional Reinforcement Learning from Logical Specifications »
  Kishor Jothimurugan · Suguman Bansal · Osbert Bastani · Rajeev Alur
- 2021 Poster: Program Synthesis Guided Reinforcement Learning for Partially Observed Environments »
  Yichen Yang · Jeevana Priya Inala · Osbert Bastani · Yewen Pu · Armando Solar-Lezama · Martin Rinard
- 2021 Poster: Learning Models for Actionable Recourse »
  Alexis Ross · Himabindu Lakkaraju · Osbert Bastani
- 2020 : Invited Talk (Armando Solar-Lezama) »
  Armando Solar-Lezama
- 2020 Workshop: Workshop on Computer Assisted Programming (CAP) »
  Augustus Odena · Charles Sutton · Nadia Polikarpova · Josh Tenenbaum · Armando Solar-Lezama · Isil Dillig
- 2020 Poster: Program Synthesis with Pragmatic Communication »
  Yewen Pu · Kevin Ellis · Marta Kryven · Josh Tenenbaum · Armando Solar-Lezama
- 2020 Poster: Learning Compositional Rules via Neural Program Synthesis »
  Maxwell Nye · Armando Solar-Lezama · Josh Tenenbaum · Brenden Lake
- 2020 Poster: Neurosymbolic Transformers for Multi-Agent Communication »
  Jeevana Priya Inala · Yichen Yang · James Paulos · Yewen Pu · Osbert Bastani · Vijay Kumar · Martin Rinard · Armando Solar-Lezama
- 2019 Poster: Write, Execute, Assess: Program Synthesis with a REPL »
  Kevin Ellis · Maxwell Nye · Yewen Pu · Felix Sosa · Josh Tenenbaum · Armando Solar-Lezama
- 2019 Poster: Compiler Auto-Vectorization with Imitation Learning »
  Charith Mendis · Cambridge Yang · Yewen Pu · Saman Amarasinghe · Michael Carbin
- 2018 Poster: Learning to Infer Graphics Programs from Hand-Drawn Images »
  Kevin Ellis · Daniel Ritchie · Armando Solar-Lezama · Josh Tenenbaum
- 2018 Poster: Learning Libraries of Subroutines for Neurally–Guided Bayesian Program Induction »
  Kevin Ellis · Lucas Morales · Mathias Sablé-Meyer · Armando Solar-Lezama · Josh Tenenbaum
- 2018 Spotlight: Learning to Infer Graphics Programs from Hand-Drawn Images »
  Kevin Ellis · Daniel Ritchie · Armando Solar-Lezama · Josh Tenenbaum
- 2018 Spotlight: Learning Libraries of Subroutines for Neurally–Guided Bayesian Program Induction »
  Kevin Ellis · Lucas Morales · Mathias Sablé-Meyer · Armando Solar-Lezama · Josh Tenenbaum
- 2018 Poster: Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections »
  Xin Zhang · Armando Solar-Lezama · Rishabh Singh
- 2016 Poster: Sampling for Bayesian Program Learning »
  Kevin Ellis · Armando Solar-Lezama · Josh Tenenbaum
- 2015 Poster: Unsupervised Learning by Program Synthesis »
  Kevin Ellis · Armando Solar-Lezama · Josh Tenenbaum