Poster
Improving PAC Exploration Using the Median Of Means
Jason Pazis · Ronald Parr · Jonathan How
We present the first application of the median of means in a PAC exploration algorithm for MDPs. Using the median of means allows us to significantly reduce the dependence of our bounds on the range of values that the value function can take, while introducing a dependence on the (potentially much smaller) variance of the Bellman operator. Additionally, our algorithm is the first algorithm with PAC bounds that can be applied to MDPs with unbounded rewards.
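For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of the generic median-of-means estimator: split the samples into groups, average each group, and return the median of those averages. It is not the authors' exploration algorithm, and it does not show how the estimator is applied to the Bellman operator, which the abstract does not describe. The function name `median_of_means` and the parameter `num_groups` are hypothetical, and the samples are assumed to be i.i.d.

```python
import statistics


def median_of_means(samples, num_groups):
    """Generic median-of-means estimate of the mean of `samples`.

    Illustrative sketch only: the name and signature are assumptions,
    not taken from the paper.
    """
    if num_groups < 1 or num_groups > len(samples):
        raise ValueError("num_groups must be between 1 and len(samples)")

    # Use equal-sized groups; leftover samples at the end are dropped.
    group_size = len(samples) // num_groups
    group_means = [
        statistics.fmean(samples[i * group_size:(i + 1) * group_size])
        for i in range(num_groups)
    ]

    # A single extreme sample can corrupt at most one group mean,
    # so the median across groups stays close to the bulk of the data.
    return statistics.median(group_means)


# One heavy-tailed outlier (1000.0) dominates the plain mean
# but only affects one of the three group means.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
plain_mean = statistics.fmean(data)
mom_estimate = median_of_means(data, num_groups=3)
```

In this toy example the group means are 1.5, 3.5, and 502.5, so the estimate is 3.5, while the plain mean is pulled above 169 by the single outlier. This robustness to extreme values is the general property that makes the estimator attractive when the range of values is large or unbounded.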
Author Information
Jason Pazis (MIT)
Ronald Parr (Duke University)
Jonathan How (MIT)
More from the Same Authors
- 2023 Poster: A Path to Simpler Models Starts With Noise
  Lesia Semenova · Harry Chen · Ronald Parr · Cynthia Rudin
- 2022 Poster: Influencing Long-Term Behavior in Multiagent Reinforcement Learning
  Dong-Ki Kim · Matthew Riemer · Miao Liu · Jakob Foerster · Michael Everett · Chuangchuang Sun · Gerald Tesauro · Jonathan How
- 2016 Poster: Linear Feature Encoding for Reinforcement Learning
  Zhao Song · Ronald Parr · Xuejun Liao · Lawrence Carin
- 2015 Poster: Streaming, Distributed Variational Inference for Bayesian Nonparametrics
  Trevor Campbell · Julian Straub · John Fisher III · Jonathan How
- 2013 Workshop: New Directions in Transfer and Multi-Task: Learning Across Domains and Tasks
  Urun Dogan · Marius Kloft · Tatiana Tommasi · Francesco Orabona · Massimiliano Pontil · Sinno Jialin Pan · Shai Ben-David · Arthur Gretton · Fei Sha · Marco Signoretto · Rajhans Samdani · Yun-Qian Miao · Mohammad Gheshlaghi azar · Ruth Urner · Christoph Lampert · Jonathan How
- 2013 Workshop: Advances in Machine Learning for Sensorimotor Control
  Thomas Walsh · Alborz Geramifard · Marc Deisenroth · Jonathan How · Jan Peters
- 2013 Poster: Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
  Trevor Campbell · Miao Liu · Brian Kulis · Jonathan How · Lawrence Carin
- 2013 Poster: Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
  Daniel S Levine · Jonathan How
- 2012 Workshop: Bayesian Nonparametric Models For Reliable Planning And Decision-Making Under Uncertainty
  Jonathan How · Lawrence Carin · John Fisher III · Michael Jordan · Alborz Geramifard
- 2010 Oral: Linear Complementarity for Regularized Policy Evaluation and Improvement
  Jeff Johns · Christopher Painter-Wakefield · Ronald Parr
- 2010 Poster: Linear Complementarity for Regularized Policy Evaluation and Improvement
  Jeff Johns · Christopher Painter-Wakefield · Ronald Parr