Poster
Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs
Gellért Weisz · András György · Tadashi Kozuno · Csaba Szepesvari
We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies. This improvement over API (whose error scales with $H^2$) comes at the price of an $H$-fold increase in memory cost. Unlike Scherrer and Lesner [2012], who recommended computing a non-stationary policy to achieve a similar improvement (with the same memory overhead), we are able to stick to stationary policies. This allows for our second contribution, the application of CAPI to planning with local access to a simulator and $d$-dimensional linear function approximation. As such, we design a planning algorithm that applies CAPI to obtain a sequence of policies with successively refined accuracies on a dynamically evolving set of states. The algorithm outputs an $\tilde O(\sqrt{d}H\epsilon)$-optimal policy after issuing $\tilde O(dH^4/\epsilon^2)$ queries to the simulator, simultaneously achieving the optimal accuracy bound and the best known query complexity bound, while earlier algorithms in the literature achieve only one of them. This query complexity is shown to be tight in all parameters except $H$. These improvements come at the expense of a mild (polynomial) increase in memory and computational costs of both the algorithm and its output policy.
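To make the scalings quoted in the abstract concrete, the following is a minimal sketch that evaluates them numerically. It is not the authors' algorithm. It assumes the common definition of the effective horizon, $H = 1/(1-\gamma)$, which the abstract does not state, and it drops all constants and the logarithmic factors hidden in $\tilde O(\cdot)$. The function names `effective_horizon` and `bound_scalings` are hypothetical and chosen only for illustration.

```python
import math


def effective_horizon(gamma: float) -> float:
    """Effective horizon under the ASSUMED definition H = 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def bound_scalings(gamma: float, eps: float, d: int) -> dict:
    """Leading-order scalings from the abstract, with constants and log factors dropped.

    gamma: discount factor
    eps:   worst-case approximation error of the action-value functions
    d:     dimension of the linear function approximation
    """
    H = effective_horizon(gamma)
    return {
        "H": H,
        # Classical API: error scales with H^2 * eps.
        "api_error": H ** 2 * eps,
        # CAPI: error scales linearly with H * eps.
        "capi_error": H * eps,
        # Planner output: O~(sqrt(d) * H * eps)-optimal policy.
        "planner_error": math.sqrt(d) * H * eps,
        # Planner cost: O~(d * H^4 / eps^2) simulator queries.
        "planner_queries": d * H ** 4 / eps ** 2,
    }


if __name__ == "__main__":
    for name, value in bound_scalings(gamma=0.5, eps=0.25, d=4).items():
        print(f"{name}: {value:g}")
```

Under these assumptions the ratio of the API scaling to the CAPI scaling is exactly $H$, which matches the $H$-fold improvement (and the $H$-fold memory overhead) described in the abstract.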
Author Information
Gellért Weisz (University College London)
András György (DeepMind)
Tadashi Kozuno (Omron Sinic X)
Csaba Szepesvari (University of Alberta)
More from the Same Authors
- 2021 : Towards Better Visual Explanations for Deep Image Classifiers
  Agnieszka Grabska-Barwinska · Amal Rannen-Triki · Omar Rivasplata · András György
- 2022 : Optimistic Meta-Gradients
  Sebastian Flennerhag · Tom Zahavy · Brendan O'Donoghue · Hado van Hasselt · András György · Satinder Singh
- 2022 Poster: The Role of Baselines in Policy Gradient Optimization
  Jincheng Mei · Wesley Chung · Valentin Thomas · Bo Dai · Csaba Szepesvari · Dale Schuurmans
- 2022 Poster: Sample-Efficient Reinforcement Learning of Partially Observable Markov Games
  Qinghua Liu · Csaba Szepesvari · Chi Jin
- 2022 Poster: Near-Optimal Sample Complexity Bounds for Constrained MDPs
  Sharan Vaswani · Lin Yang · Csaba Szepesvari
- 2022 Poster: Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization
  Hui Yuan · Chengzhuo Ni · Huazheng Wang · Xuezhou Zhang · Le Cong · Csaba Szepesvari · Mengdi Wang
- 2021 : [S14] Towards Better Visual Explanations for Deep Image Classifiers
  Agnieszka Grabska-Barwinska · Amal Rannen-Triki · Omar Rivasplata · András György
- 2020 Poster: ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool
  Gellert Weisz · András György · Wei-I Lin · Devon Graham · Kevin Leyton-Brown · Csaba Szepesvari · Brendan Lucier
- 2019 Poster: Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging
  Pooria Joulani · András György · Csaba Szepesvari
- 2019 Poster: Detecting Overfitting via Adversarial Examples
  Roman Werpachowski · András György · Csaba Szepesvari
- 2016 Poster: Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities
  Ruitong Huang · Tor Lattimore · András György · Csaba Szepesvari
- 2016 Poster: SDP Relaxation with Randomized Rounding for Energy Disaggregation
  Kiarash Shaloudegi · András György · Csaba Szepesvari · Wilsun Xu
- 2016 Oral: SDP Relaxation with Randomized Rounding for Energy Disaggregation
  Kiarash Shaloudegi · András György · Csaba Szepesvari · Wilsun Xu
- 2015 Poster: Online Learning with Gaussian Payoffs and Side Observations
  Yifan Wu · András György · Csaba Szepesvari
- 2013 Poster: Online Learning with Costly Features and Labels
  Navid Zolghadr · Gábor Bartók · Russell Greiner · András György · Csaba Szepesvari
- 2010 Spotlight: Online Markov Decision Processes under Bandit Feedback
  Gergely Neu · András György · András Antos · Csaba Szepesvari
- 2010 Poster: Online Markov Decision Processes under Bandit Feedback
  Gergely Neu · András György · Csaba Szepesvari · András Antos