We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees. Our approach requires that each base learner comes with a candidate regret bound that may or may not hold, while our meta-algorithm plays each base learner according to a schedule that keeps the base learners' candidate regret bounds balanced until they are detected to violate their guarantees. We develop careful misspecification tests specifically designed to blend the above model selection criterion with the ability to leverage the (potentially benign) nature of the environment. We recover the model selection guarantees of the CORRAL algorithm for adversarial environments, but with the additional benefit of achieving high-probability regret bounds. More importantly, our model selection results also hold simultaneously in stochastic environments under gap assumptions. These are the first theoretical results that achieve best-of-both-worlds (stochastic and adversarial) guarantees while performing model selection in contextual bandit scenarios.
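The balancing idea in the abstract can be illustrated with a small toy sketch. It is not the algorithm from the paper: it assumes a purely stochastic setting with rewards in [0, 1], it stands in for base learners with hypothetical reward-producing callables, and it uses a simple Hoeffding-style elimination test with a loosely chosen constant in place of the paper's misspecification tests. The names `balance_and_eliminate`, `learners`, and `bounds` are illustrative only.

```python
import math
import random


def balance_and_eliminate(learners, bounds, rounds, delta=0.05, seed=0):
    """Toy regret-bound balancing with elimination (illustrative assumptions only).

    learners: list of callables rng -> reward in [0, 1] (stand-ins for base learners).
    bounds:   list of callables n -> candidate cumulative regret bound after n plays.
    Returns the per-learner play counts and the set of learners still active.
    """
    rng = random.Random(seed)
    k = len(learners)
    plays = [0] * k
    totals = [0.0] * k
    active = set(range(k))

    def width(j):
        # Hoeffding-style confidence width; the constant is chosen loosely.
        return math.sqrt(2.0 * math.log(k * rounds / delta) / plays[j])

    for _ in range(rounds):
        # Balancing: play the active learner whose candidate bound is smallest.
        i = min(sorted(active), key=lambda j: bounds[j](plays[j]))
        totals[i] += learners[i](rng)
        plays[i] += 1

        # Test: drop a learner if even its optimistic value (average reward,
        # plus confidence width, plus claimed average regret) falls below
        # the pessimistic value of some other active learner.
        played = [j for j in sorted(active) if plays[j] > 0]
        best_lower = max(totals[j] / plays[j] - width(j) for j in played)
        for j in played:
            upper = totals[j] / plays[j] + width(j) + bounds[j](plays[j]) / plays[j]
            if upper < best_lower and len(active) > 1:
                active.discard(j)
    return plays, active


# Learner 0 claims a small bound but earns little; learner 1 claims a larger
# bound and earns more. Deterministic rewards keep the example reproducible.
learners = [lambda rng: 0.2, lambda rng: 0.8]
bounds = [lambda n: math.sqrt(n), lambda n: 3.0 * math.sqrt(n)]
plays, active = balance_and_eliminate(learners, bounds, rounds=5000)
```

In this toy run the learner with the over-optimistic bound is played more often at first, because its claimed bound grows more slowly, and it is removed once the test detects that its claimed bound cannot hold.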
Author Information
Aldo Pacchiano (Microsoft Research)
Christoph Dann (Google Research)
Claudio Gentile (Google Research)
More from the Same Authors
- 2021 Spotlight: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
  Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan
- 2021 Spotlight: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
  Christoph Dann · Teodor Vanislavov Marinov · Mehryar Mohri · Julian Zimmert
- 2021 Spotlight: Online Active Learning with Surrogate Loss Functions
  Giulia DeSalvo · Claudio Gentile · Tobias Sommer Thune
- 2023 Poster: Easy Learning from Label Proportions
  Róbert Busa-Fekete · Heejin Choi · Travis Dick · Claudio Gentile · Andres Munoz Medina
- 2023 Poster: A Unified Model and Dimension for Interactive Estimation
  Nataly Brukhim · Miro Dudik · Aldo Pacchiano · Robert Schapire
- 2023 Poster: In-Context Decision-Making from Supervised Pretraining
  Jonathan N Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill
- 2023 Poster: Experiment Planning with Function Approximation
  Aldo Pacchiano · Jonathan N Lee · Emma Brunskill
- 2023 Poster: Anytime Model Selection in Linear Bandits
  Parnian Kassraie · Aldo Pacchiano · Nicolas Emmenegger · Andreas Krause
- 2022 Poster: Learning General World Models in a Handful of Reward-Free Deployments
  Yingchen Xu · Jack Parker-Holder · Aldo Pacchiano · Philip Ball · Oleh Rybkin · S Roberts · Tim Rocktäschel · Edward Grefenstette
- 2022 Poster: Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity
  Abhishek Gupta · Aldo Pacchiano · Yuexiang Zhai · Sham Kakade · Sergey Levine
- 2022 Poster: Regret Bounds for Multilabel Classification in Sparse Label Regimes
  Róbert Busa-Fekete · Heejin Choi · Krzysztof Dembczynski · Claudio Gentile · Henry Reeve · Balazs Szorenyi
- 2021 Poster: Batch Active Learning at Scale
  Gui Citovsky · Giulia DeSalvo · Claudio Gentile · Lazaros Karydas · Anand Rajagopalan · Afshin Rostamizadeh · Sanjiv Kumar
- 2021 Poster: A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
  Christoph Dann · Mehryar Mohri · Tong Zhang · Julian Zimmert
- 2021 Poster: Near Optimal Policy Optimization via REPS
  Aldo Pacchiano · Jonathan N Lee · Peter Bartlett · Ofir Nachum
- 2021 Poster: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
  Christoph Dann · Teodor Vanislavov Marinov · Mehryar Mohri · Julian Zimmert
- 2021 Poster: Online Active Learning with Surrogate Loss Functions
  Giulia DeSalvo · Claudio Gentile · Tobias Sommer Thune
- 2021 Poster: On the Theory of Reinforcement Learning with Once-per-Episode Feedback
  Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan
- 2021 Poster: Tactical Optimism and Pessimism for Deep Reinforcement Learning
  Ted Moskovitz · Jack Parker-Holder · Aldo Pacchiano · Michael Arbel · Michael Jordan
- 2021 Poster: Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
  Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta
- 2021 Poster: Neural Active Learning with Performance Guarantees
  Zhilei Wang · Pranjal Awasthi · Christoph Dann · Ayush Sekhari · Claudio Gentile
- 2021 Poster: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
  Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan
- 2021 Poster: Neural Pseudo-Label Optimism for the Bank Loan Problem
  Aldo Pacchiano · Shaun Singh · Edward Chou · Alex Berg · Jakob Foerster
- 2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian
  Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster
- 2020 Poster: Effective Diversity in Population Based Reinforcement Learning
  Jack Parker-Holder · Aldo Pacchiano · Krzysztof M Choromanski · Stephen J Roberts
- 2020 Poster: Model Selection in Contextual Stochastic Bandit Problems
  Aldo Pacchiano · My Phan · Yasin Abbasi Yadkori · Anup Rao · Julian Zimmert · Tor Lattimore · Csaba Szepesvari
- 2020 Spotlight: Effective Diversity in Population Based Reinforcement Learning
  Jack Parker-Holder · Aldo Pacchiano · Krzysztof M Choromanski · Stephen J Roberts
- 2020 Poster: Reinforcement Learning with Feedback Graphs
  Christoph Dann · Yishay Mansour · Mehryar Mohri · Ayush Sekhari · Karthik Sridharan
- 2019 Poster: Flattening a Hierarchical Clustering through Active Learning
  Fabio Vitale · Anand Rajagopalan · Claudio Gentile
- 2019 Poster: From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization
  Krzysztof M Choromanski · Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Vikas Sindhwani
- 2018 Poster: Gen-Oja: Simple & Efficient Algorithm for Streaming Generalized Eigenvector Computation
  Kush Bhatia · Aldo Pacchiano · Nicolas Flammarion · Peter Bartlett · Michael Jordan
- 2018 Poster: Online Reciprocal Recommendation with Theoretical Performance Guarantees
  Claudio Gentile · Nikos Parotsidis · Fabio Vitale
- 2018 Poster: Geometrically Coupled Monte Carlo Sampling
  Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller
- 2018 Spotlight: Geometrically Coupled Monte Carlo Sampling
  Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller
- 2018 Poster: On Oracle-Efficient PAC RL with Rich Observations
  Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2018 Spotlight: On Oracle-Efficient PAC RL with Rich Observations
  Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2017 Poster: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
  Christoph Dann · Tor Lattimore · Emma Brunskill
- 2017 Spotlight: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
  Christoph Dann · Tor Lattimore · Emma Brunskill
- 2016 Poster: (Withdrawn) Only H is left: Near-tight Episodic PAC RL
  Christoph Dann · Emma Brunskill
- 2015 Poster: Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
  Christoph Dann · Emma Brunskill
- 2015 Poster: The Human Kernel
  Andrew Wilson · Christoph Dann · Chris Lucas · Eric Xing
- 2015 Spotlight: The Human Kernel
  Andrew Wilson · Christoph Dann · Chris Lucas · Eric Xing