Timezone: »
The Off-Policy Evaluation (OPE) problem consists in evaluating the performance of new policies from the data collected by another one. OPE is crucial when evaluating a new policy online is too expensive or risky. Many of the state-of-the-art OPE estimators are based on the Inverse Propensity Scoring (IPS) technique, which provides an unbiased estimator when the full support assumption holds, i.e., when the logging policy assigns a non-zero probability to each action. However, there are several scenarios where this assumption does not hold in practice, i.e., there is deficient support, and the IPS estimator is biased in the general case.In this paper, we consider two alternative estimators for the deficient support OPE problem. We first show how to adapt an estimator that was originally proposed for a different domain to the deficient support setting.Then, we propose another estimator, which is a novel contribution of this paper.These estimators exploit additional information about the actions, which we call side information, in order to make reliable estimates on the unsupported actions. Under alternative assumptions that do not require full support, we show that the considered estimators are unbiased.We also provide a theoretical analysis of the concentration when relaxing all the assumptions. Finally, we provide an experimental evaluation showing how the considered estimators are better suited for the deficient support setting compared to the baselines.
Author Information
Nicolò Felicioni (Politecnico di Milano)
Maurizio Ferrari Dacrema (Polytechnic Institute of Milan)
Marcello Restelli (Politecnico di Milano)
Paolo Cremonesi (Politecnico di Milano)
More from the Same Authors
-
2021 Spotlight: Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning »
Alberto Maria Metelli · Alessio Russo · Marcello Restelli -
2021 : Policy Optimization via Optimal Policy Evaluation »
Alberto Maria Metelli · Samuele Meta · Marcello Restelli -
2022 : Multi-Armed Bandit Problem with Temporally-Partitioned Rewards »
Giulia Romano · Andrea Agostini · Francesco Trovò · Nicola Gatti · Marcello Restelli -
2022 : Provably Efficient Causal Model-Based Reinforcement Learning for Environment-Agnostic Generalization »
Mirco Mutti · Riccardo De Santi · Emanuele Rossi · Juan Calderon · Michael Bronstein · Marcello Restelli -
2022 Poster: Multi-Fidelity Best-Arm Identification »
Riccardo Poiani · Alberto Maria Metelli · Marcello Restelli -
2022 Poster: Challenging Common Assumptions in Convex Reinforcement Learning »
Mirco Mutti · Riccardo De Santi · Piersilvio De Bartolomeis · Marcello Restelli -
2021 Poster: Learning in Non-Cooperative Configurable Markov Decision Processes »
Giorgia Ramponi · Alberto Maria Metelli · Alessandro Concetti · Marcello Restelli -
2021 Poster: Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection »
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 Poster: Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning »
Alberto Maria Metelli · Alessio Russo · Marcello Restelli -
2020 Poster: An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits »
Andrea Tirinzoni · Matteo Pirotta · Marcello Restelli · Alessandro Lazaric -
2020 Poster: Inverse Reinforcement Learning from a Gradient-based Learner »
Giorgia Ramponi · Gianluca Drappo · Marcello Restelli -
2020 Session: Orals & Spotlights Track 31: Reinforcement Learning »
Dotan Di Castro · Marcello Restelli -
2019 Poster: Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters »
Alberto Maria Metelli · Amarildo Likmeta · Marcello Restelli -
2018 Poster: Policy Optimization via Importance Sampling »
Alberto Maria Metelli · Matteo Papini · Francesco Faccio · Marcello Restelli -
2018 Poster: Transfer of Value Functions via Variational Methods »
Andrea Tirinzoni · Rafael Rodriguez Sanchez · Marcello Restelli -
2018 Oral: Policy Optimization via Importance Sampling »
Alberto Maria Metelli · Matteo Papini · Francesco Faccio · Marcello Restelli -
2017 Poster: Compatible Reward Inverse Reinforcement Learning »
Alberto Maria Metelli · Matteo Pirotta · Marcello Restelli -
2017 Poster: Adaptive Batch Size for Safe Policy Gradients »
Matteo Papini · Matteo Pirotta · Marcello Restelli -
2014 Poster: Sparse Multi-Task Reinforcement Learning »
Daniele Calandriello · Alessandro Lazaric · Marcello Restelli -
2013 Poster: Adaptive Step-Size for Policy Gradient Methods »
Matteo Pirotta · Marcello Restelli · Luca Bascetta -
2011 Poster: Transfer from Multiple MDPs »
Alessandro Lazaric · Marcello Restelli -
2007 Spotlight: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods »
Alessandro Lazaric · Marcello Restelli · Andrea Bonarini -
2007 Poster: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods »
Alessandro Lazaric · Marcello Restelli · Andrea Bonarini