Offline policy learning (OPL) leverages existing data collected a priori for policy optimization without any active exploration. Despite the prevalence of and recent interest in this problem, its theoretical and algorithmic foundations in function approximation settings remain under-developed. In this paper, we study this problem along the axes of distributional shift, optimization, and generalization in offline contextual bandits with neural networks. In particular, we propose a provably efficient offline contextual bandit algorithm with neural network function approximation that does not require any functional assumption on the reward. We show that our method provably generalizes over unseen contexts under a milder condition for distributional shift than existing OPL works. Notably, unlike any other OPL method, our method learns from the offline data in an online manner using stochastic gradient descent, allowing us to bring the benefits of online learning to an offline setting. Moreover, we show that our method is more computationally efficient and has a better dependence on the effective dimension of the neural network than an online counterpart. Finally, we demonstrate the empirical effectiveness of our method in a range of synthetic and real-world OPL problems.
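To make the idea of learning from offline data "in an online manner" concrete, the following is a minimal sketch, not the authors' algorithm. It assumes logged tuples of (context, action, reward), streams over them once, and takes one stochastic gradient step per tuple on a small two-layer ReLU network. The network shape, the disjoint context-action encoding, the greedy policy, and all function names are hypothetical choices made for illustration; the sketch does not attempt to reproduce the paper's treatment of distributional shift.

```python
import random

def init_net(dim, width, rng):
    # Hypothetical two-layer ReLU network: f(x) = w2 . relu(W1 x).
    W1 = [[rng.gauss(0, 1 / dim ** 0.5) for _ in range(dim)] for _ in range(width)]
    w2 = [rng.gauss(0, 1 / width ** 0.5) for _ in range(width)]
    return W1, w2

def forward(net, x):
    W1, w2 = net
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(v * h for v, h in zip(w2, hidden)), hidden

def sgd_step(net, x, reward, lr):
    # One gradient step on the squared loss 0.5 * (f(x) - reward)^2.
    W1, w2 = net
    pred, hidden = forward(net, x)
    err = pred - reward
    for j, h in enumerate(hidden):
        if h > 0.0:
            g = err * w2[j]  # uses w2[j] from before this step
            for i, xi in enumerate(x):
                W1[j][i] -= lr * g * xi
        w2[j] -= lr * err * h

def features(context, action, n_actions):
    # Assumed encoding: copy the context into the block of the chosen action.
    x = [0.0] * (len(context) * n_actions)
    start = action * len(context)
    x[start:start + len(context)] = context
    return x

def learn_offline_streaming(logged, n_actions, width=16, lr=0.05, seed=0):
    rng = random.Random(seed)
    net = init_net(len(logged[0][0]) * n_actions, width, rng)
    # Single pass over the offline data, one SGD step per logged sample.
    for context, action, reward in logged:
        sgd_step(net, features(context, action, n_actions), reward, lr)
    return net

def greedy_action(net, context, n_actions):
    # Pick the action with the highest predicted reward.
    scores = [forward(net, features(context, a, n_actions))[0] for a in range(n_actions)]
    return max(range(n_actions), key=scores.__getitem__)

# Synthetic logged data from a uniform logging policy (illustrative only).
rng = random.Random(0)
logged = []
for _ in range(500):
    context = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    action = rng.randrange(2)
    reward = 1.0 if action == int(context[0] > 0) else 0.0
    logged.append((context, action, reward))

net = learn_offline_streaming(logged, n_actions=2)
print(greedy_action(net, [0.8, -0.2], 2))
```

Each logged sample is touched exactly once, so the cost per sample is a single forward and backward pass, which is the sense in which the data is consumed as a stream rather than refit in batch.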
Author Information
Thanh Nguyen-Tang (Deakin University)
Sunil Gupta (Deakin University)
A. Tuan Nguyen (University of Oxford)
Svetha Venkatesh (Deakin University)
More from the Same Authors
- 2021 Poster: Model-Based Episodic Memory Induces Dynamic Hybrid Controls
  Hung Le · Thommen Karimpanal George · Majid Abdolshah · Truyen Tran · Svetha Venkatesh
- 2021 Poster: Kernel Functional Optimisation
  Arun Kumar Anjanapura Venkatesh · Alistair Shilton · Santu Rana · Sunil Gupta · Svetha Venkatesh
- 2021 Poster: Domain Invariant Representation Learning with Domain Density Transformations
  A. Tuan Nguyen · Toan Tran · Yarin Gal · Atilim Gunes Baydin
- 2020 Poster: Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces
  Hung Tran-The · Sunil Gupta · Santu Rana · Huong Ha · Svetha Venkatesh
- 2019 Poster: Bayesian Optimization with Unknown Search Space
  Huong Ha · Santu Rana · Sunil Gupta · Thanh Nguyen-Tang · Hung Tran-The · Svetha Venkatesh
- 2019 Poster: Multi-objective Bayesian optimisation with preferences over objectives
  Majid Abdolshah · Alistair Shilton · Santu Rana · Sunil Gupta · Svetha Venkatesh
- 2018 Poster: Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation
  Shivapratap Gopakumar · Sunil Gupta · Santu Rana · Vu Nguyen · Svetha Venkatesh
- 2018 Poster: Variational Memory Encoder-Decoder
  Hung Le · Truyen Tran · Thin Nguyen · Svetha Venkatesh
- 2017 Poster: Process-constrained batch Bayesian optimisation
  Pratibha Vellanki · Santu Rana · Sunil Gupta · David Rubin · Alessandra Sutti · Thomas Dorin · Murray Height · Paul Sanders · Svetha Venkatesh
- 2017 Spotlight: Process-constrained batch Bayesian optimisation
  Pratibha Vellanki · Santu Rana · Sunil Gupta · David Rubin · Alessandra Sutti · Thomas Dorin · Murray Height · Paul Sanders · Svetha Venkatesh