A latent bandit is a bandit problem in which the learning agent knows the reward distributions of the arms conditioned on an unknown discrete latent state. The goal of the agent is to identify the latent state, after which it can act optimally. This setting is a natural midpoint between online and offline learning: complex models can be learned offline, and the agent identifies the latent state online. This is of high practical relevance, for instance in recommender systems. In this work, we propose general algorithms for latent bandits, based on both upper confidence bounds and Thompson sampling. The algorithms are contextual and aware of model uncertainty and misspecification. We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions. A comprehensive empirical study showcases the advantages of our approach.
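To make the setting concrete, the following is a minimal sketch of Thompson sampling over latent states. It is not the authors' algorithm: it is non-contextual, assumes Bernoulli rewards whose means are known exactly for every latent state (no model uncertainty or misspecification), and starts from a uniform prior. The function name `latent_thompson_sampling` and its parameters are hypothetical, chosen only for illustration.

```python
import random


def latent_thompson_sampling(means, true_state, horizon, seed=0):
    """Illustrative sketch only (assumptions: Bernoulli rewards, exact model).

    means[s][a] is the known mean reward of arm a under latent state s,
    with every mean strictly between 0 and 1. true_state is hidden from
    the agent and is used only to simulate rewards.
    """
    rng = random.Random(seed)
    num_states = len(means)
    num_arms = len(means[0])
    # Uniform prior over latent states (an assumption of this sketch).
    posterior = [1.0 / num_states] * num_states
    total_reward = 0

    for _ in range(horizon):
        # Sample a latent state from the current posterior.
        sampled = rng.choices(range(num_states), weights=posterior)[0]
        # Act optimally as if the sampled state were the true one.
        arm = max(range(num_arms), key=lambda a: means[sampled][a])
        # Simulate a Bernoulli reward from the true (hidden) state.
        reward = 1 if rng.random() < means[true_state][arm] else 0
        total_reward += reward
        # Bayes update: weight each state by the likelihood of the reward.
        likelihood = [
            means[s][arm] if reward == 1 else 1.0 - means[s][arm]
            for s in range(num_states)
        ]
        unnormalized = [p * l for p, l in zip(posterior, likelihood)]
        normalizer = sum(unnormalized)
        posterior = [u / normalizer for u in unnormalized]

    return posterior, total_reward


# Two latent states, two arms; each state prefers a different arm.
posterior, total_reward = latent_thompson_sampling(
    means=[[0.9, 0.1], [0.1, 0.9]], true_state=1, horizon=200
)
print(posterior, total_reward)
```

Because the agent maintains a posterior over a small set of latent states rather than separate estimates for every arm, each observed reward is informative about which state it is in, which is the intuition behind the regret advantage when there are fewer latent states than actions.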
Author Information
Joey Hong (Google Research)
Branislav Kveton (Google Research)
Manzil Zaheer (Google)
Yinlam Chow (Google Research)
Amr Ahmed (Google Research)
Amr Ahmed is a Senior Staff Research Scientist at Google. He received his M.Sc. and Ph.D. degrees from the School of Computer Science, Carnegie Mellon University, in 2009 and 2011, respectively. He received the best paper award at KDD 2014, the best paper award at WSDM 2014, the 2012 ACM SIGKDD Doctoral Dissertation Award, and a best paper award (runner-up) at WSDM 2012. He co-chaired the WWW'18 track on Web Content Analysis and served as an Area Chair for IJCAI 2019, SIGIR 2019, SIGIR 2018, ICML 2018, ICML 2017, KDD 2016, WSDM 2015, ICML 2014, and ICDM 2014. His research interests include large-scale machine learning, data/web mining, user modeling, personalization, social networks, and content analysis.
Craig Boutilier (Google)
More from the Same Authors
- 2020 Poster: PLLay: Efficient Topological Layer based on Persistent Landscapes
  Kwangho Kim · Jisu Kim · Manzil Zaheer · Joon Kim · Frederic Chazal · Larry Wasserman
- 2020 Poster: Differentiable Meta-Learning of Bandit Policies
  Craig Boutilier · Chih-wei Hsu · Branislav Kveton · Martin Mladenov · Csaba Szepesvari · Manzil Zaheer
- 2020 Poster: Robust large-margin learning in hyperbolic space
  Melanie Weber · Manzil Zaheer · Ankit Singh Rawat · Aditya Menon · Sanjiv Kumar
- 2020 Poster: CoinDICE: Off-Policy Confidence Interval Estimation
  Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2020 Poster: Big Bird: Transformers for Longer Sequences
  Manzil Zaheer · Guru Guruganesh · Kumar Avinava Dubey · Joshua Ainslie · Chris Alberti · Santiago Ontanon · Philip Pham · Anirudh Ravula · Qifan Wang · Li Yang · Amr Ahmed
- 2020 Spotlight: CoinDICE: Off-Policy Confidence Interval Estimation
  Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2019 Workshop: Sets and Partitions
  Nicholas Monath · Manzil Zaheer · Andrew McCallum · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov
- 2019 Workshop: Safety and Robustness in Decision-making
  Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow
- 2019 Poster: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
  Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li
- 2019 Spotlight: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
  Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li
- 2018 Poster: TopRank: A practical algorithm for online stochastic ranking
  Tor Lattimore · Branislav Kveton · Shuai Li · Csaba Szepesvari
- 2018 Poster: Non-delusional Q-learning and value-iteration
  Tyler Lu · Dale Schuurmans · Craig Boutilier
- 2018 Poster: Nonparametric Density Estimation under Adversarial Losses
  Shashank Singh · Ananya Uppal · Boyue Li · Chun-Liang Li · Manzil Zaheer · Barnabas Poczos
- 2018 Oral: Non-delusional Q-learning and value-iteration
  Tyler Lu · Dale Schuurmans · Craig Boutilier
- 2018 Poster: A Lyapunov-based Approach to Safe Reinforcement Learning
  Yinlam Chow · Ofir Nachum · Edgar Duenez-Guzman · Mohammad Ghavamzadeh
- 2018 Poster: Adaptive Methods for Nonconvex Optimization
  Manzil Zaheer · Sashank Reddi · Devendra Sachan · Satyen Kale · Sanjiv Kumar
- 2018 Poster: A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
  Tengyang Xie · Bo Liu · Yangyang Xu · Mohammad Ghavamzadeh · Yinlam Chow · Daoming Lyu · Daesub Yoon
- 2018 Poster: Data center cooling using model-predictive control
  Nevena Lazic · Craig Boutilier · Tyler Lu · Eehern Wong · Binz Roy · Moonkyung Ryu · Greg Imwalle
- 2017 Oral: Deep Sets
  Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola
- 2017 Poster: Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
  Zheng Wen · Branislav Kveton · Michal Valko · Sharan Vaswani
- 2017 Poster: Deep Sets
  Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola
- 2015 Poster: Efficient Thompson Sampling for Online Matrix-Factorization Recommendation
  Jaya Kawale · Hung H Bui · Branislav Kveton · Long Tran-Thanh · Sanjay Chawla
- 2015 Poster: Combinatorial Cascading Bandits
  Branislav Kveton · Zheng Wen · Azin Ashkan · Csaba Szepesvari