A challenge that machine learning practitioners in industry face is selecting the best model to deploy in production. As a model is often an intermediate component of a production system, online controlled experiments such as A/B tests yield the most reliable estimate of the effectiveness of the whole system, but can only compare two or a few models due to budget constraints. We propose an automated online experimentation mechanism that can efficiently perform model selection from a large pool of models with a small number of online experiments. We derive the probability distribution of the metric of interest, which captures the model uncertainty, from a Bayesian surrogate model trained on historical logs. Our method efficiently identifies the best model by sequentially selecting and deploying a list of models from the candidate set that balances exploration and exploitation. Using simulations based on real data, we demonstrate the effectiveness of our method on two different tasks.
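The sequential select-deploy-update loop the abstract describes can be sketched with a simple Thompson-sampling-style procedure. This is a minimal illustration under strong assumptions (independent Gaussian beliefs per candidate model, known unit observation noise); the paper's actual surrogate is a Bayesian model trained on historical logs, and the names `thompson_select`, `run_rounds`, and `deploy` are hypothetical.

```python
import random


def thompson_select(posteriors):
    """Sample a metric value from each candidate's belief and pick the best.

    posteriors maps model name -> (mean, std) of its metric belief.
    """
    samples = {m: random.gauss(mu, sigma) for m, (mu, sigma) in posteriors.items()}
    return max(samples, key=samples.get)


def run_rounds(posteriors, deploy, n_rounds):
    """Sequentially deploy models online and update beliefs after each result.

    deploy(model) runs one online experiment and returns the observed metric.
    Uses a precision-weighted Gaussian update with assumed noise variance 1.0.
    """
    history = []
    for _ in range(n_rounds):
        model = thompson_select(posteriors)   # explore/exploit trade-off
        observed = deploy(model)              # outcome of the online experiment
        mu, sigma = posteriors[model]
        new_prec = 1.0 / sigma**2 + 1.0       # add one unit-noise observation
        new_mu = (mu / sigma**2 + observed) / new_prec
        posteriors[model] = (new_mu, (1.0 / new_prec) ** 0.5)
        history.append((model, observed))
    return history
```

Sampling from the posterior before taking the argmax is what balances exploration and exploitation: models with uncertain beliefs occasionally win the sample and get deployed, while consistently strong models dominate as uncertainty shrinks.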
Author Information
Zhenwen Dai (Spotify)
Praveen Chandar (Spotify)
Praveen Chandar is a Senior Research Scientist at Spotify working on search and recommendations. His research interests are in machine learning, information retrieval, and recommender systems, with a focus on experimentation and evaluation. Praveen received his Ph.D. from the University of Delaware, working on novelty and diversity aspects of search evaluation. He was previously a Research Staff Member at IBM Research. He has published papers at top conferences including SIGIR, KDD, WSDM, WWW, CIKM, CHI, and UAI.
Ghazal Fazelnia (Spotify Research)
Benjamin Carterette (Spotify)
Mounia Lalmas (Spotify)
More from the Same Authors
- 2021 : Understanding User Podcast Consumption Using Sequential Treatment Effect Estimation »
  Vishwali Mhasawade · Praveen Chandar · Ghazal Fazelnia · Benjamin Carterette
- 2021 : Efficient Automated Online Experimentation with Multi-Fidelity »
  Steven Kleinegesse · Zhenwen Dai · Andreas Damianou · Kamil Ciosek · Federico Tomasi
- 2021 : Contrastive Embedding of Structured Space for Bayesian Optimization »
  Josh Tingey · Ciarán Lee · Zhenwen Dai
- 2021 : Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders »
  Olivier Jeunen · Ciarán Gilligan-Lee · Rishabh Mehrotra · Mounia Lalmas
- 2022 Poster: Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders »
  Olivier Jeunen · Ciarán Gilligan-Lee · Rishabh Mehrotra · Mounia Lalmas
- 2022 Spotlight: Lightning Talks 1A-3 »
  Kimia Noorbakhsh · Ronan Perry · Qi Lyu · Jiawei Jiang · Christian Toth · Olivier Jeunen · Xin Liu · Yuan Cheng · Lei Li · Manuel Rodriguez · Julius von Kügelgen · Lars Lorch · Nicolas Donati · Lukas Burkhalter · Xiao Fu · Zhongdao Wang · Songtao Feng · Ciarán Gilligan-Lee · Rishabh Mehrotra · Fangcheng Fu · Jing Yang · Bernhard Schölkopf · Ya-Li Li · Christian Knoll · Maks Ovsjanikov · Andreas Krause · Shengjin Wang · Hong Zhang · Mounia Lalmas · Bolin Ding · Bo Du · Yingbin Liang · Franz Pernkopf · Robert Peharz · Anwar Hithnawi · Julius von Kügelgen · Bo Li · Ce Zhang
- 2022 Spotlight: Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders »
  Olivier Jeunen · Ciarán Gilligan-Lee · Rishabh Mehrotra · Mounia Lalmas
- 2020 Tutorial: (Track2) Beyond Accuracy: Grounding Evaluation Metrics for Human-Machine Learning Systems Q&A »
  Praveen Chandar · Fernando Diaz · Brian St. Thomas
- 2020 Tutorial: (Track2) Beyond Accuracy: Grounding Evaluation Metrics for Human-Machine Learning Systems »
  Praveen Chandar · Fernando Diaz · Brian St. Thomas