The evaluation and optimization of machine learning systems have largely adopted well-known performance metrics like accuracy (for classification) or squared error (for regression). While these metrics are reusable across a variety of machine learning tasks, they make strong assumptions that often do not hold when a model is situated in a broader technical or sociotechnical system. This is especially true in systems that interact with large populations of humans attempting to complete a goal or satisfy a need (e.g., search, recommendation, game-playing). In this tutorial, we will present methods for developing evaluation metrics grounded in what users expect of the system and how they respond to system decisions. The goal of this tutorial is both to share methods for designing user-based quantitative metrics and to motivate new research into optimizing for these more structured metrics.
Author Information
Praveen Chandar (Spotify)
Praveen Chandar is a Senior Research Scientist at Spotify working on search and recommendations. His research interests are in machine learning, information retrieval, and recommendation systems, with a focus on experimentation and evaluation. Praveen received his Ph.D. from the University of Delaware, working on novelty and diversity aspects of search evaluation. He was previously a Research Staff Member at IBM Research. He has published papers at top conferences, including SIGIR, KDD, WSDM, WWW, CIKM, CHI, and UAI.
Fernando Diaz (Google)
Fernando Diaz is a research scientist at Google Brain Montréal. His research focuses on the design of information access systems, including search engines, music recommendation services, and crisis response platforms. He is particularly interested in understanding and addressing the societal implications of artificial intelligence more generally. Previously, Fernando was the assistant managing director of Microsoft Research Montréal and a director of research at Spotify, where he helped establish its research organization on recommendation, search, and personalization. Fernando’s work has received awards at SIGIR, WSDM, ISCRAM, and ECIR. He is the recipient of the 2017 British Computer Society Karen Spärck Jones Award. Fernando has co-organized workshops and tutorials at SIGIR, WSDM, and WWW. He has also co-organized several NIST TREC initiatives, WSDM (2013), Strategic Workshop on Information Retrieval (2018), FAT* (2019), SIGIR (2021), and the CIFAR Workshop on Artificial Intelligence and the Curation of Culture (2019).
Brian St. Thomas (Spotify)
Brian St. Thomas is a Senior Data Scientist at Spotify researching online experimentation methods and metric development. His research interests are in the development and evaluation of personalized recommendation and search systems, with a focus on statistical aspects of these problems. Brian received his Ph.D. from Duke University and was previously a Data Scientist with TiVo's Search and Recommendations division. Brian has published research in JASA, SIGIR, CHI, and WWW, and has co-organized a tutorial at RecSys.
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Tutorial: (Track2) Beyond Accuracy: Grounding Evaluation Metrics for Human-Machine Learning Systems Q&A
  Tue. Dec 8th 10:00 -- 10:50 PM
More from the Same Authors
- 2021: Artsheets for Art Datasets
  Ramya Srinivasan · Emily Denton · Jordan Famularo · Negar Rostamzadeh · Fernando Diaz · Beth Coleman
- 2021: Understanding User Podcast Consumption Using Sequential Treatment Effect Estimation
  Vishwali Mhasawade · Praveen Chandar · Ghazal Fazelnia · Benjamin Carterette
- 2022: Exposure Fairness in Music Recommendation
  Rebecca Salganik · Fernando Diaz · Golnoosh Farnadi
- 2022: Striving for data-model efficiency: Identifying data externalities on group performance
  Esther Rolf · Ben Packer · Alex Beutel · Fernando Diaz
- 2022 Workshop: Cultures of AI and AI for Culture
  Alex Hanna · Rida Qadri · Fernando Diaz · Nick Seaver · Morgan Scheuerman
- 2022: Panel
  Hannah Korevaar · Manish Raghavan · Ashudeep Singh · Fernando Diaz · Chloé Bakalar · Alana Shine
- 2020 Workshop: Algorithmic Fairness through the Lens of Causality and Interpretability
  Awa Dieng · Jessica Schrouff · Matt Kusner · Golnoosh Farnadi · Fernando Diaz
- 2020 Poster: Model Selection for Production System via Automated Online Experiments
  Zhenwen Dai · Praveen Chandar · Ghazal Fazelnia · Benjamin Carterette · Mounia Lalmas
- 2016 Demonstration: Project Malmo - Minecraft for AI Research
  Katja Hofmann · Matthew A Johnson · Fernando Diaz · Alekh Agarwal · Tim Hutton · David Bignell · Evelyne Viegas