
On the Epistemic Limits of Personalized Prediction
Lucas Monteiro Paes · Carol Long · Berk Ustun · Flavio Calmon

Wed Nov 30 09:00 AM -- 11:00 AM (PST) @ Hall J #316
Machine learning models are often personalized by using group attributes that encode personal characteristics (e.g., sex, age group, HIV status). In such settings, individuals expect to receive more accurate predictions in return for disclosing group attributes to the personalized model. We study when we can tell that a personalized model upholds this principle for every group who provides personal data. We introduce a metric called the benefit of personalization (BoP) to measure the smallest gain in accuracy that any group expects to receive from a personalized model. We describe how the BoP can be used to carry out basic routines to audit a personalized model, including: (i) hypothesis tests to check that a personalized model improves performance for every group; (ii) estimation procedures to bound the minimum gain in personalization. We characterize the reliability of these routines in a finite-sample regime and present minimax bounds on both the probability of error for BoP hypothesis tests and the mean-squared error of BoP estimates. Our results show that we can only claim that personalization improves performance for each group who provides data when we explicitly limit the number of group attributes used by a personalized model. In particular, we show that it is impossible to reliably verify that a personalized classifier with $k \geq 19$ binary group attributes will benefit every group who provides personal data using a dataset of $n = 8\times10^9$ samples -- one for each person in the world.
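As a rough illustration of the metric described in the abstract, the sketch below computes an empirical benefit of personalization as the smallest per-group gain in accuracy of a personalized model over a generic one. The function names, the use of plain accuracy as the performance measure, and the reading that $k$ binary attributes induce $2^k$ groups are assumptions made for this example; they are not taken from the paper's code or its formal definitions.

```python
from collections import defaultdict


def empirical_bop(y_true, y_generic, y_personalized, groups):
    """Return (min gain, per-group gains) in accuracy of personalized over generic.

    Hypothetical helper: accuracy stands in for whatever performance
    measure an audit would actually use.
    """
    count = defaultdict(int)
    correct_generic = defaultdict(int)
    correct_personalized = defaultdict(int)
    for y, yg, yp, g in zip(y_true, y_generic, y_personalized, groups):
        count[g] += 1
        correct_generic[g] += int(yg == y)
        correct_personalized[g] += int(yp == y)
    # Per-group accuracy gain; a negative value means the group is harmed.
    gains = {
        g: (correct_personalized[g] - correct_generic[g]) / count[g]
        for g in count
    }
    return min(gains.values()), gains


def samples_per_group(n, k):
    """Average samples per group, assuming k binary attributes give 2**k groups."""
    return n / 2 ** k


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0, 1, 0]
    groups = ["a"] * 4 + ["b"] * 4
    y_generic = [1, 0, 0, 0, 1, 0, 1, 1]
    y_personalized = [1, 1, 0, 0, 0, 0, 0, 0]
    bop, gains = empirical_bop(y_true, y_generic, y_personalized, groups)
    print(bop, gains)
    print(samples_per_group(8_000_000_000, 19))
```

In this toy data the personalized model helps group "a" but hurts group "b", so the minimum gain is negative even though one group benefits. Under the $2^k$ assumption, $k = 19$ and $n = 8\times10^9$ leave roughly fifteen thousand samples per group on average, which gives a sense of scale for the finite-sample setting the abstract refers to.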

Author Information

Lucas Monteiro Paes (Harvard University)

I am a second-year Applied Mathematics Ph.D. student in the School of Engineering and Applied Sciences (SEAS) at Harvard University, working with Prof. Flavio Calmon. My main research interests are fairness, information theory, and machine learning applications for the social good. Before joining Harvard, I received an M.S. in Computational Mathematics and Modelling from Instituto de Matemática Pura e Aplicada (IMPA) in Brazil.

Carol Long (Harvard University)
Berk Ustun (UC San Diego)

I am an Assistant Professor of Data Science and Computer Science at UCSD. My research develops methods to promote the responsible use of machine learning in medicine, consumer finance, and the physical sciences. In particular, my group focuses on issues associated with fairness, interpretability, robustness, and governance. Previously, I held research positions at Google and the Harvard Center for Research on Computation & Society. I also co-founded Petal, a fintech company that uses machine learning to broaden credit access in the US. To date, my research has received generous support from the NSF, NIH, and Amazon and has been recognized by the Kavli Fellowship and the INFORMS Innovative Applications in Analytics Award in 2016 and 2019.

Flavio Calmon (Harvard University)
