Counterfactual explanations are emerging as an attractive option for providing recourse to individuals adversely impacted by algorithmic decisions. As these methods are deployed in critical applications (e.g., law enforcement, financial lending), it becomes important to clearly understand their vulnerabilities and to find ways to address them; however, these vulnerabilities and shortcomings are currently poorly understood. In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated. More specifically, we show that counterfactual explanation methods may converge to drastically different counterfactuals under a small perturbation, indicating that they are not robust. Leveraging this insight, we introduce a novel training objective that produces seemingly fair models for which counterfactual explanation methods find much lower-cost recourse under a slight perturbation. We describe how such models can unfairly provide low-cost recourse to specific subgroups in the data while appearing fair to auditors. We perform experiments on loan and violent crime prediction data sets in which certain subgroups achieve up to 20x lower-cost recourse under the perturbation. These results raise concerns regarding the dependability of current counterfactual explanation techniques, which we hope will inspire investigations into robust counterfactual explanations.
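To make the abstract's core claim concrete, below is a minimal, hypothetical sketch (not the authors' code) of a Wachter-style counterfactual search applied to an ordinary PyTorch model. It computes the quantity the paper studies: the cost of the recourse returned when the search is run from an input x versus from a slightly perturbed x + delta. The linear model, lambda, step count, and delta are illustrative assumptions; for an ordinary model the two costs are typically close, and it is the paper's adversarial training objective that drives them far apart for manipulated models.

```python
import torch

def wachter_counterfactual(model, x, target=1.0, lam=10.0, steps=500, lr=0.05):
    # Gradient-descent search for a counterfactual x_cf that pushes the model's
    # prediction toward `target` while staying close to x (L1 recourse cost).
    x_cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = torch.sigmoid(model(x_cf)).squeeze()
        loss = lam * (pred - target) ** 2 + (x_cf - x).abs().sum()
        loss.backward()
        opt.step()
    return x_cf.detach()

# Hypothetical linear scorer standing in for a trained lending classifier.
torch.manual_seed(0)
model = torch.nn.Linear(5, 1)
x = torch.zeros(5)                # an individual who received a negative decision
delta = 0.01 * torch.randn(5)     # small, seemingly innocuous perturbation

cf_plain = wachter_counterfactual(model, x)
cf_perturbed = wachter_counterfactual(model, x + delta)

print("recourse cost at x:        ", (cf_plain - x).abs().sum().item())
print("recourse cost at x + delta:", (cf_perturbed - (x + delta)).abs().sum().item())
```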
Author Information
Dylan Slack (UC Irvine)
Anna Hilgard (Harvard University)
Himabindu Lakkaraju (Stanford University)
Sameer Singh (University of California, Irvine)
More from the Same Authors
- 2021: Defuse: Training More Robust Models through Creation and Correction of Novel Model Errors
  Dylan Slack · Krishnaram Kenthapadi
- 2022: TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
  Dylan Slack · Satyapriya Krishna · Himabindu Lakkaraju · Sameer Singh
- 2023 Poster: Post Hoc Explanations of Language Models Can Improve Language Models
  Satyapriya Krishna · Jiaqi Ma · Dylan Slack · Asma Ghandeharioun · Sameer Singh · Himabindu Lakkaraju
- 2022 Contributed Talk: TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
  Dylan Slack · Satyapriya Krishna · Himabindu Lakkaraju · Sameer Singh
- 2021: [S6] Defuse: Training More Robust Models through Creation and Correction of Novel Model Errors
  Dylan Slack · Krishnaram Kenthapadi
- 2021 Poster: Towards Robust and Reliable Algorithmic Recourse
  Sohini Upadhyay · Shalmali Joshi · Himabindu Lakkaraju
- 2021 Poster: Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
  Dylan Slack · Anna Hilgard · Sameer Singh · Himabindu Lakkaraju
- 2021 Poster: Learning Models for Actionable Recourse
  Alexis Ross · Himabindu Lakkaraju · Osbert Bastani
- 2020 Poster: From Predictions to Decisions: Using Lookahead Regularization
  Nir Rosenfeld · Anna Hilgard · Sai Srivatsa Ravindranath · David Parkes
- 2019: Poster session
  Jindong Gu · Alice Xiang · Atoosa Kasirzadeh · Zhiwei Han · Omar U. Florez · Frederik Harder · An-phi Nguyen · Amir Hossein Akhavan Rahnama · Michele Donini · Dylan Slack · Junaid Ali · Paramita Koley · Michiel Bakker · Anna Hilgard · Hailey James · Gonzalo Ramos · Jialin Lu · Jingying Yang · Margarita Boyarskaya · Martin Pawelczyk · Kacper Sokol · Mimansa Jaiswal · Umang Bhatt · David Alvarez-Melis · Aditya Grover · Charles Marx · Mengjiao (Sherry) Yang · Jingyan Wang · Gökhan Çapan · Hanchen Wang · Steffen Grünewälder · Moein Khajehnejad · Gourab Patro · Russell Kunes · Samuel Deng · Yuanting Liu · Luca Oneto · Mengze Li · Thomas Weber · Stefan Matthes · Duy Patrick Tu
- 2016 Poster: Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making
  Himabindu Lakkaraju · Jure Leskovec