As machine learning is deployed across all aspects of society, it has become increasingly important to ensure that stakeholders understand and trust these models. Decision makers must have a clear understanding of model behavior so they can diagnose errors and potential biases, and decide when and how to employ the models. However, the most accurate models deployed in practice are often not interpretable, making it difficult for users to understand where predictions come from and, consequently, to trust them.
Recent work on explanation techniques in machine learning offers an attractive solution: it provides intuitive explanations for "any" machine learning model by approximating complex models with simpler ones.
In this tutorial, we will discuss several post hoc explanation methods, focusing on their advantages and shortcomings. We will cover three families of techniques: (a) single-instance, gradient-based attribution methods (saliency maps); (b) model-agnostic explanations via perturbations, such as LIME/SHAP and counterfactual explanations; and (c) surrogate modeling for global interpretability, such as MUSE. For each approach, we will describe the problem setup, prominent methods, and example applications, and then discuss their vulnerabilities and shortcomings. We will conclude with an overview of future directions and a discussion of open research problems. We hope to provide a practical and insightful introduction to explainability in machine learning.
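The perturbation idea behind the model-agnostic family in (b) can be sketched in a few lines: replace (occlude) each feature with a baseline value and record how the black-box prediction changes. The toy linear "model" and all names below are illustrative stand-ins, not part of the tutorial materials or any particular library's API.

```python
# Toy illustration of perturbation-based attribution: for each feature,
# measure how the model's output changes when that feature is replaced
# by a baseline value (here, zero).

def model(x):
    # A stand-in "black box": a fixed linear scorer over three features.
    weights = [3.0, -1.0, 0.5]
    return sum(w * xi for w, xi in zip(weights, x))

def occlusion_attributions(predict, x, baseline=0.0):
    """Attribute predict(x) to each feature by occluding it with a baseline."""
    full = predict(x)
    attributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline  # remove feature i's contribution
        attributions.append(full - predict(perturbed))
    return attributions

attrs = occlusion_attributions(model, [1.0, 2.0, 4.0])
print(attrs)  # [3.0, -2.0, 2.0]
```

For a linear model with a zero baseline, each attribution is exactly weight times feature value; methods like LIME and SHAP generalize this perturbation idea to nonlinear models via local surrogate fitting and weighted averaging over many perturbations.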
Author Information
Himabindu Lakkaraju (Harvard)
Hima Lakkaraju is an Assistant Professor at Harvard University focusing on explainability, fairness, and robustness of machine learning models. She has also been working with various domain experts in criminal justice and healthcare to understand the real world implications of explainable and fair ML. Hima has recently been named one of the 35 innovators under 35 by MIT Tech Review, and has received best paper awards at SIAM International Conference on Data Mining (SDM) and INFORMS. She has given invited workshop talks at ICML, NeurIPS, AAAI, and CVPR, and her research has also been covered by various popular media outlets including the New York Times, MIT Tech Review, TIME, and Forbes. For more information, please visit: https://himalakkaraju.github.io/
Julius Adebayo (MIT)
Julius Adebayo is a Ph.D. student at MIT working on developing and understanding approaches that seek to make machine learning-based systems reliable when deployed. More broadly, he is interested in rigorous approaches to help develop models that are robust to spurious associations, distribution shifts, and align with 'human' values. Website: https://juliusadebayo.com/
Sameer Singh (University of California, Irvine)
Sameer Singh is an Assistant Professor at UC Irvine working on robustness and interpretability of machine learning. Sameer has presented tutorials and invited workshop talks at EMNLP, NeurIPS, NAACL, WSDM, ICLR, ACL, and AAAI, and received paper awards at KDD 2016, ACL 2018, EMNLP 2019, AKBC 2020, and ACL 2020. Website: http://sameersingh.org/
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Tutorial: (Track 2) Explaining Machine Learning Predictions: State-of-the-art, Challenges, and Opportunities Q&A
  Wed. Dec 9th, 11:00 -- 11:50 AM
More from the Same Authors
- 2021: Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
  Robert Logan · Ivana Balazevic · Eric Wallace · Fabio Petroni · Sameer Singh · Sebastian Riedel
- 2022: Quantifying Social Biases Using Templates is Unreliable
  Preethi Seshadri · Pouya Pezeshkpour · Sameer Singh
- 2022: TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
  Dylan Slack · Satyapriya Krishna · Himabindu Lakkaraju · Sameer Singh
- 2023 Poster: Post Hoc Explanations of Language Models Can Improve Language Models
  Satyapriya Krishna · Jiaqi Ma · Dylan Slack · Asma Ghandeharioun · Sameer Singh · Himabindu Lakkaraju
- 2022 Contributed Talk: TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
  Dylan Slack · Satyapriya Krishna · Himabindu Lakkaraju · Sameer Singh
- 2022: A Human-Centric Take on Model Monitoring
  Murtuza Shergadwala · Himabindu Lakkaraju · Krishnaram Kenthapadi
- 2022 Invited Talk (Dr. Hima Lakkaraju): "A Brief History of Explainable AI: From Simple Rules to Large Pretrained Models"
  Himabindu Lakkaraju
- 2021 Panel II: Machine decisions
  Anca Dragan · Karen Levy · Himabindu Lakkaraju · Ariel Rosenfeld · Maithra Raghu · Irene Y Chen
- 2021: Q/A Session
  Leilani Gilpin · Julius Adebayo
- 2021 [IT4]: Detecting model reliance on spurious signals is challenging for post hoc explanation approaches
  Julius Adebayo
- 2021: Q/A Session
  Alexander Feldman · Himabindu Lakkaraju
- 2021 [IT3]: Towards Reliable and Robust Model Explanations
  Himabindu Lakkaraju
- 2021 Invited Talk: Towards Reliable and Robust Model Explanations
  Himabindu Lakkaraju
- 2021: PYLON: A PyTorch Framework for Learning with Constraints
  Kareem Ahmed · Tao Li · Nu Mai Thy Ton · Quan Guo · Kai-Wei Chang · Parisa Kordjamshidi · Vivek Srikumar · Guy Van den Broeck · Sameer Singh
- 2020 Poster: Incorporating Interpretable Output Constraints in Bayesian Neural Networks
  Wanqian Yang · Lars Lorch · Moritz Graule · Himabindu Lakkaraju · Finale Doshi-Velez
- 2020 Spotlight: Incorporating Interpretable Output Constraints in Bayesian Neural Networks
  Wanqian Yang · Lars Lorch · Moritz Graule · Himabindu Lakkaraju · Finale Doshi-Velez
- 2020 Poster: Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses
  Kaivalya Rawal · Himabindu Lakkaraju
- 2019 Workshop: KR2ML - Knowledge Representation and Reasoning Meets Machine Learning
  Veronika Thost · Christian Muise · Kartik Talamadupula · Sameer Singh · Christopher Ré
- 2019 Demonstration: AllenNLP Interpret: Explaining Predictions of NLP Models
  Jens Tuyls · Eric Wallace · Matt Gardner · Junlin Wang · Sameer Singh · Sanjay Subramanian