Workshop
XAI in Action: Past, Present, and Future Applications
Chhavi Yadav · Michal Moshkovitz · Nave Frost · Suraj Srinivas · Bingqing Chen · Valentyn Boreiko · Himabindu Lakkaraju · J. Zico Kolter · Dotan Di Castro · Kamalika Chaudhuri
Room 271 - 273
Transparency is vital for AI’s growth. This has led to the design of new methods in explainable AI. We aim to explore the current state of applied XAI and identify future directions.
Schedule
Sat 6:50 a.m. - 7:00 a.m. | Opening Remarks (Opening)
Sat 7:00 a.m. - 7:30 a.m. | Explanations: Let's talk about them! (Talk)
Posthoc explanations aim to give end-users insights and understanding into the workings of complex machine learning models. Despite their potential, posthoc explanations have found limited use in real-world applications and, for some evaluation setups, fail to help end-users achieve their tasks effectively. In a survey we carried out with domain experts to understand why they do not use explanation techniques, they pointed out that explanations are static and inflexible, making it challenging to explore the model behavior intuitively. Based on these insights, we propose a shift towards natural language conversations as a promising avenue for future work for explainability: they are easy to use, flexible, and interactive. We introduce an initial version of such a system, TalkToModel, that uses LLMs to enable open-ended natural language conversations for machine learning explainability. In our evaluation, TalkToModel can accurately identify diverse user intents and support various user queries. Further, users strongly prefer TalkToModel over existing explainability systems, demonstrating the effectiveness of natural language interfaces in supporting model understanding. (This is work with Dylan Slack, Satya Krishna, Hima Lakkaraju, Chenhao Tan, and Yuxin Chen)
Sameer Singh
Sat 7:30 a.m. - 8:00 a.m. | Theoretical guarantees for explainable AI? (Talk)
Explainable machine learning is often discussed as a tool to increase trust in machine learning systems. In my opinion, this can only work if the explanations are trustworthy themselves: we should be able to prove strong guarantees on the explanations provided. In my presentation I will argue that strong explanations in interesting scenarios might be difficult to achieve.
Ulrike Luxburg
Sat 8:00 a.m. - 8:30 a.m. | Coffee & Games (Social)
Sat 8:30 a.m. - 9:00 a.m. | Explainable AI: where we are and how to move forward for health AI (Talk)
Su-In Lee
Sat 9:00 a.m. - 10:00 a.m. | Panel Discussion
Leilani Gilpin · Shai Ben-David · Julius Adebayo · Sameer Singh · Su-In Lee · Kamalika Chaudhuri
Sat 10:00 a.m. - 11:30 a.m. | Lunch
Sat 11:30 a.m. - 12:00 p.m. | Confronting the Faithfulness Challenge with Post-hoc Model Explanations (Talk)
Explaining the output of a trained deep neural network has emerged as a key research challenge. Several classes of explanation methods (feature attributions, training point ranking, post-hoc concept attribution) have been proposed to address that challenge. However, despite significant research contributions, evidence points to their ineffectiveness. In this talk, I'll highlight a key challenge that undercuts the effectiveness of current post hoc explanation methods: faithfulness. A model's explanation is faithful if the feature importance score, induced by the explanation, indicates the magnitude of the change in the model's output when that feature is ablated. However, consistent evidence indicates that post hoc explanations of large-scale deep nets, under standard training regimes, are unfaithful. I'll close with two vignettes: the first on emerging recipes for overcoming the faithfulness challenge, and the second on an alternative paradigm that involves developing intrinsically interpretable models.
Julius Adebayo
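The faithfulness criterion above lends itself to a concrete check. The following minimal sketch (illustrative only, not code from the talk) compares an attribution vector against the actual change in a model's output when each feature is ablated to a baseline value; the linear toy model and all names are assumptions made for the example:

import numpy as np

def output_change_on_ablation(model, x, baseline, feature_idx):
    # Change in model output when one feature is replaced by its baseline value.
    x_ablated = x.copy()
    x_ablated[feature_idx] = baseline[feature_idx]
    return model(x) - model(x_ablated)

def faithfulness_correlation(model, x, baseline, attribution):
    # Correlation between attribution scores and ablation-induced output changes.
    # High correlation suggests the explanation is (locally) faithful.
    deltas = np.array([output_change_on_ablation(model, x, baseline, i)
                       for i in range(len(x))])
    return np.corrcoef(attribution, deltas)[0, 1]

# Toy linear model, whose exact attributions are w * (x - baseline).
w = np.array([2.0, -1.0, 0.5])
model = lambda x: float(w @ x)
x = np.array([1.0, 3.0, -2.0])
baseline = np.zeros_like(x)

exact_attr = w * (x - baseline)                        # faithful by construction
noisy_attr = exact_attr + np.array([0.0, 5.0, -3.0])   # a deliberately unfaithful explanation

print(faithfulness_correlation(model, x, baseline, exact_attr))  # close to 1.0
print(faithfulness_correlation(model, x, baseline, noisy_attr))  # noticeably lower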
Sat 12:00 p.m. - 1:00 p.m. | Poster Session 1
Sat 12:01 p.m. - 1:00 p.m. | GInX-Eval: Towards In-Distribution Evaluation of Graph Neural Network Explanations (Poster)
Diverse explainability methods for graph neural networks (GNNs) have recently been developed to highlight the edges and nodes in the graph that contribute the most to the model predictions. However, it is not yet clear how to evaluate the correctness of those explanations, whether from a human or a model perspective. One unaddressed bottleneck in the current evaluation procedure is the problem of out-of-distribution explanations, whose distribution differs from that of the training data. This important issue affects existing evaluation metrics such as the popular faithfulness or fidelity score. In this paper, we show the limitations of faithfulness metrics. We propose GInX-Eval (Graph In-distribution eXplanation Evaluation), an evaluation procedure for graph explanations that overcomes the pitfalls of faithfulness and offers new insights on explainability methods. Using a retraining strategy, the GInX score measures how informative removed edges are for the model, and the EdgeRank score evaluates whether explanatory edges are correctly ordered by their importance. GInX-Eval verifies whether ground-truth explanations are instructive to the GNN model. In addition, it shows that many popular methods, including gradient-based methods, produce explanations that are no better than a random designation of edges as important subgraphs, challenging the findings of current work in the area. Results with GInX-Eval are consistent across multiple datasets and align with human evaluation.
Kenza Amara · Mennatallah El-Assady · Rex Ying
Sat 12:01 p.m. - 1:00 p.m. | FRUNI and FTREE synthetic knowledge graphs for evaluating explainability (Poster)
Research on knowledge graph completion (KGC), i.e., link prediction within incomplete KGs, is witnessing significant growth in popularity. Recently, KGC using KG embedding (KGE) models, primarily based on complex architectures (e.g., transformers), has achieved remarkable performance. Still, extracting the minimal and relevant information employed by KGE models to make predictions, while constituting a major part of explaining the predictions, remains a challenge. While there exists a growing literature on explainers for trained KGE models, systematically exposing and quantifying their failure cases poses even greater challenges. In this work, we introduce two synthetic datasets, FRUNI and FTREE, designed to demonstrate the (in)ability of explainer methods to spot link predictions that rely on indirectly connected links. Notably, we empower practitioners to control various aspects of the datasets, such as noise levels and dataset size, enabling them to assess the performance of explainability methods across diverse scenarios. Through our experiments, we assess the performance of four recent explainers in providing accurate explanations for predictions on the proposed datasets. We believe that these datasets are valuable resources for further validating explainability methods within the knowledge graph community.
Pablo Sanchez-Martin · Tarek R. Besold · Priyadarshini Kumari
Sat 12:01 p.m. - 1:00 p.m. | Explainable AI in Music Performance: Case Studies from Live Coding and Sound Spatialisation (Poster)
Explainable Artificial Intelligence (XAI) has emerged as a significant area of research, with diverse applications across various fields. In the realm of arts, the application and implications of XAI remain largely unexplored. This paper investigates how artist-researchers address and navigate explainability in their systems during creative AI/ML practices, focusing on music performance. We present two case studies: live coding of AI/ML models and sound spatialisation performance. In the first case, we explore the inherent explainability in live coding and how the integration of interactive and on-the-fly machine learning processes can enhance this explainability. In the second case, we investigate how sound spatialisation can serve as a powerful tool for understanding and navigating the latent dimensions of autoencoders. Our autoethnographic reflections reveal the complexities and nuances of applying XAI in the arts, and underscore the need for further research in this area. We conclude that the exploration of XAI in the arts, particularly in music performance, opens up new avenues for understanding and improving the interaction between artists and AI/ML systems. This research contributes to the broader discussion on the diverse applications of XAI, with the ultimate goal of extending the frontiers of applied XAI.
Jack Armitage · Nicola Privato · Victor Shepardson · Celeste Betancur Gutierrez
Sat 12:01 p.m. - 1:00 p.m. | Towards Explanatory Model Monitoring (Poster)
Monitoring machine learning systems and efficiently recovering their reliability after performance degradation are two of the most critical issues in real-world applications. However, current monitoring strategies lack the capability to provide actionable insights answering the question of why the performance of a particular model really degraded. To address this, we propose Explanatory Performance Estimation (XPE) as a novel method that facilitates more informed model monitoring and maintenance by attributing an estimated performance change to interpretable input features. We demonstrate the superiority of our approach compared to natural baselines on different data sets. We also discuss how the generated results lead to valuable insights that can reveal potential root causes for model deterioration and guide toward actionable countermeasures.
Alexander Koebler · Thomas Decker · Michael Lebacher · Ingo Thon · Volker Tresp · Florian Buettner
Sat 12:01 p.m. - 1:00 p.m. | Lessons from Usable ML Deployments Applied to Wind Turbine Monitoring (Poster)
Through past experiences deploying what we call usable ML (one step beyond explainable ML, including both explanations and other augmenting information) to real-world domains, we have learned three key lessons. First, many organizations are beginning to add people who we call "bridges" because they bridge the gap between ML developers and domain experts, and these people fill a valuable role in developing usable ML applications. Second, a configurable system that enables easily iterating on usable ML interfaces during collaborations with bridges is key. Finally, there is a need for continuous, in-deployment evaluations to quantify the real-world impact of usable ML. Throughout this paper, we apply these lessons to the task of wind turbine monitoring, an essential task in the renewable energy domain. Turbine engineers and data analysts must decide whether to perform costly in-person investigations on turbines to prevent potential cases of brake pad failure, and well-tuned usable ML interfaces can aid with this decision-making process. Through the applications of our lessons to this task, we hope to demonstrate the potential real-world impact of usable ML in the renewable energy domain.
Alexandra Zytek · Wei-En Wang · Sofia Koukoura · Kalyan Veeramachaneni
Sat 12:01 p.m. - 1:00 p.m. | DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models (Poster)
As large language models (LLMs) become more capable, there is an urgent need for interpretable and transparent tools. Current methods are difficult to implement, and accessible tools to analyze model internals are lacking. To bridge this gap, we present DeepDecipher, an API and interface for probing neurons in transformer models' MLP layers. DeepDecipher makes the outputs of advanced interpretability techniques readily available for LLMs. The easy-to-use interface also makes inspecting these complex models more intuitive. This paper outlines DeepDecipher's design and capabilities. We demonstrate how to analyze neurons, compare models, and gain insights into model behavior. For example, we contrast DeepDecipher's functionality with similar tools like Neuroscope and OpenAI's Neuron Explainer. DeepDecipher enables efficient, scalable analysis of LLMs. By granting access to state-of-the-art interpretability methods, DeepDecipher makes LLMs more transparent, trustworthy, and safe. Researchers, engineers, and developers can quickly diagnose issues, audit systems, and advance the field.
Albert Garde · Esben Kran · Fazl Barez
Sat 12:01 p.m. - 1:00 p.m. | How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors? (Poster)
Surging interest in deep learning from high-stakes domains has precipitated concern over the inscrutable nature of black box neural networks. Explainable AI (XAI) research has led to an abundance of explanation algorithms for these black boxes. Such post hoc explainers produce human-comprehensible explanations; however, their fidelity with respect to the model is not well understood, and explanation evaluation remains one of the most challenging issues in XAI. In this paper, we ask a targeted but important question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors? Herein, we evaluate such explainers on ground truth that is analytically derived from the additive structure of a model. We demonstrate the efficacy of our approach in understanding these explainers applied to symbolic expressions, neural networks, and generalized additive models on thousands of synthetic and several real-world tasks. Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when a decision-making process involves feature interactions.
Zachariah Carmichael · Walter Scheirer
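As a hypothetical illustration of the evaluation idea in this abstract (not the authors' benchmark), one can construct a predictor with an explicit additive structure, read off ground-truth per-feature contributions analytically, and compare them with an explainer's output; the stand-in explainer below is a simple leave-one-feature-at-baseline attribution rather than LIME or SHAP, and all names are assumptions:

import numpy as np

# A feature-additive predictor: f(x) = g1(x1) + g2(x2) + g3(x3)
g = [np.sin, np.square, lambda v: 0.5 * v]
f = lambda x: sum(gi(xi) for gi, xi in zip(g, x))

def ground_truth_attribution(x, baseline):
    # Exact additive contributions, derived analytically from the model's structure.
    return np.array([gi(xi) - gi(bi) for gi, xi, bi in zip(g, x, baseline)])

def ablation_attribution(x, baseline):
    # A stand-in explainer: contribution of feature i = f(x) - f(x with feature i at baseline).
    scores = []
    for i in range(len(x)):
        x_abl = x.copy()
        x_abl[i] = baseline[i]
        scores.append(f(x) - f(x_abl))
    return np.array(scores)

x = np.array([0.3, 2.0, -1.0])
baseline = np.zeros(3)

truth = ground_truth_attribution(x, baseline)
est = ablation_attribution(x, baseline)
print("ground truth:", truth)
print("explainer:   ", est)
print("max abs error:", np.abs(truth - est).max())  # zero here; generally nonzero once features interact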
Sat 12:01 p.m. - 1:00 p.m. | Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making (Poster)
Pre-trained transformers are often fine-tuned to aid clinical decision-making using limited clinical notes. Model interpretability is crucial, especially in high-stakes domains like medicine, to establish trust and ensure safety, which requires human engagement. We introduce SUFO, a systematic framework that enhances interpretability of fine-tuned transformer feature spaces. SUFO utilizes a range of analytic and visualization techniques, including Supervised probing, Unsupervised similarity analysis, Feature dynamics, and Outlier analysis to address key questions about model trust and interpretability. We conduct a case study investigating the impact of pre-training data where we focus on real-world pathology classification tasks, and validate our findings on MedNLI. We evaluate five 110M-sized pre-trained transformer models, categorized into general-domain (BERT, TNLR), mixed-domain (BioBERT, Clinical BioBERT), and domain-specific (PubMedBERT) groups. Our SUFO analyses reveal that: (1) while PubMedBERT, the domain-specific model, contains valuable information for fine-tuning, it can overfit to minority classes when class imbalances exist. In contrast, mixed-domain models exhibit greater resistance to overfitting, suggesting potential improvements in domain-specific model robustness; (2) in-domain pre-training accelerates feature disambiguation during fine-tuning; and (3) feature spaces undergo significant sparsification during this process, enabling clinicians to identify common outlier modes among fine-tuned models as demonstrated in this paper. These findings showcase the utility of SUFO in enhancing trust and safety when using transformers in medicine, and we believe SUFO can aid practitioners in evaluating fine-tuned language models for other applications in medicine and in more critical domains.
Aliyah Hsu · Yeshwanth Cherapanamjeri · Briton Park · Tristan Naumann · Anobel Odisho · Bin Yu
Sat 12:01 p.m. - 1:00 p.m. | AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments (Poster)
Feature attribution explains neural network outputs by identifying relevant input features. How do we know if the identified features are indeed relevant to the network? This notion is referred to as faithfulness, an essential property that reflects the alignment between the identified (attributed) features and the features used by the model. One recent trend to test faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them with these designed ground truth features. However, this idea has the underlying assumption that the neural network learns to use all and only these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we solve this missing link by explicitly designing the neural network by manually setting its weights, along with designing data, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in AttributionLab, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods. If an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.
Yang Zhang · Yawei Li · Hannah Brown · Mina Rezaei · Bernd Bischl · Philip Torr · Ashkan Khakzar · Kenji Kawaguchi
Sat 12:01 p.m. - 1:00 p.m. | Geometric Remove-and-Retrain (GOAR): Coordinate-Invariant eXplainable AI Assessment (Poster)
Identifying the relevant input features that have a critical influence on the output results is indispensable for the development of explainable artificial intelligence (XAI). Remove-and-Retrain (ROAR) is a widely accepted approach for assessing the importance of individual pixels by measuring changes in accuracy following their removal and subsequent retraining on the modified dataset. However, we uncover notable limitations in pixel-perturbation strategies. When viewed from a geometric perspective, this method perturbs pixels by moving each sample in the pixel-basis direction. We have found that this approach is coordinate-dependent and fails to discriminate between differences among features, thereby compromising the reliability of the evaluation. To address this challenge, we introduce an alternative feature-perturbation approach named Geometric Remove-and-Retrain (GOAR). GOAR offers a perturbation strategy that takes into account the geometric structure of the dataset, providing a coordinate-independent metric for accurate feature comparison. Through a series of experiments with both synthetic and real datasets, we substantiate that GOAR's geometric metric transcends the limitations of pixel-centric metrics.
Yong-Hyun Park · Junghoon Seo · 범석 박 · Seongsu Lee · Junghyo Jo
Sat 12:01 p.m. - 1:00 p.m. | Utilizing Explainability Techniques for Reinforcement Learning Model Assurance (Demo)
Explainable Reinforcement Learning (XRL) can provide transparency into the decision-making process of a Reinforcement Learning (RL) model and increase user trust and adoption into real-world use cases. By utilizing XRL techniques, researchers can identify potential vulnerabilities within a trained RL model prior to deployment, therefore limiting the potential for mission failure or mistakes by the system. This paper introduces the ARLIN (Assured RL Model Interrogation) Toolkit, a Python library that provides explainability outputs for trained RL models that can be used to identify potential policy vulnerabilities and critical points. Using XRL datasets, ARLIN provides detailed analysis into an RL model's latent space, creates a semi-aggregated Markov decision process (SAMDP) to outline the model's path throughout an episode, and produces cluster analytics for each node within the SAMDP to identify potential failure points and vulnerabilities within the model. To illustrate ARLIN's effectiveness, we provide sample API usage and corresponding explainability visualizations and vulnerability point detection for a publicly available RL model. The open-source code repository is available for download at (GitHub link forthcoming).
Alexander Tapley
Sat 12:01 p.m. - 1:00 p.m. | Detecting Spurious Correlations via Robust Visual Concepts in Real and AI-Generated Image Classification (Poster)
Often machine learning models tend to automatically learn associations present in the training data without questioning their validity or appropriateness. This undesirable property is the root cause of the manifestation of spurious correlations, which render models unreliable and prone to failure in the presence of distribution shifts. Research shows that most methods attempting to remedy spurious correlations are only effective for a model's known spurious associations. Current spurious correlation detection algorithms either rely on extensive human annotations or are too restrictive in their formulation. Moreover, they rely on strict definitions of visual artifacts that may not apply to data produced by generative models, as they are known to hallucinate contents that do not conform to standard specifications. In this work, we introduce a general-purpose method that efficiently detects potential spurious correlations, and requires significantly less human interference in comparison to the prior art. Additionally, the proposed method provides intuitive explanations while eliminating the need for pixel-level annotations. We demonstrate the proposed method's tolerance to the peculiarity of AI-generated images, which is a considerably challenging task, one where most of the existing methods fall short. Consequently, our method is also suitable for detecting spurious correlations that may propagate to downstream applications originating from generative models.
Preetam Prabhu Srikar Dammu · Chirag Shah
Sat 12:01 p.m. - 1:00 p.m. | Towards the next generation explainable AI that promotes AI-human mutual understanding (Poster)
Recent advances in deep learning AI have demanded better explanations of AI's operations to enhance the transparency of AI's decisions, especially in critical systems such as self-driving cars or medical diagnosis applications, to ensure safety, user trust, and user satisfaction. However, current Explainable AI (XAI) solutions focus on using more AI to explain AI, without considering users' mental processes. Here we use cognitive science theories and methodologies to develop a next-generation XAI framework that promotes human-AI mutual understanding, using computer vision AI models as examples due to their importance in critical systems. Specifically, we propose to equip XAI with an important cognitive capacity in human social interaction: theory of mind (ToM), i.e., the capacity to understand others' behaviour by attributing mental states to them. We focus on two ToM abilities: (1) inferring human strategy and performance (i.e., Machine's ToM), and (2) inferring human understanding of AI strategy and trust towards AI (i.e., to infer Human's ToM). Computational modeling of human cognition and experimental psychology methods play an important role for XAI to develop these two ToM abilities to provide user-centered explanations through comparing users' strategy with AI's strategy and estimating users' current understanding of AI's strategy, similar to real-life teachers. Enhanced human-AI mutual understanding can in turn lead to better adoption of and trust in AI systems. This framework thus highlights the importance of cognitive science approaches to XAI.
Janet Hsiao · Antoni Chan
Sat 12:01 p.m. - 1:00 p.m. | Are VideoQA Models Truly Multimodal? (Poster)
While VideoQA Transformer models demonstrate competitive performance on standard benchmarks, the reasons behind their success are not fully understood. Do these models jointly capture and leverage the rich multimodal structures and dynamics from video and text? Or are they merely exploiting shortcuts to achieve high scores? Hence, we design QUAG (QUadrant AveraGe), a lightweight and non-parametric probe, to critically analyze multimodal representations. QUAG facilitates combined dataset-model study by systematic ablation of a model's coupled multimodal understanding during inference. Surprisingly, it demonstrates that the models manage to maintain high performance even under multimodal impairment. This indicates that the current VideoQA benchmarks and metrics do not penalize models that find shortcuts and discount joint multimodal understanding. Motivated by this, we propose CLAVI (Counterfactual in LAnguage and VIdeo), a diagnostic dataset for coupled multimodal understanding in VideoQA. CLAVI consists of temporal questions and videos that are augmented to curate balanced counterfactuals in language and video domains. We evaluate models on CLAVI and find that all models achieve high performance on multimodal shortcut instances, but most of them have very poor performance on the counterfactual instances that necessitate joint multimodal understanding. Overall, we show that many VideoQA models are incapable of learning multimodal representations and that their success on standard datasets is an illusion of joint multimodal understanding.
Ishaan Singh Rawal · Shantanu Jaiswal · Basura Fernando · Cheston Tan
Sat 12:01 p.m. - 1:00 p.m. | Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning (Poster)
Learning inherently interpretable policies is a central challenge in the path to developing autonomous agents that humans can trust. We argue for the use of policies that are piecewise-linear. We carefully study to what extent they can retain the interpretable properties of linear policies while performing competitively with neural baselines. In particular, we propose the HyperCombinator (HC), a piecewise-linear neural architecture expressing a policy with a controllably small number of sub-policies. Each sub-policy is linear with respect to interpretable features, shedding light on the agent's decision process without needing an additional explanation model. We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance.
Maxime Wabartha · Joelle Pineau
Sat 12:01 p.m. - 1:00 p.m. | COMET: Cost Model Explanation Framework (Poster)
ML-based program cost models have been shown to yield fairly accurate program cost predictions. They can replace heavily engineered analytical program cost models in mainstream compiler workflows, but their black-box nature discourages their adoption. In this work, we develop the first framework, COMET, for generating faithful, generalizable, and intuitive explanations for x86 cost models, such as the ML cost model Ithemal. We generate and compare COMET's explanations for Ithemal against those for an accurate analytical cost model, uiCA. Our empirical findings show an inverse correlation between the prediction error of a cost model and the semantic richness of COMET's explanations for the cost model, thus indicating potential sources of higher error of Ithemal with respect to uiCA.
Isha Chaudhary · Alex Renda · Charith Mendis · Gagandeep Singh
Sat 12:01 p.m. - 1:00 p.m. | Interactive Visual Feature Search (Demo)
Many visualization techniques have been created to explain the behavior of computer vision models, but they largely consist of static diagrams that convey limited information. Interactive visualizations allow users to more easily explore a model's behavior, but most are not easily reusable for new models. We introduce Visual Feature Search, a novel interactive visualization that is adaptable to any CNN and can easily be incorporated into a researcher's workflow. Our tool allows a user to highlight an image region and search for images from a given dataset with the most similar model features. We demonstrate how our tool elucidates different aspects of model behavior by performing experiments on a range of applications, such as in medical imaging and wildlife classification. We plan to open-source our tool to enable others to interpret their own models.
Devon Ulrich · Ruth Fong
Sat 12:01 p.m. - 1:00 p.m. | Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? (Poster)
Recently, interpretable machine learning has re-explored concept bottleneck models (CBM), comprising step-by-step prediction of the high-level concepts from the raw features and the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, consequently affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design. Furthermore, we formalise the model's intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable and more performant than CBMs.
Ričards Marcinkevičs · Sonia Laguna · Moritz Vandenhirtz · Julia Vogt
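The intervention mechanism this abstract builds on can be sketched with a minimal, hypothetical concept bottleneck (not the authors' method or code): concepts are predicted from the input, the label is predicted from the concepts, and a user may overwrite selected concept values before the label is recomputed. All weights and names below are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical concept bottleneck: x -> concepts -> label (both stages linear for brevity).
W_concept = rng.normal(size=(4, 6))   # 4 concepts predicted from 6 raw features
w_label = rng.normal(size=4)          # label score predicted from the 4 concepts

def predict_concepts(x):
    # Concept probabilities via a sigmoid over a linear map.
    return 1.0 / (1.0 + np.exp(-(W_concept @ x)))

def predict_label(concepts):
    # Downstream label score computed only from the concepts.
    return float(w_label @ concepts)

def intervene(concepts, corrections):
    # A user overwrites selected concept values, e.g. {2: 1.0} marks concept 2 as present.
    fixed = concepts.copy()
    for idx, value in corrections.items():
        fixed[idx] = value
    return fixed

x = rng.normal(size=6)
c_hat = predict_concepts(x)
print("original prediction:", predict_label(c_hat))
print("after intervention: ", predict_label(intervene(c_hat, {2: 1.0})))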
Sat 12:01 p.m. - 1:00 p.m. | Estimation of Concept Explanations Should be Uncertainty Aware (Poster)
Model explanations are very valuable for interpreting and debugging prediction models. We study a specific kind of global explanations called Concept Explanations, where the goal is to interpret a model using human-understandable concepts. Recent advances in multi-modal learning rekindled interest in concept explanations and led to several label-efficient proposals for estimation. However, existing estimation methods are unstable to the choice of concepts or dataset that is used for computing explanations. We observe that instability in explanations is because estimations do not model noise. We propose an uncertainty aware estimation method, which readily improves the reliability of concept explanations. We demonstrate with theoretical analysis and empirical evaluation that explanations computed by our method are stable to the choice of concepts and data shifts while also being label-efficient and faithful.
Vihari Piratla · Juyeon Heo · Sukriti Singh · Adrian Weller
Sat 12:01 p.m. - 1:00 p.m. | Optimising Human-AI Collaboration by Learning Convincing Explanations (Poster)
Machine learning models are being increasingly deployed to take, or assist in taking, complicated and high-impact decisions, from quasi-autonomous vehicles to clinical decision support systems. This poses challenges, particularly when models have hard-to-detect failure modes and are able to take actions without oversight. In order to handle this challenge, we propose a method for a collaborative system that remains safe by having a human ultimately making decisions, while giving the model the best opportunity to convince and debate them with interpretable explanations. However, the most helpful explanation varies among individuals and may be inconsistent across stated preferences. To this end we develop an algorithm, Ardent, to efficiently learn a ranking through interaction and best assist humans in completing a task. By utilising a collaborative approach, we can ensure safety and improve performance while addressing transparency and accountability concerns. Ardent enables efficient and effective decision-making by adapting to individual preferences for explanations, which we validate through extensive simulations alongside a user study involving a challenging image classification task, demonstrating consistent improvement over competing systems.
Alex Chan · Alihan Hüyük · Mihaela van der Schaar
Sat 12:01 p.m. - 1:00 p.m. | Inherent Inconsistencies of Feature Importance (Poster)
The rapid advancement and widespread adoption of machine learning-driven technologies have underscored the practical and ethical need for creating interpretable artificial intelligence systems. Feature importance, a method that assigns scores to the contribution of individual features on prediction outcomes, seeks to bridge this gap as a tool for enhancing human comprehension of these systems. Feature importance serves as an explanation of predictions in diverse contexts, whether by providing a global interpretation of a phenomenon across the entire dataset or by offering a localized explanation for the outcome of a specific data point. Furthermore, feature importance is being used both for explaining models and for identifying plausible causal relations in the data, independently from the model. However, it is worth noting that these various contexts have traditionally been explored in isolation, with limited theoretical foundations. This paper presents an axiomatic framework designed to establish coherent relationships among the different contexts of feature importance scores. Notably, our work unveils a surprising conclusion: when we combine the proposed properties with those previously outlined in the literature, we demonstrate the existence of an inconsistency. This inconsistency highlights that certain essential properties of feature importance scores cannot coexist harmoniously within a single framework.
Nimrod Harel · Uri Obolski · Ran Gilad-Bachrach
Sat 12:01 p.m. - 1:00 p.m. | ExpLIMEable: An exploratory framework for LIME (Demo)
ExpLIMEable is a tool to enhance the comprehension of Local Interpretable Model-Agnostic Explanations (LIME), particularly within the realm of medical image analysis. LIME explanations often lack robustness due to variances in perturbation techniques and interpretable function choices. Powered by a convolutional neural network for brain MRI tumor classification, ExpLIMEable seeks to mitigate these issues. This explainability tool allows users to tailor and explore the explanation space generated post hoc by different LIME parameters to gain deeper insights into the model's decision-making process, its sensitivity, and limitations. We introduce a novel dimension reduction step on the perturbations seeking to find more informative neighborhood spaces and extensive provenance tracking to support the user. This contribution ultimately aims to enhance the robustness of explanations, key in high-risk domains like healthcare.
Sonia Laguna · Julian Heidenreich · Jiugeng Sun · Nilüfer Cetin · Ibrahim Al Hazwani · Udo Schlegel · Furui Cheng · Mennatallah El-Assady
Sat 12:01 p.m. - 1:00 p.m. | Influence Based Approaches to Algorithmic Fairness: A Closer Look (Poster)
In contemporary machine learning, there's a growing trend of utilizing ready-made pre-trained models. In real-world applications, it is essential that the pre-trained models are not just accurate but also demonstrate qualities like fairness. This paper takes a closer look at recently proposed approaches that re-weight the training data to edit a pre-trained model for group fairness. We offer perspectives that unify disparate weighting schemes from past studies and pave the way for new weighting strategies to address group fairness concerns.
Soumya Ghosh · Prasanna Sattigeri · Inkit Padhi · Manish Nagireddy · Jie Chen
Sat 12:01 p.m. - 1:00 p.m. | Explaining black box text modules in natural language with language models (Poster)
Large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their rapid proliferation and increasing opaqueness have created a growing need for interpretability. Here, we ask whether we can automatically obtain natural language explanations for black box text modules. A text module is any function that maps text to a scalar continuous value, such as a submodule within an LLM or a fitted model of a brain region. Black box indicates that we only have access to the module's inputs. We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural language explanation of the module's selectivity along with a score for how reliable the explanation is. We study SASC in two contexts. First, we evaluate SASC on synthetic modules and find that it often recovers ground truth explanations. Second, we use SASC to explain modules found within a pre-trained BERT model, enabling inspection of the model's internals.
Chandan Singh · Aliyah Hsu · Richard Antonello · Shailee Jain · Alexander Huth · Bin Yu · Jianfeng Gao
Sat 12:01 p.m. - 1:00 p.m. | Use Perturbations when Learning from Explanations (Poster)
Machine learning from explanations (MLX) is an approach to learning that uses human-provided explanations of relevant or irrelevant features for each input to ensure that model predictions are right for the right reasons. Existing MLX approaches rely on local model interpretation methods and require strong model smoothing to align model and human explanations, leading to sub-optimal performance. We recast MLX as a robustness problem, where human explanations specify a lower dimensional manifold from which perturbations can be drawn, and show both theoretically and empirically how this approach alleviates the need for strong model smoothing. We consider various approaches to achieving robustness, leading to improved performance over prior MLX methods. Finally, we show how to combine robustness with an earlier MLX method, yielding state-of-the-art results on both synthetic and real-world benchmarks.
Juyeon Heo · Vihari Piratla · Matthew Wicker · Adrian Weller
Sat 12:01 p.m. - 1:00 p.m. | Assessment of the Reliability of a Model's Decision by Generalizing Attribution to the Wavelet Domain (Poster)
Neural networks have shown remarkable performance in computer vision, but their deployment in numerous scientific and technical fields is challenging due to their black-box nature. Scientists and practitioners need to evaluate the reliability of a decision, i.e., to know simultaneously if a model relies on the relevant features and whether these features are robust to image corruptions. Existing attribution methods aim to provide human-understandable explanations by highlighting important regions in the image domain, but fail to fully characterize a decision process's reliability. To bridge this gap, we introduce the Wavelet sCale Attribution Method (WCAM), a generalization of attribution from the pixel domain to the space-scale domain using wavelet transforms. Attribution in the wavelet domain reveals where and on what scales the model focuses, thus enabling us to assess whether a decision is reliable.
Gabriel Kasmi · Laurent Dubus · Yves-Marie Saint-Drenan · Philippe BLANC
Sat 1:00 p.m. - 1:30 p.m. | Coffee & Games (Social)
Sat 1:30 p.m. - 2:00 p.m. | Explaining Self-Driving Cars for Accountable Autonomy (Talk)
Autonomous systems are prone to errors and failures without knowing why. In critical domains like driving, these autonomous counterparts must be able to recount their actions for safety, accountability, and trust. An explanation, a model-dependent reason or justification for the decision of the autonomous agent being assessed, is a key component for post-mortem failure analysis, but also for pre-deployment verification. I will present neuro-symbolic systems that use neural networks and commonsense knowledge to detect and explain unreasonable vehicle scenarios, even if the autonomous vehicle has not seen that error before. In the second part of the talk, I will motivate the use of explanations as a testing framework for autonomous systems. I will conclude by discussing new challenges at the intersection of explainable AI and autonomy toward autonomous vehicle systems that are explainable by design.
Leilani Gilpin
Sat 2:00 p.m. - 2:30 p.m. | Contributed Talks (Talk)
Sat 2:01 p.m. - 2:07 p.m. | Emergence of Segmentation with Minimalistic White-Box Transformers (Spotlight)
Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection. Previous works have shown that segmentation properties emerge in vision transformers (ViTs) trained using self-supervised methods such as DINO, but not in those trained on supervised classification tasks. In this study, we probe whether segmentation emerges in transformer-based models solely as a result of intricate self-supervised learning mechanisms, or if the same emergence can be achieved under much broader conditions through proper design of the model architecture. Through extensive experimental results, we demonstrate that when employing a white-box transformer-like architecture whose design explicitly models and pursues low-dimensional structures in the data distribution, segmentation properties, at both the whole and parts levels, already emerge with a minimalistic supervised training recipe. Layer-wise finer-grained analysis reveals that the emergent properties strongly corroborate the designed mathematical functions of the white-box network. Our results suggest a path to design white-box foundation models that are simultaneously highly performant and mathematically fully interpretable.
Yaodong Yu · Tianzhe Chu · Shengbang Tong · Ziyang Wu · Druv Pai · Sam Buchanan · Yi Ma
Sat 2:07 p.m. - 2:14 p.m. | Scale Alone Does not Improve Mechanistic Interpretability in Vision Models (Spotlight)
In light of the recent widespread adoption of AI systems, understanding the internal information processing of neural networks has become increasingly critical. Most recently, machine vision has seen remarkable progress by scaling neural networks to unprecedented levels in dataset and model size. We here ask whether this extraordinary increase in scale also positively impacts the field of mechanistic interpretability. In other words, has our understanding of the inner workings of scaled neural networks improved as well? We use a psychophysical paradigm to quantify one form of mechanistic interpretability for a diverse suite of models and find no scaling effect for interpretability: neither for model nor dataset size. Specifically, none of the nine investigated state-of-the-art models are easier to interpret than GoogLeNet from almost a decade ago. Latest-generation vision models appear even less interpretable than older architectures, hinting at a regression rather than improvement, with modern models sacrificing interpretability for accuracy. These results highlight the need for models explicitly designed to be mechanistically interpretable and the need for more helpful interpretability methods to increase our understanding of networks at an atomic level. We release a dataset of more than 130,000 human responses from our psychophysical evaluation of 767 units across nine models. This dataset is meant to facilitate research on automated instead of human-based interpretability evaluations that can ultimately be leveraged to directly optimize the mechanistic interpretability of models.
Roland S. Zimmermann · Thomas Klein · Wieland Brendel
Sat 2:14 p.m. - 2:21 p.m. | On Evaluating Explanation Utility for Human-AI Decision-Making in NLP (Spotlight)
Is explainability a false promise? This debate has emerged from the lack of consistent evidence that explanations help in situations they are introduced for. In NLP, the evidence is not only inconsistent but also scarce. While there is a clear need for more human-centered, application-grounded evaluations, it is less clear where NLP researchers should begin if they want to conduct them. To address this, we introduce evaluation guidelines established through an extensive review and meta-analysis of related work.
Fateme Hashemi Chaleshtori · Atreya Ghosal · Ana Marasovic
Sat 2:21 p.m. - 2:28 p.m. | Understanding Scalable Perovskite Solar Cell Manufacturing with Explainable AI (Spotlight)
Large-area processing of perovskite semiconductor thin-films is complex and evokes unexplained variance in quality, posing a major hurdle for the commercialization of perovskite photovoltaics. Advances in scalable fabrication processes are currently limited to gradual and arbitrary trial-and-error procedures. While the in-situ acquisition of photoluminescence videos has the potential to reveal important variations in the thin-film formation process, the high dimensionality of the data quickly surpasses the limits of human analysis. In response, this study leverages deep learning and explainable artificial intelligence (XAI) to discover relationships between sensor information acquired during the perovskite thin-film formation process and the resulting solar cell performance indicators, while rendering these relationships humanly understandable. Through a diverse set of XAI methods, we explain not only what characteristics are important but also why, allowing material scientists to translate findings into actionable conclusions. Our study demonstrates that XAI methods will play a critical role in accelerating energy materials science.
Lukas Klein · Sebastian Ziegler · Felix Laufer · Charlotte Debus · Markus Götz · Klaus Maier-Hein · Ulrich Paetzold · Fabian Isensee · Paul Jaeger
Sat 2:30 p.m. - 3:30 p.m. | Poster Session 2
- | A Critical Survey on Fairness Benefits of XAI (Poster)
In this critical survey, we analyze typical claims on the relationship between XAI and fairness to disentangle the multidimensional relationship between these two concepts. Based on a systematic literature review and a subsequent qualitative content analysis, we identify seven archetypal claims from 175 papers on the alleged fairness benefits of XAI. We present crucial caveats with respect to these claims that emerged from our literature review and provide an entry point for future discussions around the potentials and limitations of XAI for specific fairness desiderata. While XAI appears to be applicable across several fairness desiderata, we notice a misalignment between these fairness desiderata and the capabilities of XAI. We encourage the community to conceive of XAI as one of many tools to approach the multidimensional, sociotechnical challenge of algorithmic fairness and to be more specific about how exactly what kind of XAI method enables whom to address which fairness dimension.
Luca Deck · Jakob Schoeffer · Maria De-Arteaga · Niklas Kuehl
- | Exploring Practitioner Perspectives On Training Data Attribution Explanations (Poster)
Explainable AI (XAI) aims to provide insight into opaque model reasoning to humans and as such is an interdisciplinary field by nature. In this paper, we interviewed 10 practitioners to understand the possible usability of training data attribution (TDA) explanations and to explore the design space of such an approach. We confirmed that training data quality is often the most important factor for high model performance in practice and model developers mainly rely on their own experience to curate data. End-users expect explanations to enhance their interaction with the model and do not necessarily prioritise but are open to training data as a means of explanation. Within our participants, we found that TDA explanations are not well-known and therefore not used. We urge the community to focus on the utility of TDA techniques from the human-machine collaboration perspective and broaden the TDA evaluation to reflect common use cases in practice.
Elisa Nguyen · Evgenii Kortukov · Jean Song · Seong Joon Oh
- | Explaining high-dimensional text classifiers (Poster)
Explainability has become a valuable tool in the last few years, helping humans better understand AI-guided decisions. However, the classic explainability tools are sometimes quite limited when considering high-dimensional inputs and neural network classifiers. We present a new explainability method using theoretically proven high-dimensional properties in neural network classifiers. We present two usages of it: 1) on the classical sentiment analysis task for the IMDB reviews dataset, and 2) on our malware-detection task for our PowerShell scripts dataset.
Odelia Melamed · Rich Caruana
- | Sum-of-Parts Models: Faithful Attributions for Groups of Features (Poster)
An explanation of a machine learning model is considered "faithful" if it accurately reflects the model's decision-making process. However, explanations such as feature attributions for deep learning are not guaranteed to be faithful, and can produce potentially misleading interpretations. In this work, we develop Sum-of-Parts (SOP), a class of models whose predictions come with grouped feature attributions that are faithful-by-construction. This model decomposes a prediction into an interpretable sum of scores, each of which is directly attributable to a sparse group of features. We evaluate SOP on benchmarks with standard interpretability metrics, and in a case study, we use the faithful explanations from SOP to help astrophysicists discover new knowledge about galaxy formation.
Weiqiu You · Helen Qu · Marco Gatti · Bhuvnesh Jain · Eric Wong
- | Sanity Checks Revisited: An Exploration to Repair the Model Parameter Randomisation Test (Poster)
The Model Parameter Randomisation Test (MPRT) is widely acknowledged in the eXplainable Artificial Intelligence (XAI) community for its well-motivated evaluative principle: that the explanation function should be sensitive to changes in the parameters of the model function. However, recent works have identified several methodological caveats for the empirical interpretation of MPRT. In this work, we introduce two adaptations to the original MPRT: Smooth MPRT and Efficient MPRT. The former minimises the impact that noise has on the evaluation results, and the latter circumvents the need for biased similarity measurements by re-interpreting the test through the explanation's rise in complexity after model randomisation. Our experimental results demonstrate improved metric reliability, for more trustworthy applications of XAI methods.
Anna Hedström · Leander Weber · Sebastian Lapuschkin · Marina Höhne
- | Stability Guarantees for Feature Attributions with Multiplicative Smoothing (Poster)
Explanation methods for machine learning models tend not to provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that a relaxed variant of stability is guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. To achieve such a model, we develop a smoothing method called Multiplicative Smoothing (MuS). We show that MuS overcomes the theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with a variety of feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
Anton Xue · Rajeev Alur · Eric Wong
- | On the Consistency of GNN Explainability Methods (Poster)
Despite the widespread utilization of post-hoc explanation methods for graph neural networks (GNNs) in high-stakes settings, there has been a lack of comprehensive evaluation regarding their quality and reliability. This evaluation is challenging primarily due to the data's non-Euclidean nature, arbitrary size, and complex topological structure. In this context, we argue that the consistency of GNN explanations, denoting the ability to produce similar explanations for input graphs with minor structural changes that do not alter their output predictions, is a key requirement for effective post-hoc GNN explanations. To fill this gap, we introduce a novel metric based on the Fused Gromov-Wasserstein distance to quantify consistency. Finally, we demonstrate that current methods do not perform well according to this metric, underscoring the need for further research on reliable GNN explainability methods.
Ehsan Hajiramezanali · Sepideh Maleki · Alex Tseng · Aicha BenTaieb · Gabriele Scalia · Tommaso Biancalani
- | Transparent Anomaly Detection via Concept-based Explanations (Poster)
Advancements in deep learning techniques have given a boost to the performance of anomaly detection. However, real-world and safety-critical applications demand a level of transparency and reasoning beyond accuracy. The task of anomaly detection (AD) focuses on finding whether a given sample follows the learned distribution. Existing methods lack the ability to reason with clear explanations for their outcomes. Hence, to overcome this challenge, we propose Transparent Anomaly Detection Concept Explanations (ACE). ACE is able to provide human-interpretable explanations in the form of concepts along with anomaly prediction. To the best of our knowledge, this is the first paper that proposes interpretable by-design anomaly detection. In addition to promoting transparency in AD, it allows for effective human-model interaction. Our proposed model shows either higher or comparable results to black-box uninterpretable models. We validate the performance of ACE across three realistic datasets: challenging histopathology slide image classification on TIL-WSI-TCGA, bird classification on CUB-200-2011, and gender classification on CelebA. We further demonstrate that our concept learning paradigm can be seamlessly integrated with other classification-based AD methods.
Laya Rafiee Sevyeri · Ivaxi Sheth · Farhood Farahnak · Shirin Abbasinejad Enger
- | Robust Recourse for Binary Allocation Problems (Poster)
We present the problem of algorithmic recourse for the setting of binary allocation problems. In this setting, the optimal allocation does not depend only on the prediction model and the individual's features, but also on the currently available resources, the decision maker's objective, and other individuals currently applying for the resource. Specifically, we focus on 0-1 knapsack problems and in particular the use case of lending. We first provide a method for generating counterfactual explanations and then address the problem of recourse invalidation due to changes in allocation variables. Finally, we empirically compare our method with perturbation-robust recourse and show that our method can provide higher validity at a lower cost.
Meirav Segal · Anne-Marie George · Ingrid Yu · Christos Dimitrakakis
-
|
Are Large Language Models Post Hoc Explainers?
(
Poster
)
>
link
Large Language Models (LLMs) are increasingly used as powerful tools for a plethora of natural language processing (NLP) applications. A recent innovation, in-context learning (ICL), enables LLMs to learn new tasks by supplying a few examples in the prompt during inference time, thereby eliminating the need for model fine-tuning. While LLMs have been utilized in several applications, their applicability in explaining the behavior of other models remains relatively unexplored. Despite the growing number of new explanation techniques, many require white-box access to the model and/or are computationally expensive, highlighting a need for next-generation post hoc explainers. In this work, we present the first framework to study the effectiveness of LLMs in explaining other predictive models. More specifically, we propose a novel framework encompassing multiple prompting strategies: i) Perturbation-based ICL, ii) Prediction-based ICL, iii) Instruction-based ICL, and iv) Explanation-based ICL, with varying levels of information about the underlying ML model and the local neighborhood of the test sample. We conduct extensive experiments with real-world benchmark datasets to demonstrate that LLM-generated explanations perform on par with state-of-the-art post hoc explainers using their ability to leverage ICL examples and their internal knowledge in generating model explanations. On average, across four datasets and two ML models, we observe that LLMs identify the most important feature with 72.19% accuracy, opening up new frontiers in explainable artificial intelligence (XAI) to explore LLM-based explanation frameworks. |
Nicholas Kroeger · Dan Ley · Satyapriya Krishna · Chirag Agarwal · Himabindu Lakkaraju 🔗 |
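As a rough illustration of the perturbation-based ICL strategy mentioned above, the sketch below perturbs a test input, queries a toy black-box model, and formats the resulting input-prediction pairs into a prompt. The feature names, the stand-in model, and the prompt wording are assumptions for illustration, not the paper's framework.
```python
# Build a perturbation-based in-context prompt for an LLM (toy setup).
import numpy as np

rng = np.random.default_rng(0)
feature_names = ["age", "income", "debt"]          # hypothetical features
weights = np.array([0.2, 1.5, -1.0])               # toy black-box: logistic model

def black_box(X):
    return (1 / (1 + np.exp(-X @ weights)) > 0.5).astype(int)

x = np.array([0.1, 0.8, 0.3])                      # test instance to explain
perturbations = x + rng.normal(0, 0.1, size=(8, 3))
preds = black_box(perturbations)

# Format (perturbed input, prediction) pairs as in-context examples.
lines = []
for p, y in zip(perturbations, preds):
    desc = ", ".join(f"{n}={v:.2f}" for n, v in zip(feature_names, p))
    lines.append(f"Input: {desc} -> Prediction: {y}")
prompt = (
    "Given the model behaviour below, rank the features by importance "
    "for the prediction and explain briefly.\n" + "\n".join(lines)
)
print(prompt)  # this prompt would then be sent to an LLM of choice
```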
-
|
Rectifying Group Irregularities in Explanations for Distribution Shift
(
Poster
)
>
link
It is well-known that real-world changes constituting distribution shift adversely affect model performance. How to characterize those changes in an interpretable manner is poorly understood. Existing techniques take the form of shift explanations that elucidate how samples map from the original distribution toward the shifted one by reducing the disparity between the two distributions. However, these methods can introduce group irregularities, leading to explanations that are less feasible and robust. To address these issues, we propose Group-aware Shift Explanations (GSE), an explanation method that leverages worst-group optimization to rectify group irregularities. We demonstrate that GSE not only maintains group structures, but can improve feasibility and robustness over a variety of domains by up to 20% and 25% respectively. |
Adam Stein · Yinjun Wu · Eric Wong · Mayur Naik 🔗 |
-
|
Explainable Alzheimer’s Disease Progression Prediction using Reinforcement Learning
(
Poster
)
>
link
In this study, we present a novel application of SHAP (SHapley Additive exPlanations) to enhance the interpretability of Reinforcement Learning (RL) models for Alzheimer's Disease (AD) progression prediction. Leveraging RL's predictive capabilities on a subset of the ADNI dataset, we employ SHAP to elucidate the model's decision-making process. Our approach provides detailed insights into the key factors influencing AD progression predictions, offering both global and individual, patient-level interpretability. By bridging the gap between predictive power and transparency, our work empowers clinicians and researchers to gain a deeper understanding of AD progression and facilitates more informed decision-making in AD-related research and patient care. |
Raja Farrukh Ali · Ayesha Farooq · Emmanuel Adeniji · John Woods · Vinny Sun · William Hsu 🔗 |
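For readers unfamiliar with the mechanics, the following hedged sketch shows how SHAP values might be computed for a trained decision model; a scikit-learn regressor and synthetic features stand in for the paper's RL model and the ADNI data, and the feature names are hypothetical.
```python
# Minimal SHAP sketch with a stand-in model (not the paper's RL pipeline).
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                       # toy patient features
y = X[:, 0] * 0.8 - X[:, 2] * 0.5 + rng.normal(0, 0.1, 200)

q_function = GradientBoostingRegressor().fit(X, y)  # stand-in for the trained RL model

background = shap.sample(X, 50)                     # background data for the explainer
explainer = shap.KernelExplainer(q_function.predict, background)
shap_values = np.asarray(explainer.shap_values(X[:5]))   # local, per-patient attributions
print(np.round(np.abs(shap_values).mean(axis=0), 3))     # simple global importance summary
```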
-
|
A Simple Scoring Function to Fool SHAP: Stealing from the One Above
(
Poster
)
>
link
XAI methods such as SHAP can help discover unfairness in black-box models. If the XAI method reveals a significant impact from a "protected attribute" (e.g., gender, race) on the model output, the model is considered unfair. However, adversarial models can subvert detection by XAI methods. Previous approaches to constructing such an adversarial model focus on creating complex scaffolding around the given input data. We propose a simple rule, requiring no access to the underlying data or data distribution, that adapts any scoring function to fool XAI methods such as SHAP. Our work calls for more attention to scoring functions, not just classifiers, in the XAI field, and reveals the limitations of XAI methods for explaining the behavior of scoring functions. |
Jun Yuan · Aritra Dasgupta 🔗 |
-
|
Explaining Longitudinal Clinical Outcomes using Domain-Knowledge driven Intermediate Concepts
(
Poster
)
>
link
The black-box nature of complex deep learning models makes it challenging to explain the rationale behind model predictions to clinicians and healthcare providers. Most of the current explanation methods in healthcare provide explanations through feature importance scores, which identify clinical features that are important for prediction. For high-dimensional clinical data, using individual input features as units of explanation often leads to noisy explanations that are sensitive to input perturbations and less informative for clinical interpretation. In this work, we design a novel deep learning framework that predicts domain-knowledge-driven intermediate high-level clinical concepts from input features and uses them as units of explanation. Our framework is self-explaining: relevance scores are generated for each concept, and prediction and explanation are learned jointly in an end-to-end training scheme. We perform systematic experiments on a real-world electronic health records dataset to evaluate both the performance and explainability of the predicted clinical concepts. |
Sayantan Kumar · Thomas Kannampallil · Aristeidis Sotiras · Philip Payne 🔗 |
-
|
Visual Topics via Visual Vocabularies
(
Poster
)
>
link
Researchers have long used topic modeling to automatically characterize and summarize text documents without supervision. Can we extract similar structures from collections of images? To do this, we propose visual vocabularies, a method to analyze image datasets by decomposing images into segments, and grouping similar segments into visual "words". These vocabularies of visual "words" enable us to extract visual topics that capture hidden themes distinct from what is captured by classic unsupervised approaches. We evaluate our visual topics using standard topic modeling metrics and confirm the coherency of our visual topics via a human study. |
Shreya Havaldar · Weiqiu You · Lyle Ungar · Eric Wong 🔗 |
-
|
Extracting human interpretable structure-property relationships in chemistry using XAI and large language models
(
Poster
)
>
link
Explainable Artificial Intelligence (XAI) is an emerging field in AI that aims to address the opaque nature of machine learning models. Furthermore, it has been shown that XAI methods can be used to extract input-output relationships, making them a useful tool in chemistry for understanding structure-property relationships. However, one of the main limitations of XAI methods is that they are developed for technically oriented users. We propose the XpertAI framework, which integrates XAI methods with large language models (LLMs) that access the scientific literature to automatically generate accessible natural language explanations of raw chemical data. We conducted five case studies to evaluate the performance of XpertAI. Our results show that XpertAI combines the strengths of LLMs and XAI tools in generating specific, scientific, and interpretable explanations. |
Geemi Wellawatte · Philippe Schwaller 🔗 |
-
|
Interactive Model Correction with Natural Language
(
Poster
)
>
link
In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on spurious correlations that fail to generalize to new data distributions, such as a bird classifier that relies on the background of an image. Preventing models from latching on to spurious correlations necessarily requires additional information beyond labeled data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labeled data from a balanced distribution. Such strategies can become prohibitively costly for large-scale datasets since they require additional annotation at a scale close to the original training data. We hypothesize that far less supervision suffices if we provide targeted feedback about the misconceptions of models trained on a given dataset. We introduce Clarify, a novel natural language interface and method for interactively correcting model misconceptions. Through Clarify, users need only provide a short text description to describe a model's consistent failure patterns, such as "water background" for a bird classifier. Then, in an entirely automated way, we use such descriptions to improve the training process by reweighting the training data or gathering additional targeted data. Our empirical results show that non-expert users can successfully describe model misconceptions via Clarify, improving worst-group accuracy by an average of 7.3% in two datasets with spurious correlations. Finally, we use Clarify to find and rectify 31 novel spurious correlations in ImageNet, improving minority-split accuracy from 21.1% to 28.7%. |
Yoonho Lee · Michelle Lam · Helena Vasconcelos · Michael Bernstein · Chelsea Finn 🔗 |
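A toy sketch of the reweighting step is given below: examples whose label conflicts with the user-described spurious feature ("water background") are upweighted. The metadata tags and the matcher are hypothetical stand-ins; the paper derives the groups automatically from the text description.
```python
# Reweight training data given a user-provided failure description (toy sketch).
import numpy as np

def matches_description(example_metadata, description="water background"):
    # Hypothetical matcher: in practice this could be an image-text similarity model.
    return description in example_metadata.get("tags", [])

train_set = [
    {"label": "waterbird", "tags": ["water background"]},
    {"label": "waterbird", "tags": ["land background"]},   # minority group
    {"label": "landbird",  "tags": ["water background"]},  # minority group
    {"label": "landbird",  "tags": ["land background"]},
]

spurious = np.array([matches_description(ex) for ex in train_set])
labels = np.array([ex["label"] == "waterbird" for ex in train_set])

# Upweight examples where the label disagrees with the spurious feature.
weights = np.where(spurious != labels, 3.0, 1.0)
print(weights)  # these weights would be passed to a weighted training loss
```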
-
|
On the Relationship Between Explanation and Prediction: A Causal View
(
Poster
)
>
link
Explainability has become a central requirement for the development, deployment, and adoption of machine learning (ML) models, and we are yet to understand what explanation methods can and cannot do. Several factors such as data, model prediction, hyperparameters used in training the model, and random initialization can all influence downstream explanations. While previous work empirically hinted that explanations (E) may have little relationship with the prediction (Y), there is a lack of conclusive study to quantify this relationship. Our work borrows tools from causal inference to systematically assay this relationship. More specifically, we measure the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors, i.e., the hyperparameters and the inputs used to generate saliency-based explanations (E) and predictions (Y). We discover that Y's relative direct influence on E follows an odd pattern: the influence is higher in the lowest-performing models than in mid-performing models, and it then decreases in the top-performing models. We believe our work is a promising first step towards providing better guidance for practitioners, who can make more informed decisions in utilizing these explanations by knowing what factors are at play and how they relate to their end task. |
Amir-Hossein Karimi · Krikamol Muandet · Simon Kornblith · Bernhard Schölkopf · Been Kim 🔗 |
-
|
ReLax: An Efficient and Scalable Recourse Explanation Benchmarking Library using JAX
(
Poster
)
>
link
Despite the progress made in the field of algorithmic recourse, current research practices remain constrained, largely restricting benchmarking and evaluation of recourse methods to medium-sized datasets (approximately 50k data points) due to the severe runtime overhead of recourse generation. This constraint impedes the pace of research development in algorithmic recourse and raises concerns about the scalability of existing methods. To mitigate this problem, we propose ReLax, a JAX-based benchmarking library, designed for efficient and scalable recourse explanations. ReLax supports a wide range of recourse methods and datasets and offers performance improvements of at least two orders of magnitude over existing libraries. Notably, we demonstrate that ReLax is capable of benchmarking real-world datasets of up to 10M data points, roughly 200 times the scale of current practices, without imposing prohibitive computational costs. ReLax is fully open-sourced and can be accessed at https://github.com/BirkhoffG/jax-relax. |
Hangzhi Guo · Xinchang Xiong · Wenbo Zhang · Amulya Yadav 🔗 |
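The scalability argument rests on JAX's ability to compile and vectorize recourse generation over whole batches. The generic sketch below illustrates that idea with jax.vmap and jax.jit on a toy gradient-based counterfactual search; it does not use or depict ReLax's actual API (see the linked repository for that), and the classifier is a made-up stand-in.
```python
# Why JAX helps: batched, jit-compiled counterfactual search (generic sketch).
import jax
import jax.numpy as jnp

beta = jnp.array([1.5, -2.0, 0.5])                 # toy differentiable classifier
predict = lambda x: jax.nn.sigmoid(x @ beta)

def counterfactual(x, steps=200, lr=0.1, lam=0.1):
    """Gradient-based search for a counterfactual pushed towards class 1."""
    def loss(delta):
        return (predict(x + delta) - 1.0) ** 2 + lam * jnp.sum(delta ** 2)
    step = lambda _, d: d - lr * jax.grad(loss)(d)
    return x + jax.lax.fori_loop(0, steps, step, jnp.zeros_like(x))

batch_cf = jax.jit(jax.vmap(counterfactual))       # one compiled kernel for the whole batch
X = jax.random.normal(jax.random.PRNGKey(0), (10_000, 3))
cfs = batch_cf(X)
print(cfs.shape, float(predict(cfs[0])))
```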
-
|
Caution to the Exemplars: On the Intriguing Effects of Example Choice on Human Trust in XAI
(
Poster
)
>
link
In model audits, explainable AI (XAI) systems are usually presented to human auditors on a limited number of examples due to time constraints. However, recent literature has suggested that in order to establish trust in ML models, it is not only the model's overall performance that matters but also the specific examples on which it is correct. In this work, we study this hypothesis through a controlled user study with N = 320 participants. On a tabular and an image dataset, we show model explanations to users on examples that are categorized as ambiguous or unambiguous. For ambiguous examples, human raters disagree on the correct label, whereas for unambiguous examples human labelers agree. We find that ambiguity can have a substantial effect on human trust, which is, however, influenced by surprising interactions of the data modality and explanation quality. While unambiguous examples boost trust for explanations that remain plausible, they also help auditors identify highly implausible explanations, thereby decreasing trust. Our results suggest paying closer attention to the selected examples in the presentation of XAI techniques. |
Tobias Leemann · Yao Rong · Thai-Trang Nguyen · Dr. Enkelejda Kasneci · Gjergji Kasneci 🔗 |
-
|
Policy graphs in action: explaining single- and multi-agent behaviour using predicates
(
Poster
)
>
link
This demo shows that policy graphs (PGs) provide reliable explanations of the behaviour of agents trained in two distinct environments. Additionally, this work shows the ability to generate surrogate agents from PGs that exhibit accurate behavioural resemblance to the original agents, a feature that allows us to validate the explanations given by the system. This facilitates transparent integration of opaque agents into socio-technical systems, ensuring explainability of their actions and decisions, enabling trust in hybrid human-AI environments, and ensuring cooperative efficacy. We present demonstrations based on two environments, along with a work-in-progress library that will allow integration with a broader range of environments and types of agent policies. |
Sergio Alvarez-Napagao · Adrián Tormos · Victor Gimenez-Abalos · Dmitry Gnatyshak 🔗 |
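As a rough picture of what a predicate-based policy graph can look like, the sketch below tabulates action and transition frequencies per predicate-state from logged trajectories. The predicates, trajectories, and counting scheme are made up and far simpler than the demo's library.
```python
# Toy policy-graph construction from (predicate-state, action) logs.
from collections import defaultdict, Counter

trajectories = [
    [("low_fuel", "refuel"), ("high_fuel", "move"), ("high_fuel", "move")],
    [("low_fuel", "refuel"), ("high_fuel", "move"), ("near_goal", "stop")],
]

policy_graph = defaultdict(Counter)     # predicate-state -> action counts
transitions = defaultdict(Counter)      # predicate-state -> next-state counts

for traj in trajectories:
    for (state, action), nxt in zip(traj, traj[1:] + [None]):
        policy_graph[state][action] += 1
        if nxt is not None:
            transitions[state][nxt[0]] += 1

# The graph both explains behaviour ("in low_fuel the agent refuels every time")
# and can act as a surrogate policy by sampling actions from the counts.
for state, actions in policy_graph.items():
    total = sum(actions.values())
    print(state, {a: c / total for a, c in actions.items()})
```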
-
|
Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection
(
Poster
)
>
link
Network intrusion detection (NID) systems that leverage machine learning have been shown to have strong performance in practice when used to detect malicious network traffic. Decision trees in particular offer a strong balance between performance and simplicity, but interpreting them requires users of NID systems to have background knowledge in machine learning. In addition, they are unable to provide additional outside information as to why certain features may be important for classification. In this work, we explore the use of large language models (LLMs) to provide explanations and additional background knowledge for decision tree NID systems. Further, we introduce a new human evaluation framework for decision tree explanations, which leverages automatically generated quiz questions that measure human evaluators' understanding of decision tree inference. Finally, we show that LLM-generated decision tree explanations correlate highly with human ratings of readability, quality, and use of background knowledge while simultaneously providing better understanding of decision boundaries. |
Noah Ziems · Gang Liu · John Flanagan · Meng Jiang 🔗 |
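A minimal sketch of the pipeline described above: export a decision tree's learned rules as text and wrap them in a prompt for an LLM. The toy dataset, feature names, and prompt wording are illustrative assumptions rather than the paper's NID setup.
```python
# Turn a decision tree's rules into an LLM prompt (toy sketch).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
feature_names = ["duration", "bytes_sent", "bytes_received", "num_failed_logins"]
X = rng.normal(size=(500, 4))
y = ((X[:, 3] > 0.5) | (X[:, 1] > 1.0)).astype(int)   # toy "malicious" labelling rule

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
rules = export_text(tree, feature_names=feature_names)

prompt = (
    "You are assisting a network analyst. Explain, in plain language, why the "
    "following decision tree flags traffic as malicious, and add relevant "
    "security background:\n" + rules
)
print(prompt)   # this prompt would be passed to an LLM to generate the explanation
```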
-
|
ObEy Anything: Quantifiable Object-based Explainability without Ground Truth Annotations
(
Poster
)
>
link
With neural networks quickly being adopted throughout society, understanding their behavior is becoming more important than ever. However, today's explainable AI field mostly consists of methods that explain single decisions of a model, which do not give us insight into the model as a whole, rendering the notion of explainability ambiguous. To this end, we contribute to the discussion of the distinction between explanation methods and explainability, and introduce Object-based Explainability (ObEy), a novel metric to quantify the explainability of models. ObEy is grounded in the natural sciences and scores saliency maps based on the visual perception of objects using segmentation masks. However, as such masks are not readily available in practical settings, we propose to use a foundation model to generate segmentation masks, making our metric applicable in any setting. We demonstrate ObEy's immediate applicability to practical use cases, and present new insights into the explainability of adversarially trained models from a quantitative perspective. |
William Ho · Lennart Schulze · Richard Zemel 🔗 |
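The core quantity, how much saliency falls on objects, can be illustrated with a toy computation. This is a simplified stand-in, not the ObEy metric itself; in practice the mask would come from a segmentation foundation model rather than being hard-coded.
```python
# Toy object-based score: fraction of saliency mass inside the object mask.
import numpy as np

rng = np.random.default_rng(0)
H, W = 32, 32
saliency = np.abs(rng.normal(size=(H, W)))        # stand-in saliency map
mask = np.zeros((H, W), dtype=bool)
mask[8:24, 8:24] = True                           # stand-in object segmentation mask

object_mass = saliency[mask].sum() / saliency.sum()
print(round(float(object_mass), 3))               # 1.0 would mean saliency lies fully on the object
```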
-
|
Cost-aware counterfactuals for black box explanations
(
Poster
)
>
link
Counterfactual explanations provide actionable insights into the minimal change in a system that would lead to a more desirable prediction from a black-box model. We address the practical challenges of finding counterfactuals in the setting where there is a different cost or preference for perturbing each feature. We propose a multiplicative weight applied to the perturbation, and show that this simple approach can be easily adapted to obtain multiple diverse counterfactuals, as well as to integrate the feature importances obtained by other state-of-the-art explainers when providing counterfactual examples. Additionally, we discuss the computation of valid counterfactuals with numerical gradient-based methods when the black-box model presents flat regions with no reliable gradient. In this scenario, sampling approaches, as well as those that rely on available data, sometimes provide counterfactuals that may not be close to the decision boundary. We show that a simple long-range guidance approach, used when no gradient is available, improves the quality of the counterfactual explanation in this scenario. In this work we discuss existing approaches and show how our proposed alternatives compare favourably on different datasets and metrics. |
Natalia Martinez · Kanthi Sarpatwar · Sumanta Mukherjee · Roman Vaculin 🔗 |
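A minimal sketch of the multiplicative-weight idea, under the assumption of a toy differentiable scorer and hand-picked per-feature weights (not the authors' implementation): features with a large weight are expensive to perturb and therefore move less in the resulting counterfactual.
```python
# Cost-aware counterfactual search with a per-feature weight on the perturbation.
import numpy as np

beta = np.array([2.0, -1.0, 0.5])                  # toy differentiable black-box scorer
f = lambda x: 1 / (1 + np.exp(-x @ beta))

x0 = np.array([-0.5, 0.5, 0.0])                    # factual instance, f(x0) < 0.5
w = np.array([1.0, 5.0, 1.0])                      # feature 1 is costly to change
target, lam, lr = 0.7, 0.05, 0.05

delta = np.zeros_like(x0)
for _ in range(1000):
    p = f(x0 + delta)
    # gradient of (p - target)^2 + lam * ||w * delta||^2 with respect to delta
    grad = 2 * (p - target) * p * (1 - p) * beta + 2 * lam * (w ** 2) * delta
    delta -= lr * grad

print(np.round(delta, 3), round(float(f(x0 + delta)), 3))
# The costly feature (index 1) is perturbed far less than the cheap ones.
```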
-
|
The Disagreement Problem in Faithfulness Metrics
(
Poster
)
>
link
The field of explainable artificial intelligence (XAI) aims to explain how black-box machine learning models work. Much of the work centers around the holy grail of providing post-hoc feature attributions for any model architecture. While the pace of innovation around novel methods has slowed down, the question remains how to choose a method and how to make it fit for purpose. Recently, efforts around benchmarking XAI methods have suggested metrics for that purpose, but there are many choices. That bounty of choice still leaves an end user unclear on how to proceed. This paper compares metrics that aim to measure the faithfulness of local explanations on tabular classification problems, and shows that the current metrics don't agree, leaving users unsure how to choose the most faithful explanations. |
Brian Barr · Noah Fatsi · Leif Hancox-Li · Peter Richter · Daniel Proano 🔗 |
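To make the comparison concrete, here is a hedged sketch of two simple faithfulness-style measurements for one attribution on a toy linear model. The metrics in the paper are more elaborate; the point is only that different measurements need not agree when ranking competing explanations.
```python
# Two simple faithfulness-style measurements for a single attribution (toy sketch).
import numpy as np
from scipy.stats import spearmanr

beta = np.array([3.0, -2.0, 0.5, 0.0])
model = lambda x: x @ beta
x = np.array([1.0, 1.0, 1.0, 1.0])
baseline = np.zeros_like(x)
attribution = np.array([2.5, -2.5, 0.1, 0.3])      # an explanation to be evaluated

# Measurement 1: rank correlation between |attribution| and the output change
# when each feature is individually set to the baseline.
drops = []
for i in range(len(x)):
    x_abl = x.copy()
    x_abl[i] = baseline[i]
    drops.append(abs(model(x) - model(x_abl)))
corr, _ = spearmanr(np.abs(attribution), drops)

# Measurement 2: output drop after removing the top-k features by |attribution|.
k = 2
topk = np.argsort(-np.abs(attribution))[:k]
x_topk = x.copy()
x_topk[topk] = baseline[topk]
topk_drop = abs(model(x) - model(x_topk))

print(round(float(corr), 3), round(float(topk_drop), 3))
# Used to rank competing explanations, such measurements can disagree.
```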
-
|
Empowering Domain Experts to Detect Social Bias in Generative AI with User-Friendly Interfaces
(
Poster
)
>
link
Generative AI models have become vastly popular and drive advances in all aspects of the modern economy. Detecting and quantifying the implicit social biases that they inherit in training, such as racial and gendered biases, is a critical first step in avoiding discriminatory outcomes. However, current methods are difficult to use and inflexible, presenting an obstacle for domain experts such as social scientists, ethicists, and gender studies experts. To address this challenge, we present two comprehensive open-source bias-testing tools, BiasTestGPT for PLMs and BiasTestVQA for VQA models, hosted on HuggingFace. These tools provide an intuitive and flexible way to test for social bias in generative AI models, allowing for unprecedented ease in detecting and quantifying social bias across multiple generative AI models and mediums. |
Roy Jiang · Rafal Kocielnik · Adhithya Prakash Saravanan · Pengrui Han · R. Michael Alvarez · Animashree Anandkumar 🔗 |
-
|
Do Concept Bottleneck Models Obey Locality?
(
Poster
)
>
link
Concept-based learning improves a deep learning model's interpretability by explaining its predictions via human-understandable concepts. Deep learning models trained under this paradigm heavily rely on the assumption that neural networks can learn to predict the presence or absence of a given concept independently of other concepts. Recent work, however, strongly suggests that this assumption may fail to hold in Concept Bottleneck Models (CBMs), a quintessential family of concept-based interpretable architectures. In this paper, we investigate whether CBMs correctly capture the degree of conditional independence across concepts when such concepts are localised both spatially, by having their values entirely defined by a fixed subset of features, and semantically, by having their values correlated with only a fixed subset of predefined concepts. To understand locality, we analyse how changes to features outside of a concept's spatial or semantic locality impact concept predictions. Our results suggest that even in well-defined scenarios where the presence of a concept is localised to a fixed feature subspace, or where its semantics are correlated with only a small subset of other concepts, CBMs fail to learn this locality. These results cast doubt upon the quality of concept representations learnt by CBMs and strongly suggest that concept-based explanations may be fragile to changes outside their localities. |
Naveen Raman · Mateo Espinosa Zarlenga · Juyeon Heo · Mateja Jamnik 🔗 |
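The locality probe can be pictured with a toy stand-in for a CBM concept head: perturb only the features outside the concept's locality and check whether the concept prediction moves. The linear "heads" below are illustrative assumptions, not a real CBM.
```python
# Locality probe: perturb features OUTSIDE a concept's locality and measure
# the change in the concept prediction (toy linear concept heads).
import numpy as np

rng = np.random.default_rng(0)
n_features = 10
locality = np.zeros(n_features, dtype=bool)
locality[:3] = True                                 # concept is defined by features 0-2

# A "local" concept head uses only in-locality features; a "leaky" one does not.
w_local = np.where(locality, 1.0, 0.0)
w_leaky = np.where(locality, 1.0, 0.3)

def concept_prob(w, x):
    return 1 / (1 + np.exp(-(x @ w)))

x = rng.normal(size=n_features)
noise = rng.normal(scale=2.0, size=n_features) * ~locality   # perturb outside locality only

for name, w in [("local head", w_local), ("leaky head", w_leaky)]:
    shift = abs(concept_prob(w, x + noise) - concept_prob(w, x))
    print(name, round(float(shift), 4))
# A concept predictor that respects locality shows (near-)zero shift;
# the paper finds that real CBMs often do not.
```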
-
|
Diffusion-Guided Counterfactual Generation for Model Explainability
(
Poster
)
>
link
Generating counterfactual explanations is one of the most effective approaches for uncovering the inner workings of black-box neural network models and building user trust. While remarkable strides have been made in generative modeling using diffusion models in domains like vision, their utility in generating counterfactual explanations in structured modalities remains unexplored. In this paper, we introduce the Structured Counterfactual Diffuser (SCD), the first plug-and-play framework leveraging diffusion for generating counterfactual explanations in structured data. SCD learns the underlying data distribution via a diffusion model which is then guided at test time to generate counterfactuals for any arbitrary black-box model, input, and desired prediction. Our experiments show that our counterfactuals not only exhibit high plausibility compared to the existing state-of-the-art but also show significantly better proximity and diversity. |
Nishtha Madaan · Srikanta Bedathur 🔗 |
-
|
GLANCE: Global to Local Architecture-Neutral Concept-based Explanations
(
Poster
)
>
link
Most of the current explainability techniques focus on capturing the importance of features in input space. However, given the complexity of models and data-generating processes, the resulting explanations are far from being complete, in that they lack an indication of feature interactions and visualization of their effect. In this work, we propose a novel surrogate-model-based explainability framework to explain the decisions of any CNN-based image classifiers by extracting causal relations between the features. These causal relations serve as global explanations from which local explanations of different forms can be obtained. Specifically, we employ a generator to visualize the 'effect' of interactions among features in latent space and draw feature importance therefrom as local explanations. We demonstrate and evaluate explanations obtained with our framework on the Morpho-MNIST, the FFHQ, and the AFHQ datasets. |
Avinash Kori · Ben Glocker · Francesca Toni 🔗 |