
Workshop: XAI in Action: Past, Present, and Future Applications

A Simple Scoring Function to Fool SHAP: Stealing from the One Above

Jun Yuan · Aritra Dasgupta


XAI methods such as SHAP can help discover unfairness in black-box models. If the XAI method reveals a significant impact from a "protected attribute" (e.g., gender, race) on the model output, the model is considered unfair. However, adversarial models can evade detection by XAI methods. Previous approaches to constructing such an adversarial model focus on creating complex scaffolding around the given input data. We propose a simple rule that does not require access to the underlying data or data distribution and that adapts any scoring function to fool XAI methods such as SHAP. Our work calls for more attention in the XAI field to scoring functions, not only classifiers, and reveals the limitations of XAI methods in explaining the behavior of scoring functions.
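To make the audit described in the abstract concrete, the following is a minimal sketch of how a Shapley-value check on a protected attribute might look. It is not the authors' method and does not implement their adversarial rule, which the abstract does not specify. Everything here is an illustrative assumption: the feature layout (income, debt, gender), the two toy scoring functions, the background rows, and the helper `shapley_values`, which computes exact Shapley values by enumerating coalitions rather than calling the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, background):
    """Exact Shapley values of f at x (hypothetical helper, not the SHAP library).

    Features outside a coalition are filled in from each background row and
    the resulting scores are averaged (an interventional value function).
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            z = [x[i] if i in coalition else row[i] for i in range(n)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Assumed toy feature layout: [income, debt, gender]; gender is protected.
PROTECTED = 2


def biased_score(z):
    # Toy scoring function that uses the protected attribute directly.
    return 0.5 * z[0] - 0.3 * z[1] + 2.0 * z[2]


def neutral_score(z):
    # Toy scoring function that ignores the protected attribute.
    return 0.5 * z[0] - 0.3 * z[1]


# Made-up background rows and query point, for illustration only.
background = [
    [4.0, 1.0, 0.0],
    [6.0, 3.0, 1.0],
    [5.0, 2.0, 0.0],
    [5.0, 2.0, 1.0],
]
x = [7.0, 1.0, 1.0]

phi_biased = shapley_values(biased_score, x, background)
phi_neutral = shapley_values(neutral_score, x, background)

# For a linear score, each attribution equals weight * (x_i - background mean),
# so the protected attribute gets a nonzero attribution only in biased_score.
print("biased  attribution on protected attribute:", round(phi_biased[PROTECTED], 6))
print("neutral attribution on protected attribute:", round(phi_neutral[PROTECTED], 6))
```

In this sketch an auditor would flag `biased_score` because the protected attribute receives a clearly nonzero attribution, while `neutral_score` passes. The abstract's claim is that a scoring function can be adapted so that such a check passes even though the function remains unfair.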
