NeurIPS Confronting the Faithfulness Challenge with Post-hoc Model Explanations.

Talk in Workshop: XAI in Action: Past, Present, and Future Applications

Julius Adebayo

Abstract:

Explaining the output of a trained deep neural network has emerged as a key research challenge. Several classes of explanation methods (feature attributions, training point ranking, post-hoc concept attribution) have been proposed to address that challenge. However, despite significant research contributions, evidence points to their ineffectiveness. In this talk, I'll highlight a key challenge that undercuts the effectiveness of current post-hoc explanation methods: faithfulness. A model's explanation is faithful if the feature importance score induced by the explanation indicates the magnitude of the change in the model's output when that feature is ablated. However, consistent evidence indicates that post-hoc explanations of large-scale deep nets, under standard training regimes, are unfaithful. I'll close with two vignettes: the first on emerging recipes for overcoming the faithfulness challenge, and the second on an alternative paradigm that involves developing intrinsically interpretable models.
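
The faithfulness definition in the abstract can be illustrated with a small sketch. This is not the speaker's evaluation protocol; it is one possible reading, under several assumptions: the model is a hypothetical toy linear function, ablation means replacing a single feature with a baseline value of 0, and agreement between importance scores and ablation effects is summarized with a rank correlation. All function names here (`ablation_effects`, `faithfulness_score`, `toy_model`) are made up for illustration.

```python
import numpy as np


def ablation_effects(model, x, baseline=0.0):
    """Absolute change in the model output when each feature is ablated in turn.

    Assumption: "ablating" a feature means setting it to `baseline`.
    """
    x = np.asarray(x, dtype=float)
    original = model(x)
    effects = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        ablated = x.copy()
        ablated[i] = baseline
        effects[i] = abs(original - model(ablated))
    return effects


def rank(values):
    """Ranks from 0 (smallest) to n-1 (largest); ties are broken by position."""
    return np.argsort(np.argsort(values))


def faithfulness_score(attributions, effects):
    """Rank correlation between |attribution| and the measured ablation effect.

    Assumption: a score near 1 is read as "faithful" and a score near -1
    as "unfaithful". Other summary statistics are equally plausible.
    """
    a = rank(np.abs(attributions)).astype(float)
    e = rank(effects).astype(float)
    return float(np.corrcoef(a, e)[0, 1])


# Hypothetical toy model: a linear function with known weights.
weights = np.array([3.0, -2.0, 0.5, 0.0])


def toy_model(x):
    return float(weights @ x)


x = np.array([1.0, 1.0, 1.0, 1.0])
effects = ablation_effects(toy_model, x)

# Input-times-weight attributions match the ablation effects of a linear model.
faithful = weights * x
# Reversing the scores assigns the highest importance to the least influential feature.
unfaithful = faithful[::-1]

print("ablation effects:", effects)
print("faithful score:  ", faithfulness_score(faithful, effects))
print("unfaithful score:", faithfulness_score(unfaithful, effects))
```

For the linear toy model the first explanation tracks the ablation effects and the reversed one does not. For a deep network, ablating a feature can also push the input off the training distribution, so the choice of baseline is itself an assumption that affects the result.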
