NeurIPS Hessian Sets: Uncovering Feature Interactions in Image Classification

Poster in Workshop: Attributing Model Behavior at Scale (ATTRIB)

Ayushi Mehrotra · Dipkamal Bhusal · Nidhi Rastogi

Abstract:

Feature attribution methods explain model predictions by computing the contribution of individual features. However, these methods often overlook the impact of feature interactions, which play a crucial role in tasks like image classification. In this work, we introduce Hessian Sets, a technique that leverages the Hessian matrix to detect and attribute pairwise feature interactions in image classifiers. We adapt Integrated Directional Gradients (IDG) to assign importance to these feature interaction sets. By integrating segmentation masks from the Segment Anything Model (SAM), we provide more interpretable and concise explanations. Our initial experiments on the Imagenette dataset demonstrate that our method produces sparse, interpretable feature attributions while effectively capturing important interactions. This is a work in progress, and we present preliminary results to highlight the potential of our approach for improving explainability in image classifiers.
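To make the core idea concrete: an off-diagonal Hessian entry, the mixed second derivative of the model output with respect to two features, is nonzero only when the effect of one feature depends on the other. The sketch below illustrates that idea only and is not the authors' implementation. It assumes a hypothetical scalar function `toy_model` in place of a classifier logit, uses central finite differences where a real pipeline would more likely use automatic differentiation, and leaves out the IDG attribution and SAM segmentation steps. The names `hessian_fd` and `top_interactions` are illustrative.

```python
from itertools import combinations


def toy_model(x):
    # Hypothetical stand-in for a classifier logit.
    # Features 0 and 1 interact (product term); 2 and 3 act independently.
    return x[0] * x[1] + 0.5 * x[2] ** 2 + x[3]


def hessian_fd(f, x, eps=1e-3):
    """Approximate the Hessian of a scalar function f at x by central differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            def shifted(di, dj):
                # Evaluate f with features i and j nudged by +/- eps.
                y = list(x)
                y[i] += di * eps
                y[j] += dj * eps
                return f(y)

            val = (shifted(1, 1) - shifted(1, -1)
                   - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps ** 2)
            H[i][j] = H[j][i] = val
    return H


def top_interactions(H, k=1):
    """Rank feature pairs by the magnitude of their mixed second derivative."""
    pairs = combinations(range(len(H)), 2)
    scored = [((i, j), abs(H[i][j])) for i, j in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


x0 = [0.5, -1.0, 2.0, 0.3]
H = hessian_fd(toy_model, x0)
best_pair, strength = top_interactions(H, k=1)[0]
print(best_pair, round(strength, 3))
```

On this toy function the only pair with a non-negligible score is features 0 and 1, which matches the product term in `toy_model`. For an image classifier the full pixel-level Hessian is very large, which is one reason to work with a reduced set of features such as segments.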
