Discovering Functionally Sufficient Projections with Functional Component Analysis
Abstract
Many neural interpretability methods attempt to decompose Neural Network (NN) activity into vector directions, or features, along which variability represents some interpretable aspect of how the NN performs its computations. In correlative analyses, these features can be used to classify which inputs and outputs correlate with changes in the feature; in causal analyses, these features can be used to causally influence computation and behavior. In both cases, it is easy to view these features as satisfying ways to interpret NN activity. What if each feature, however, is an incomplete part of the story? For any given feature, is it necessary for the NN's computations, or is it merely sufficient? In this work, we present a method for isolating Functionally Sufficient Projections (FSPs) in NN latent vectors, and we use a synthetic case study on MultiLayer Perceptrons (MLPs) to show that multiple, mutually orthogonal FSPs can produce the same behavior. We offer the results of this work as a cautionary tale about claims of neural necessity.