The Binding Problem in Vision Models: Geometric, Functional, and Behavioral Approaches
Huang · Yihao Li · Yingshan CHANG · Saeed Salehi · Konrad Kording
Abstract
Existing studies of neural networks have focused largely on $\textit{compositionality}$—whether individual features can be linearly decoded and reused—while overlooking the equally important issue of $\textit{binding}$, i.e., how features are linked together to form coherent objects. This leaves a gap in understanding whether models truly represent feature conjunctions rather than mere unstructured feature bags. We propose a geometric and functional framework for quantifying binding, introducing a binding score based on principal angles between concept subspaces and validating it with linear or non-linear probes. To complement this, we design a behavioral diagnostic dataset in which pairs of images share identical feature bags but differ in how those features are bound into objects. Together, these frameworks highlight binding as a distinct and measurable dimension of representation, providing tools to diagnose where current vision models succeed—and where they fail—in capturing object structure.
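The abstract does not define the binding score itself, so the following is only a minimal sketch of its stated ingredient: principal angles between two concept subspaces. It uses the standard QR-then-SVD computation. The function names, the example bases, and the `mean_squared_cosine` summary are illustrative assumptions, not the authors' method.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between span(A) and span(B).

    A and B are (d, k) arrays whose columns span each subspace.
    Assumes both have full column rank.
    """
    # Orthonormal bases for each column space (reduced QR).
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [0, 1].
    return np.arccos(np.clip(cosines, 0.0, 1.0))

def mean_squared_cosine(A, B):
    """Hypothetical overlap summary in [0, 1]; NOT the paper's binding score."""
    return float(np.mean(np.cos(principal_angles(A, B)) ** 2))

# Toy example in R^3: two planes that share exactly one direction (e1).
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # span{e1, e2}
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])  # span{e1, e3}
angles = principal_angles(A, B)   # one angle near 0, one near pi/2
overlap = mean_squared_cosine(A, B)
```

In this toy case the shared direction gives an angle of zero and the remaining directions are orthogonal. How such angles are aggregated into a binding score, and how the concept subspaces are estimated from model activations, is left to the paper.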