From Comparison to Composition: Towards Understanding Machine Cognition of Unseen Categories
Abstract
Humans are known to acquire and generalize visual concepts through a natural compare-then-compose process. We ask whether this mechanism can yield principled conditions under which machines generalize existing knowledge to unseen categories. In this work, we formalize cognition of the unseen as two complementary mechanisms for deep learning models: comparison, which uncovers latent concepts by capturing cross-category variations among seen classes, and composition, which continuously extrapolates these concepts to unseen classes. Even without parametric assumptions, we establish identifiability guarantees for learning latent concepts and unseen categories via sufficient contrast and independent support separation, a framework we denote Comparison–Composition Cognition (C³). Guided by these results, we instantiate a structurally constrained generative model that mirrors our theoretical assumptions. Experiments on simulated data corroborate our theoretical claims and demonstrate the effectiveness of the proposed methodology. In the setting of visual cognition with unseen labels, also known as On-the-fly Category Discovery, our instantiated approach improves over state-of-the-art baselines by 3.8% in average accuracy across fine-grained benchmarks. We hope the C³ framework translates insights from human cognition into practical guidance for representational compositionality, illuminating how and why machines can generalize to unseen categories.