Timezone: »
Poster
Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language
Zhenlin Xu · Marc Niethammer · Colin Raffel
Deep learning models struggle with compositional generalization, i.e. the ability to recognize or generate novel combinations of observed elementary concepts. In hopes of enabling compositional generalization, various unsupervised learning algorithms have been proposed with inductive biases that aim to induce compositional structure in learned representations (e.g. disentangled representation and emergent language learning). In this work, we evaluate these unsupervised learning algorithms in terms of how well they enable \textit{compositional generalization}. Specifically, our evaluation protocol focuses on whether or not it is easy to train a simple model on top of the learned representation that generalizes to new combinations of compositional factors. We systematically study three unsupervised representation learning algorithms - $\beta$-VAE, $\beta$-TCVAE, and emergent language (EL) autoencoders - on two datasets that allow directly testing compositional generalization. We find that directly using the bottleneck representation with simple models and few labels may lead to worse generalization than using representations from layers before or after the learned representation itself. In addition, we find that the previously proposed metrics for evaluating the levels of compositionality are not correlated with actual compositional generalization in our framework. Surprisingly, we find that increasing pressure to produce a disentangled representation (e.g. increasing $\beta$ in the $\beta$-VAE) produces representations with worse generalization, while representations from EL models show strong compositional generalization. Motivated by this observation, we further investigate the advantages of using EL to induce compositional structure in unsupervised representation learning, finding that it shows consistently stronger generalization than disentanglement models, especially when using less unlabeled data for unsupervised learning and fewer labels for downstream tasks. Taken together, our results shed new light onto the compositional generalization behavior of different unsupervised learning algorithms with a new setting to rigorously test this behavior, and suggest the potential benefits of developing EL learning algorithms for more generalizable representations. Our code is publicly available at https://github.com/wildphoton/Compositional-Generalization .
Author Information
Zhenlin Xu (UNC Chapel Hill, Amazon)
Marc Niethammer (UNC Chapel Hill)
Colin Raffel (UNC Chapel Hill and Hugging Face)
More from the Same Authors
-
2021 : The impact of domain shift on the calibration of fine-tuned models »
Jay Mohta · Colin Raffel -
2022 : Models with Conditional Computation Learn Suboptimal Solutions »
Mohammed Muqeeth · Haokun Liu · Colin Raffel -
2023 Poster: Resolving Interference When Merging Models »
Prateek Yadav · Derek Tam · Leshem Choshen · Colin Raffel · Mohit Bansal -
2023 Poster: Scaling Data-Constrained Language Models »
Niklas Muennighoff · Alexander Rush · Boaz Barak · Teven Le Scao · Nouamane Tazi · Aleksandra Piktus · Thomas Wolf · Colin Raffel · Sampo Pyysalo -
2023 Poster: Distributed Inference and Fine-tuning of Large Language Models Over The Internet »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2023 Poster: Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data »
Alon Albalak · Colin Raffel · William Yang Wang -
2023 Oral: Scaling Data-Constrained Language Models »
Niklas Muennighoff · Alexander Rush · Boaz Barak · Teven Le Scao · Nouamane Tazi · Aleksandra Piktus · Thomas Wolf · Colin Raffel · Sampo Pyysalo -
2022 Panel: Panel 2C-6: Compositional Generalization in… & Confident Adaptive Language… »
Tal Schuster · Zhenlin Xu -
2022 : Petals: Collaborative Inference and Fine-tuning of Large Models »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2022 : Petals: Collaborative Inference and Fine-tuning of Large Models »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2022 Workshop: Transfer Learning for Natural Language Processing »
Alon Albalak · Colin Raffel · Chunting Zhou · Deepak Ramachandran · Xuezhe Ma · Sebastian Ruder -
2022 Poster: A Combinatorial Perspective on the Optimization of Shallow ReLU Networks »
Michael S Matena · Colin Raffel -
2022 Poster: On Measuring Excess Capacity in Neural Networks »
Florian Graf · Sebastian Zeng · Bastian Rieck · Marc Niethammer · Roland Kwitt -
2022 Poster: Merging Models with Fisher-Weighted Averaging »
Michael S Matena · Colin Raffel -
2022 Poster: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning »
Haokun Liu · Derek Tam · Mohammed Muqeeth · Jay Mohta · Tenghao Huang · Mohit Bansal · Colin Raffel -
2021 Poster: Accurate Point Cloud Registration with Robust Optimal Transport »
Zhengyang Shen · Jean Feydy · Peirong Liu · Ariel H Curiale · Ruben San Jose Estepar · Raul San Jose Estepar · Marc Niethammer -
2021 Poster: Training Neural Networks with Fixed Sparse Masks »
Yi-Lin Sung · Varun Nair · Colin Raffel -
2020 : Responsible publication: NLP case study »
Miles Brundage · Bryan McCann · Colin Raffel · Natalie Schulter · Zeerak Waseem · Rosie Campbell -
2020 Poster: A shooting formulation of deep learning »
François-Xavier Vialard · Roland Kwitt · Susan Wei · Marc Niethammer -
2020 Oral: A shooting formulation of deep learning »
François-Xavier Vialard · Roland Kwitt · Susan Wei · Marc Niethammer -
2019 Poster: Region-specific Diffeomorphic Metric Mapping »
Zhengyang Shen · Francois-Xavier Vialard · Marc Niethammer -
2017 Poster: Deep Learning with Topological Signatures »
Christoph Hofer · Roland Kwitt · Marc Niethammer · Andreas Uhl -
2015 Poster: Statistical Topological Data Analysis - A Kernel Perspective »
Roland Kwitt · Stefan Huber · Marc Niethammer · Weili Lin · Ulrich Bauer