Poster
Linguistic Collapse: Neural Collapse in (Large) Language Models
Robert Wu · Vardan Papyan
East Exhibit Hall A-C #2003
Abstract:
Neural collapse (NC) is a phenomenon observed in classification tasks where top-layer representations collapse into their class means, which become equinorm, equiangular, and aligned with the classifiers. These behaviors -- associated with generalization and robustness -- would manifest under specific conditions: models are trained towards zero loss, with noise-free labels belonging to balanced classes, which do not outnumber the model's hidden dimension. Recent studies have explored NC in the absence of one or more of these conditions to extend and capitalize on the associated benefits of ideal geometries. Language modeling presents a curious frontier, as *training by token prediction* constitutes a classification task where none of the conditions exist: the vocabulary is imbalanced and exceeds the embedding dimension; different tokens might correspond to similar contextual embeddings; and large language models (LLMs) in particular are typically only trained for a few epochs. This paper empirically investigates the impact of scaling the architectures and training of causal language models (CLMs) on their progression towards NC. We find that NC properties that develop with scale (and regularization) are linked to generalization. Moreover, there is evidence of some relationship between NC and generalization independent of scale. Our work thereby underscores the generality of NC as it extends to the novel and more challenging setting of language modeling. Downstream, we seek to inspire further research on the NC phenomenon to deepen our understanding of LLMs -- and neural networks at large -- and improve existing architectures based on NC-related properties. Our code is hosted on GitHub: [https://github.com/rhubarbwu/linguistic-collapse](https://github.com/rhubarbwu/linguistic-collapse).
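
To make the geometric properties above concrete, here is a minimal, hypothetical sketch of how one might quantify them from last-layer features. This is not the paper's actual measurement code; the `nc_geometry` function, its inputs, and the simplified NC1 ratio are assumptions for illustration only.

```python
# Hypothetical sketch of measuring neural-collapse-style geometry on
# top-layer features; not the authors' actual evaluation code.
import torch
import torch.nn.functional as F

def nc_geometry(features: torch.Tensor, labels: torch.Tensor,
                classifier: torch.Tensor) -> dict:
    """features: (N, d) last-layer embeddings; labels: (N,) class ids;
    classifier: (C, d) rows of the final linear layer."""
    classes = labels.unique()
    means = torch.stack([features[labels == c].mean(dim=0) for c in classes])
    centered = means - means.mean(dim=0)                 # (C, d)

    # NC1 (simplified): within-class variability relative to the
    # between-class spread; smaller means stronger collapse to class means.
    within = torch.stack([
        features[labels == c].var(dim=0, unbiased=False).sum()
        for c in classes]).mean()
    between = centered.var(dim=0, unbiased=False).sum()
    nc1 = (within / between).item()

    # NC2, equinorm: coefficient of variation of class-mean norms
    # (zero if all centered class means have equal norm).
    norms = centered.norm(dim=1)
    equinorm_cv = (norms.std() / norms.mean()).item()

    # NC2, equiangular: spread of pairwise cosines between class means
    # (zero if all pairs of class means share the same angle).
    cos = F.normalize(centered, dim=1)
    gram = cos @ cos.T
    off_diag = gram[~torch.eye(len(classes), dtype=torch.bool)]
    equiangle_std = off_diag.std().item()

    # NC3, self-duality: alignment between (centered, normalized)
    # classifier rows and the corresponding class means (one if aligned).
    w = F.normalize(classifier[classes] - classifier[classes].mean(dim=0), dim=1)
    duality = (w * cos).sum(dim=1).mean().item()

    return dict(nc1=nc1, equinorm_cv=equinorm_cv,
                equiangle_std=equiangle_std, duality=duality)

# Toy usage with random data (in practice, features would come from a
# trained model and `classifier` from its final linear layer).
feats = torch.randn(1000, 64)
labs = torch.randint(0, 10, (1000,))
W = torch.randn(10, 64)
print(nc_geometry(feats, labs, W))
```

Under this sketch, stronger collapse would show up as `nc1`, `equinorm_cv`, and `equiangle_std` approaching zero and `duality` approaching one; random features, as in the toy usage, should score poorly on all four.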