Skip to yearly menu bar Skip to main content

Workshop: 4th Workshop on Self-Supervised Learning: Theory and Practice

Non-Vacuous Generalization Bounds for Large Language Models

Sanae Lotfi · Marc Finzi · Yilun Kuang · Tim G. J. Rudner · Micah Goldblum · Andrew Wilson


Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply regurgitate their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we derive a compression bound that is valid for the unbounded log-likelihood loss, and we extend the bound to handle subsampling, accelerating bound computation on massive datasets. To achieve the extreme level of compression required for non-vacuous generalization bounds, we devise SubLoRA, a low-dimensional non-linear parameterization. Using this approach, we find that larger models have better generalization bounds and are more compressible than smaller models.

Chat is not available.