Poster

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

Hanlin Zhu · Baihe Huang · Shaolun Zhang · Michael Jordan · Jiantao Jiao · Yuandong Tian · Stuart J Russell

West Ballroom A-D #5610
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: Auto-regressive large language models (LLMs) show impressive capabilities on many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on ''$A \to B$'' (e.g., *Tom is the parent of John*), the LLM fails to directly conclude ''$B \gets A$'' (e.g., *John is the child of Tom*) during inference, even though the two sentences are semantically identical; this is known as the ''reversal curse''. In this paper, we theoretically analyze the reversal curse via the training dynamics of (stochastic) gradient descent for two auto-regressive models: (1) a bilinear model that can be viewed as a simplification of a one-layer transformer; (2) one-layer transformers under certain assumptions. Our analysis reveals that for both models, the reversal curse is a consequence of the *asymmetry* of the (effective) model weights, i.e., an increase of the weights from a token $A$ to a token $B$ during training does not necessarily cause an increase of the weights from $B$ to $A$; this asymmetry arises from the training dynamics under certain choices of loss function and optimization space of the model parameters. Moreover, our analysis extends naturally to other logical reasoning tasks such as chain-of-thought (CoT), providing a perspective different from previous work that focuses on expressivity. Finally, we conduct experiments to validate our theory on multi-layer transformers under different settings.
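The weight-asymmetry mechanism the abstract describes can be illustrated with a minimal sketch: a bilinear next-token model with one-hot embeddings, where the score of token $y$ following token $x$ is simply $W[x, y]$. This is not the paper's exact construction; the vocabulary size, token indices, and learning rate below are illustrative assumptions. Training with cross-entropy only on the pair $(A, B)$ updates only row $A$ of $W$, so $W[A, B]$ grows while $W[B, A]$ never moves.

```python
# Minimal sketch (not the paper's exact model): a bilinear next-token
# predictor p(y | x) = softmax_y(W[x, y]), trained only on "A -> B".
# VOCAB, token indices A/B, lr, and step count are illustrative assumptions.
import numpy as np

VOCAB, A, B = 4, 0, 1          # hypothetical tiny vocabulary; tokens A=0, B=1
W = np.zeros((VOCAB, VOCAB))   # effective weights from token x to token y
lr = 0.5

for _ in range(100):                       # gradient descent on cross-entropy
    logits = W[A]                          # scores for the token following A
    p = np.exp(logits) / np.exp(logits).sum()
    grad = p.copy()
    grad[B] -= 1.0                         # d(loss)/d(logits) = p - onehot(B)
    W[A] -= lr * grad                      # only row A is ever updated

print(W[A, B])   # large and positive: "A -> B" has been learned
print(W[B, A])   # still 0.0: nothing in training touches the reverse direction
```

Under this one-hot parameterization, the gradient of the forward pair never enters the row that would have to grow for the model to answer the reversed query, which is one concrete instance of the asymmetry the analysis attributes to the loss function and parameter space.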
