Poster in Workshop: CogInterp: Interpreting Cognition in Deep Learning Models

Predicting the Formation of Induction Heads

Tatsuya Aoyama ⋅ Ethan Wilcox ⋅ Nathan Schneider


Abstract

Specialized attention heads known as induction heads (IHs) arguably underlie the remarkable in-context learning (ICL) capabilities of modern language models (LMs), yet a precise characterization of how they form is still lacking. In this study, through a series of experiments on both natural and synthetic data, we show that (1) a simple equation combining batch size and context size predicts the point at which IHs form; (2) surface bigram repetition frequency and reliability strongly affect IH formation, and we identify a precise Pareto frontier over these two values; and (3) local dependency with high bigram repetition frequency and reliability is sufficient for IH formation, but when frequency and reliability are low, categoriality and the shape of the marginal distribution matter.
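
To make the bigram-repetition quantities in point (2) concrete, the sketch below applies the standard induction rule (find the most recent earlier occurrence of the current token and copy the token that followed it) to a token sequence. The abstract does not define frequency or reliability, so the operationalizations here are assumptions for illustration only: frequency is taken as the share of next-token positions where the copy rule is correct, and reliability as the share of positions where the rule fires and is correct. The function names `induction_prediction` and `repetition_stats` are hypothetical and are not taken from the authors' code.

```python
from typing import Hashable, Optional, Sequence, Tuple


def induction_prediction(context: Sequence[Hashable]) -> Optional[Hashable]:
    """Copy the token that followed the most recent earlier occurrence
    of the last token in `context`; return None if there is none."""
    if not context:
        return None
    current = context[-1]
    # Scan earlier positions from right to left (most recent first).
    for i in range(len(context) - 2, -1, -1):
        if context[i] == current:
            return context[i + 1]
    return None


def repetition_stats(tokens: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (frequency, reliability) under the ASSUMED definitions:
    frequency   = correct copies / all next-token positions
    reliability = correct copies / positions where the rule fires
    """
    n_targets = len(tokens) - 1  # positions that have a next token
    fired = 0
    correct = 0
    for t in range(1, len(tokens)):
        guess = induction_prediction(tokens[:t])
        if guess is None:
            continue  # current token has no earlier occurrence
        fired += 1
        if guess == tokens[t]:
            correct += 1
    frequency = correct / n_targets if n_targets > 0 else 0.0
    reliability = correct / fired if fired > 0 else 0.0
    return frequency, reliability


if __name__ == "__main__":
    print(repetition_stats("a b c a b c".split()))
    print(repetition_stats("a b a c a b".split()))
```

In the first sequence every repeated token is followed by the same token as before, so the copy rule is always right when it fires; in the second, the token after `a` keeps changing, so the rule fires but is never right. Under these assumed definitions, that is the difference between a high-reliability and a low-reliability repetition signal.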
