Workshop: Attributing Model Behavior at Scale (ATTRIB)

Training Dynamics of Contextual N-Grams in Language Models

Lucia Quirke · Lovis Heindrich · Wes Gurnee · Neel Nanda


Prior work has shown the existence of contextual neurons in language models, including a neuron that activates on German text. We show that one role of this neuron is to unlock what we call contextual n-grams: late-layer neurons that recognize and continue n-grams common in German text, but that only activate when the German neuron is active. We investigate the formation of this circuit throughout training and find that it is an example of what we call a hierarchical feature. Both the n-grams and the context neuron form independently early in training: the German neuron partly by boosting German unigram statistics, and the n-gram neurons by boosting their relevant tokens. Only after both features have already formed do they fit together into the circuit. Contrary to the hypotheses presented in prior work, we find that the circuits of the contextual n-grams, and of the contextual neuron itself, form gradually rather than in a sudden phase transition. We further present a range of anomalous observations, such as a simultaneous phase transition across many tasks coinciding with the learning-rate warmup, and evidence that many context neurons form simultaneously early in training, with most later unlearned.
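The gating structure described above — an n-gram detector that only fires when a separate context signal is active — can be illustrated with a minimal toy sketch. This is not the paper's actual circuit or code; the vocabulary, threshold, and function names below are invented purely for illustration of the AND-like dependence between the two features.

```python
# Toy sketch (hypothetical, not the paper's implementation) of a
# "contextual n-gram": a neuron that continues a common German n-gram,
# but only when a context ("German detector") neuron is also active.

def context_neuron(tokens):
    """Crude stand-in for the German-context neuron: activation grows
    with the fraction of tokens drawn from a toy German vocabulary."""
    german_vocab = {"der", "die", "das", "und", "nicht", "ich"}
    return sum(t in german_vocab for t in tokens) / max(len(tokens), 1)

def contextual_ngram_neuron(tokens, ngram=("der", "die")):
    """Fires when the most recent tokens match the n-gram AND the
    context neuron is sufficiently active -- the gating described
    in the abstract."""
    matches = tuple(tokens[-len(ngram):]) == ngram
    gate = context_neuron(tokens) > 0.5
    return 1.0 if (matches and gate) else 0.0

# The n-gram alone is not enough: in mostly-English context the
# neuron stays off, even though the n-gram itself is present.
english_ctx = ["the", "cat", "sat", "der", "die"]
german_ctx = ["ich", "und", "nicht", "der", "die"]
print(contextual_ngram_neuron(english_ctx))  # -> 0.0 (gate closed)
print(contextual_ngram_neuron(german_ctx))   # -> 1.0 (gate open)
```

Note that both components are independently meaningful — the context score and the n-gram match each exist on their own — mirroring the paper's finding that the two features form separately in training before composing into the circuit.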