(How) Do LLMs Plan in One Forward Pass?
Abstract
Planning underpins many human linguistic abilities that modern LLMs now possess. However, both the extent to which LLMs plan and what precisely planning entails are poorly understood. In this paper, we propose a stringent definition of latent planning within a single forward pass: an LLM engages in latent planning only if it has a representation that is causally implicated in its generation of both a planned-for token or concept and a preceding context that licenses it. We then use circuit analysis to show that some LLMs plan in simple scenarios: they possess features that represent a planned-for word like "accountant" and cause them to output "an" rather than "a"; ablating such features changes their output. On the more complex task of completing rhyming couplets, we find that models often identify a rhyme ahead of time, but even large models seldom plan far ahead. However, when steering models toward planned words in prose, we can elicit some planning that increases with scale. In sum, we offer a framework for measuring planning and mechanistic evidence of how models' planning abilities grow with scale.
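The causal test described above can be illustrated with a minimal sketch (not the authors' code): project a hypothetical planned-word feature direction out of the residual stream during the forward pass and check whether the model's choice between "a" and "an" shifts. The model, layer index, and feature_dir below are all illustrative assumptions; a real study would use a feature learned from the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM serves for this sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer = 6  # assumption: layer where a planned-word feature might live
hidden = model.config.hidden_size
feature_dir = torch.randn(hidden)  # placeholder for a learned feature direction
feature_dir = feature_dir / feature_dir.norm()

def ablate_feature(module, inputs, output):
    # Remove the feature direction from the residual stream activations.
    hs = output[0] if isinstance(output, tuple) else output
    coeff = hs @ feature_dir                       # (batch, seq)
    hs = hs - coeff.unsqueeze(-1) * feature_dir    # project the direction out
    return (hs,) + output[1:] if isinstance(output, tuple) else hs

prompt = "She wants to be"
ids = tok(prompt, return_tensors="pt").input_ids

def next_token_probs(ablate: bool):
    # Optionally register the ablation hook, then read P(" a") and P(" an").
    handle = (model.transformer.h[layer].register_forward_hook(ablate_feature)
              if ablate else None)
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    if handle is not None:
        handle.remove()
    probs = logits.softmax(-1)
    a_id, an_id = tok(" a").input_ids[0], tok(" an").input_ids[0]
    return probs[a_id].item(), probs[an_id].item()

print("clean   P(a), P(an):", next_token_probs(ablate=False))
print("ablated P(a), P(an):", next_token_probs(ablate=True))
```

With a genuine feature direction, a shift from "an" back toward "a" under ablation would be the kind of causal evidence of latent planning the definition requires.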