Causal Interventions on Continuous Features in LLMs: A Case Study in Verb Bias
Abstract
We investigate how large language models (LLMs) encode and causally use continuous, context-dependent properties in syntactic processing, focusing on verb bias and its influence on structural priming. Building on prior work that localized binary morphosyntactic features in non-basis-aligned subspaces, we introduce a simple and efficient method that combines principal component analysis with beta regression to identify verb-bias subspaces in function vectors, compact task representations derived from in-context learning sequences. We show that function vectors can be used to simulate structural priming in LLMs. Our method supports continuous counterfactual manipulation of the verb-bias subspace, and such manipulation yields the predicted shifts in priming magnitudes, confirming that the subspace is causally involved in syntactic choice. This approach extends causal interpretability methods to continuous linguistic variables, and its application supports the proposal that the same mechanism underlies in-context learning and structural priming in LLMs.
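Below is a minimal sketch of the kind of PCA-plus-beta-regression pipeline the abstract describes: project function vectors onto a low-rank basis with PCA, then fit a beta regression of verb-bias scores (values in the open interval (0, 1)) on the principal components to find directions predictive of the bias. All names here (e.g. `function_vectors`, `verb_bias`, the number of components) are hypothetical placeholders, and the beta regression is hand-rolled via maximum likelihood rather than taken from the paper's actual implementation, which the abstract does not specify.

```python
# Sketch: identify a candidate verb-bias subspace via PCA + beta regression.
# Assumed inputs: a matrix of function vectors and per-verb bias scores in (0, 1).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln
from sklearn.decomposition import PCA

def beta_neg_loglik(params, X, y):
    """Negative log-likelihood of a beta regression (logit mean link, constant precision phi)."""
    coefs, log_phi = params[:-1], params[-1]
    mu = expit(X @ coefs)                  # mean response in (0, 1)
    phi = np.exp(log_phi)                  # precision parameter > 0
    a, b = mu * phi, (1.0 - mu) * phi      # Beta(a, b) with a + b = phi
    ll = (gammaln(phi) - gammaln(a) - gammaln(b)
          + (a - 1.0) * np.log(y) + (b - 1.0) * np.log(1.0 - y))
    return -ll.sum()

def fit_beta_regression(X, y):
    """Fit beta-regression coefficients by maximum likelihood; returns (coefs, phi)."""
    X = np.column_stack([np.ones(len(X)), X])          # prepend intercept column
    init = np.zeros(X.shape[1] + 1)                    # slopes + log(phi)
    res = minimize(beta_neg_loglik, init, args=(X, y), method="L-BFGS-B")
    return res.x[:-1], np.exp(res.x[-1])

# --- hypothetical usage with synthetic data ---
# function_vectors: (n_verbs, d_model) function vectors, one per verb context
# verb_bias:        (n_verbs,) continuous DO/PO bias scores in (0, 1)
rng = np.random.default_rng(0)
function_vectors = rng.normal(size=(200, 512))
verb_bias = np.clip(rng.beta(2, 2, size=200), 1e-3, 1 - 1e-3)

pca = PCA(n_components=10)
components = pca.fit_transform(function_vectors)       # low-rank projection

coefs, phi = fit_beta_regression(components, verb_bias)
# Principal directions with large-magnitude slopes span the candidate verb-bias subspace.
print("intercept:", coefs[0])
print("slope per principal component:", coefs[1:])
```

A counterfactual intervention in this setting would then amount to shifting a function vector along the identified directions by an amount corresponding to a different verb-bias value, and measuring the resulting change in the model's structural choices; the sketch above covers only the subspace-identification step.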