
Workshop: UniReps: Unifying Representations in Neural Models

Subjective Randomness and In-Context Learning

Eric Bigelow · Ekdeep S Lubana · Robert Dick · Hidenori Tanaka · Tomer Ullman

presentation: UniReps: Unifying Representations in Neural Models
Fri 15 Dec 6:15 a.m. PST — 3:15 p.m. PST


Large language models (LLMs) exhibit intricate capabilities, often achieving high performance on tasks they were not explicitly trained for. The precise nature of these capabilities is often unclear, with different prompts eliciting different capabilities, especially when combined with in-context learning (ICL). We propose a "Cognitive Interpretability" framework that enables us to analyze ICL dynamics to understand the latent concepts underlying LLMs' behavioral patterns. This provides a more nuanced understanding than post-hoc evaluation benchmarks, but does not require observing model internals as a mechanistic interpretation would. Inspired by the cognitive science of human randomness perception, we use random binary sequences as context and study the dynamics of ICL by manipulating properties of the context data, such as sequence length. In the latest GPT-3.5+ models, we find emergent abilities to generate pseudo-random numbers and learn basic formal languages, with striking ICL dynamics in which model outputs transition sharply from pseudo-random behaviors to deterministic repetition.
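To make the setup concrete, the following is a minimal sketch of how random binary sequences of varying length could be built as context, and how a generated sequence could be checked for deterministic repetition. It is not the authors' code. The "Heads"/"Tails" prompt format and the helper names `make_context`, `smallest_period`, and `alternation_rate` are assumptions introduced for illustration, and no model is queried here.

```python
import random


def make_context(length, p_heads=0.5, seed=0):
    """Sample a random binary sequence and format it as a prompt.

    The comma-separated "Heads"/"Tails" format is an assumed prompt style.
    """
    rng = random.Random(seed)
    flips = ["Heads" if rng.random() < p_heads else "Tails" for _ in range(length)]
    return ", ".join(flips) + ","


def smallest_period(seq):
    """Return the smallest p such that seq[i] == seq[i - p] for all i >= p.

    A small value relative to len(seq) suggests deterministic repetition.
    """
    n = len(seq)
    for p in range(1, n + 1):
        if all(seq[i] == seq[i - p] for i in range(p, n)):
            return p
    return n


def alternation_rate(seq):
    """Fraction of adjacent pairs that differ (about 0.5 for fair coin flips)."""
    if len(seq) < 2:
        return 0.0
    return sum(a != b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


# Contexts of increasing length, the property manipulated in the abstract.
contexts = {n: make_context(n, seed=n) for n in (2, 8, 32)}
```

In use, each context would be sent to a model and the continuation converted to a string of 0s and 1s. Plotting `smallest_period` or `alternation_rate` of the outputs against context length is one simple way to look for a sharp change from pseudo-random to repetitive behavior.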
