
Workshop: Workshop on Distribution Shifts: New Frontiers with Foundation Models

Can Transformers In-Context Learn Task Mixtures?

Nilesh Tripuraneni · Lyric Doshi · Steve Yadlowsky

Keywords: [ transformers ] [ model selection ] [ in-context learning ] [ Data Mixtures ]


In-context learning (ICL) refers to the ability of Large Language Models (LLMs) to perform new tasks by conditioning on input-output samples without any parameter updates. Previous work has established that, in a controlled setting, transformers can optimally perform ICL for tasks from a single task family (here, a single function class) when they are pretrained on example tasks from that family. Using this setting, we probe the relationship between pretraining data mixtures and downstream ICL performance. In particular, we empirically explore the ability of pretrained transformers to select a family of tasks (i.e., among distinct function classes) and perform learning within that task family (i.e., learn a function within a function class), all in-context. We show that, for pretraining task mixtures balanced across task families, the cost of unsupervised downstream ICL task-family selection is near-zero. For task families rarely seen in pretraining, downstream ICL learning curves exhibit complex, task-dependent, non-monotonic behavior. We also characterize the benefit of conditional pretraining in this simplified model, showing how task-family instructions can reduce the overhead of in-context task-family selection.
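As a rough illustration of the two-level structure described above (first a task family is chosen, then a function within it), the sketch below samples one in-context prompt from a mixture of two function classes. The abstract does not name the function classes, so the choice of dense and sparse linear functions, the function `sample_prompt`, and all its parameters are assumptions made only for this example, not the authors' setup.

```python
import random

def sample_prompt(rng, weights, n_points=16, dim=8, sparsity=2):
    """Draw one in-context prompt from a two-family task mixture.

    Hypothetical families (assumed for illustration only):
      0 = dense linear functions, 1 = sparse linear functions.
    """
    # Level 1: pick the task family according to the mixture weights.
    family = rng.choices([0, 1], weights=weights)[0]

    # Level 2: draw a specific function from within that family.
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    if family == 1:
        # Sparse family: keep only `sparsity` randomly chosen coordinates.
        keep = set(rng.sample(range(dim), sparsity))
        w = [wi if i in keep else 0.0 for i, wi in enumerate(w)]

    # Input-output samples that would be presented in-context.
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    ys = [sum(xi * wi for xi, wi in zip(x, w)) for x in xs]
    return family, w, xs, ys

# Balanced mixture: both families are equally likely during pretraining.
rng = random.Random(0)
family, w, xs, ys = sample_prompt(rng, weights=[0.5, 0.5])
```

Skewing `weights` (for example `[0.95, 0.05]`) would mimic a family that is rarely seen in pretraining. The family label returned here is hidden from the model in the unsupervised setting; prepending it to the prompt would be one way to imitate the task-family instructions the abstract refers to.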
