Transformer models can use two fundamentally different kinds of information: information stored in weights during training, and information provided "in-context" at inference time. In this work, we show that transformers exhibit different inductive biases in how they represent and generalize from the information in these two sources. In particular, we characterize whether they generalize via parsimonious rules (rule-based generalization) or via direct comparison with observed examples (exemplar-based generalization). This distinction has important practical consequences: it informs whether to encode information in weights or in context, depending on how we want models to use that information. In transformers trained on controlled stimuli, we find that generalization from weights is more rule-based, whereas generalization from context is largely exemplar-based. In contrast, we find that in transformers pretrained on natural language, in-context learning is significantly rule-based, with larger models showing more rule-basedness. We hypothesize that rule-based generalization from in-context information may be an emergent consequence of large-scale training on language, which has sparse rule-like structure. Using controlled stimuli, we verify that transformers pretrained on data containing sparse rule-like structure exhibit more rule-based generalization.
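To make the two generalization modes concrete, here is a minimal sketch (hypothetical stimuli and a toy three-feature scheme, not the paper's actual experimental setup) showing how a rule-based and an exemplar-based strategy can disagree on the same held-out item:

```python
import numpy as np

# Hypothetical illustration: each item has three binary features, and under
# the "true" sparse rule only feature 0 determines the label. The training
# items are chosen so that a held-out probe is closer, in raw feature
# overlap, to exemplars of the opposite class.
train_items = np.array([
    [1, 1, 1],   # label 1
    [0, 0, 0],   # label 0
    [0, 0, 1],   # label 0
    [0, 1, 0],   # label 0
])
train_labels = np.array([1, 0, 0, 0])  # label = feature 0 (the sparse rule)
probe = np.array([1, 0, 0])            # rule says 1; nearest exemplar says 0

def rule_based(x):
    # Generalize via the parsimonious rule: attend only to the
    # label-relevant feature and ignore the rest.
    return int(x[0])

def exemplar_based(x, items, labels):
    # Generalize by direct comparison with stored examples:
    # 1-nearest-neighbor classification under Hamming distance.
    dists = np.abs(items - x).sum(axis=1)
    return int(labels[np.argmin(dists)])

print("rule-based:    ", rule_based(probe))                                 # -> 1
print("exemplar-based:", exemplar_based(probe, train_items, train_labels))  # -> 0
```

Probe items on which the two strategies make opposite predictions are what allow the generalization modes to be distinguished behaviorally; the controlled-stimuli evaluations described in the abstract rely on this kind of divergence.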
Author Information
Stephanie Chan (DeepMind)
Ishita Dasgupta (DeepMind)
Junkyung Kim (DeepMind)
Dharshan Kumaran (DeepMind)
Andrew Lampinen (DeepMind)
Felix Hill (DeepMind)
More from the Same Authors
- 2021 : Task-driven Discovery of Perceptual Schemas for Generalization in Reinforcement Learning »
  Wilka Carvalho · Andrew Lampinen · Kyriacos Nikiforou · Felix Hill · Murray Shanahan
- 2022 : Collaborating with language models for embodied reasoning »
  Ishita Dasgupta · Christine Kaeser-Chen · Kenneth Marino · Arun Ahuja · Sheila Babayan · Felix Hill · Rob Fergus
- 2022 : The World is not Uniformly Distributed; Important Implications for Deep RL »
  Stephanie Chan
- 2022 : Meaning without reference in large language models »
  Steven Piantadosi · Felix Hill
- 2022 Spotlight: Lightning Talks 5A-4 »
  Yangrui Chen · Zhiyang Chen · Liang Zhang · Hanqing Wang · Jiaqi Han · Shuchen Wu · Shaohui Peng · Ganqu Cui · Yoav Kolumbus · Noemi Elteto · Xing Hu · Anwen Hu · Wei Liang · Cong Xie · Lifan Yuan · Noam Nisan · Wenbing Huang · Yousong Zhu · Ishita Dasgupta · Luc V Gool · Tingyang Xu · Rui Zhang · Qin Jin · Zhaowen Li · Meng Ma · Bingxiang He · Yangyi Chen · Juncheng Gu · Wenguan Wang · Ke Tang · Yu Rong · Eric Schulz · Fan Yang · Wei Li · Zhiyuan Liu · Jiaming Guo · Yanghua Peng · Haibin Lin · Haixin Wang · Qi Yi · Maosong Sun · Ruizhi Chen · Chuan Wu · Chaoyang Zhao · Yibo Zhu · Liwei Wu · Xishan Zhang · Zidong Du · Rui Zhao · Jinqiao Wang · Ling Li · Qi Guo · Ming Tang · Yunji Chen
- 2022 Spotlight: Learning Structure from the Ground up---Hierarchical Representation Learning by Chunking »
  Shuchen Wu · Noemi Elteto · Ishita Dasgupta · Eric Schulz
- 2022 Panel: Panel 2B-3: Data Distributional Properties… & What Can Transformers… »
  Dimitris Tsipras · Stephanie Chan
- 2022 Poster: Using natural language and program abstractions to instill human inductive biases in machines »
  Sreejan Kumar · Carlos G. Correa · Ishita Dasgupta · Raja Marjieh · Michael Y Hu · Robert Hawkins · Jonathan D Cohen · Nathaniel Daw · Karthik Narasimhan · Tom Griffiths
- 2022 Poster: Explainability Via Causal Self-Talk »
  Nicholas Roy · Junkyung Kim · Neil Rabinowitz
- 2022 Poster: Data Distributional Properties Drive Emergent In-Context Learning in Transformers »
  Stephanie Chan · Adam Santoro · Andrew Lampinen · Jane Wang · Aaditya Singh · Pierre Richemond · James McClelland · Felix Hill
- 2022 Poster: Learning to Navigate Wikipedia by Taking Random Walks »
  Manzil Zaheer · Kenneth Marino · Will Grathwohl · John Schultz · Wendy Shang · Sheila Babayan · Arun Ahuja · Ishita Dasgupta · Christine Kaeser-Chen · Rob Fergus
- 2022 Poster: Semantic Exploration from Language Abstractions and Pretrained Representations »
  Allison Tam · Neil Rabinowitz · Andrew Lampinen · Nicholas Roy · Stephanie Chan · DJ Strouse · Jane Wang · Andrea Banino · Felix Hill
- 2022 Poster: Learning Structure from the Ground up---Hierarchical Representation Learning by Chunking »
  Shuchen Wu · Noemi Elteto · Ishita Dasgupta · Eric Schulz
- 2021 Poster: Tracking Without Re-recognition in Humans and Machines »
  Drew Linsley · Girik Malik · Junkyung Kim · Lakshmi Narasimhan Govindarajan · Ennio Mingolla · Thomas Serre
- 2021 Poster: Attention over Learned Object Embeddings Enables Complex Visual Reasoning »
  David Ding · Felix Hill · Adam Santoro · Malcolm Reynolds · Matt Botvinick
- 2021 Poster: Multimodal Few-Shot Learning with Frozen Language Models »
  Maria Tsimpoukelli · Jacob L Menick · Serkan Cabi · S. M. Ali Eslami · Oriol Vinyals · Felix Hill
- 2021 Poster: Towards mental time travel: a hierarchical memory for reinforcement learning agents »
  Andrew Lampinen · Stephanie Chan · Andrea Banino · Felix Hill
- 2021 Oral: Attention over Learned Object Embeddings Enables Complex Visual Reasoning »
  David Ding · Felix Hill · Adam Santoro · Malcolm Reynolds · Matt Botvinick
- 2020 Poster: What shapes feature representations? Exploring datasets, architectures, and training »
  Katherine L. Hermann · Andrew Lampinen
- 2019 : Poster Session »
  Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · Guo Zhang · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn
- 2018 Poster: Neural Arithmetic Logic Units »
  Andrew Trask · Felix Hill · Scott Reed · Jack Rae · Chris Dyer · Phil Blunsom
- 2017 : Panel Discussion »
  Felix Hill · Olivier Pietquin · Jack Gallant · Raymond Mooney · Sanja Fidler · Chen Yu · Devi Parikh
- 2017 : Grounded Language Learning in a Simulated 3D World »
  Felix Hill