

Transformers generalize differently from information stored in context vs in weights

Stephanie Chan ⋅ Ishita Dasgupta ⋅ Junkyung Kim ⋅ Dharshan Kumaran ⋅ Andrew Lampinen ⋅ Felix Hill

Abstract
