Transformers generalize differently from information stored in context vs in weights

Stephanie Chan · Ishita Dasgupta · Junkyung Kim · Dharshan Kumaran · Andrew Lampinen · Felix Hill

Abstract