JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

Yuandong Tian · Yiping Wang · Zhenyu Zhang · Beidi Chen · Simon Du

Abstract
