Linear attention is (maybe) all you need (to understand transformer optimization)

Kwangjun Ahn · Xiang Cheng · Minhak Song · Chulhee Yun · Ali Jadbabaie · Suvrit Sra

Abstract