GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values

Farnoosh Javadi ⋅ Walid Ahmed ⋅ Habib Hajimolahoseini ⋅ Foozhan Ataiefard ⋅ Mohammad Hassanpour ⋅ Saina Asani ⋅ Austin Wen ⋅ Omar Mohamed Awad ⋅ Kangling Liu ⋅ Yang Liu

Abstract
