GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values

Farnoosh Javadi · Walid Ahmed · Habib Hajimolahoseini · Foozhan Ataiefard · Mohammad Hassanpour · Saina Asani · Austin Wen · Omar Mohamed Awad · Kangling Liu · Yang Liu

Abstract
