

Poster

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

William Brandon · Mayank Mishra · Aniruddha Nrusimha · Rameswar Panda · Jonathan Ragan-Kelley
2024 Poster

Abstract
